[v2] Linux: Access memory even if threads are running

Message ID 20210611180015.3877124-1-pedro@palves.net
State New
Headers show
Series
  • [v2] Linux: Access memory even if threads are running
Related show

Commit Message

Pedro Alves June 11, 2021, 6 p.m.
In v2:
  The access-mem-running.exp testcase is now single-threaded, and was
  moved from gdb.threads/ to gdb.base/.

Currently, on GNU/Linux, if you try to access memory and you have a
running thread selected, GDB fails the memory accesses, like:

 (gdb) c&
 Continuing.
 (gdb) p global_var
 Cannot access memory at address 0x555555558010

Or:

 (gdb) b main
 Breakpoint 2 at 0x55555555524d: file access-mem-running.c, line 59.
 Warning:
 Cannot insert breakpoint 2.
 Cannot access memory at address 0x55555555524d

This patch removes this limitation.  It teaches the native Linux
target to read/write memory even if the target is running.  And it
does this without temporarily stopping threads.  We now get:

 (gdb) c&
 Continuing.
 (gdb) p global_var
 $1 = 123
 (gdb) b main
 Breakpoint 2 at 0x555555555259: file access-mem-running.c, line 62.

(The scenarios above work correctly with current GDBserver, because
GDBserver temporarily stops all threads in the process whenever GDB
wants to access memory (see prepare_to_access_memory /
done_accessing_memory).  Freezing the whole process makes sense when
we need to be sure that we have a consistent view of memory and don't
race with the inferior changing it at the same time as GDB is
accessing it.  But I think that's a too-heavy hammer for the default
behavior.  I think that ideally, whether to stop all threads or not
should be policy decided by gdb core, probably best implemented by
exposing something like gdbserver's prepare_to_access_memory /
done_accessing_memory to gdb core.)

Currently, if we're accessing (reading/writing) just a few bytes, then
the Linux native backend does not try accessing memory via
/proc/<pid>/mem and goes straight to ptrace
PTRACE_PEEKTEXT/PTRACE_POKETEXT.  However, ptrace always fails when
the ptracee is running.  So the first step is to prefer
/proc/<pid>/mem even for small accesses.  Without further changes
however, that may cause a performance regression, due to constantly
opening and closing /proc/<pid>/mem for each memory access.  So the
next step is to keep the /proc/<pid>/mem file open across memory
accesses.  If we have this, then it doesn't make sense anymore to even
have the ptrace fallback, so the patch disables it.

I've made it such that GDB only ever has one /proc/<pid>/mem file open
at any time.  As long as a memory access hits the same inferior
process as the previous access, then we reuse the previously open
file.  If however, we access memory of a different process, then we
close the previous file and open a new one for the new process.

If we wanted, we could keep one /proc/<pid>/mem file open per
inferior, and never close them (unless the inferior exits or execs).
However, having seen bfd patches recently about hitting too many open
file descriptors, I kept the logic to have only one file open tops.
Also, we need to handle memory accesses for processes for which we
don't have an inferior object, for when we need to detach a
fork-child, and we'd probaly want to handle caching the open file for
that scenario (no inferior for process) too, which would probably end
up meaning caching for last non-inferior process, which is very much
what I'm proposing anyhow.  So always having one file open likely ends
up a smaller patch.

The next step is handling the case of GDB reading/writing memory
through a thread that is running and exits.  The access should not
result in a user-visible failure if the inferior/process is still
alive.

Once we manage to open a /proc/<lwpid>/mem file, then that file is
usable for memory accesses even if the corresponding lwp exits and is
reaped.  I double checked that trying to open the same
/proc/<lwpid>/mem path again fails because the lwp is really gone so
there's no /proc/<lwpid>/ entry on the filesystem anymore, but the
previously open file remains usable.  It's only when the whole process
execs that we need to reopen a new file.

When the kernel destroys the whole address space, i.e., when the
process exits or execs, the reads/writes fail with 0 aka EOF, in which
case there's nothing else to do than returning a memory access
failure.  Note this means that when we get an exec event, we need to
reopen the file, to access the process's new address space.

If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening it for exits before we open it and before we reap the
LWP (i.e., the LWP is zombie), the open fails with EACCES.  The patch
handles this by just looking for another thread until it finds one
that we can open a /proc/<pid>/mem successfully for.

If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening it had exited and we already reaped it, which is the
case if the selected thread is in THREAD_EXIT state, the open fails
with ENOENT.  The patch handles this the same way as a zombie race
(EACESS), instead of checking upfront whether we're accessing a
known-exited thread, because that would result in more complicated
code, because we also need to handle accessing lwps that are not
listed in the core thread list, and it's the core thread list that
records the THREAD_EXIT state.

The patch includes two testcases:

#1 - gdb.base/access-mem-running.exp

  This is the conceptually simplest - it is single-threaded, and has
  GDB read and write memory while the program is running.  It also
  tests setting a breakpoint while the program is running, and checks
  that the breakpoint is hit immediately.

#2 - gdb.threads/access-mem-running-thread-exit.exp

  This one is more elaborate, as it continuously spawns short-lived
  threads in order to exercise accessing memory just while threads are
  exiting.  It also spawns two different processes and alternates
  accessing memory between the two processes to exercise the reopening
  the /proc file frequently.  This also ends up exercising GDB reading
  from an exited thread frequently.  I confirmed by putting abort()
  calls in the EACESS/ENOENT paths added by the patch that we do hit
  all of them frequently with the testcase.  It also exits the
  process's main thread (i.e., the main thread becomes zombie), to
  make sure accessing memory in such a corner-case scenario works now
  and in the future.

The tests fail on GNU/Linux native before the code changes, and pass
after.  They pass against current GDBserver, again because GDBserver
supports memory access even if all threads are running, by
transparently pausing the whole process.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <pedro@palves.net>

	* linux-nat.c (linux_nat_target::detach): Close the
	/proc/<pid>/mem file if it was open for this process.
	(linux_handle_extended_wait) <PTRACE_EVENT_EXEC>: Close the
	/proc/<pid>/mem file if it was open for this process.
	(linux_nat_target::mourn_inferior): Close the /proc/<pid>/mem file
	if it was open for this process.
	(linux_nat_target::xfer_partial): Adjust.  Do not fall back to
	inf_ptrace_target::xfer_partial for memory accesses.
	(last_proc_mem_file): New.
	(maybe_close_proc_mem_file): New.
	(linux_proc_xfer_memory_partial_pid): New, with bits factored out
	from linux_proc_xfer_partial.
	(linux_proc_xfer_partial): Delete.
	(linux_proc_xfer_memory_partial): New.

gdb/testsuite/ChangeLog
yyyy-mm-dd  Pedro Alves  <pedro@palves.net>

	* gdb.base/access-mem-running.c: New.
	* gdb.base/access-mem-running.exp: New.
	* gdb.threads/access-mem-running-thread-exit.c: New.
	* gdb.threads/access-mem-running-thread-exit.exp: New.

Change-Id: Ib3c082528872662a3fc0ca9b31c34d4876c874c9
---
 gdb/linux-nat.c                               | 262 ++++++++++++++----
 gdb/testsuite/gdb.base/access-mem-running.c   |  47 ++++
 gdb/testsuite/gdb.base/access-mem-running.exp | 124 +++++++++
 .../access-mem-running-thread-exit.c          | 123 ++++++++
 .../access-mem-running-thread-exit.exp        | 166 +++++++++++
 5 files changed, 675 insertions(+), 47 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/access-mem-running.c
 create mode 100644 gdb/testsuite/gdb.base/access-mem-running.exp
 create mode 100644 gdb/testsuite/gdb.threads/access-mem-running-thread-exit.c
 create mode 100644 gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp


base-commit: 873793ae09b5dcba8c8da7345ee283f296558b8e
-- 
2.26.2

Comments

Luis Machado via Gdb-patches June 25, 2021, 4:48 p.m. | #1
Hi Pedrot,

This patch LGTM, I noted some small comments below.

On 2021-06-11 2:00 p.m., Pedro Alves wrote:
> In v2:

>   The access-mem-running.exp testcase is now single-threaded, and was

>   moved from gdb.threads/ to gdb.base/.

> 

> Currently, on GNU/Linux, if you try to access memory and you have a

> running thread selected, GDB fails the memory accesses, like:

> 

>  (gdb) c&

>  Continuing.

>  (gdb) p global_var

>  Cannot access memory at address 0x555555558010

> 

> Or:

> 

>  (gdb) b main

>  Breakpoint 2 at 0x55555555524d: file access-mem-running.c, line 59.

>  Warning:

>  Cannot insert breakpoint 2.

>  Cannot access memory at address 0x55555555524d

> 

> This patch removes this limitation.  It teaches the native Linux

> target to read/write memory even if the target is running.  And it

> does this without temporarily stopping threads.  We now get:

> 

>  (gdb) c&

>  Continuing.

>  (gdb) p global_var

>  $1 = 123

>  (gdb) b main

>  Breakpoint 2 at 0x555555555259: file access-mem-running.c, line 62.

> 

> (The scenarios above work correctly with current GDBserver, because

> GDBserver temporarily stops all threads in the process whenever GDB

> wants to access memory (see prepare_to_access_memory /

> done_accessing_memory).  Freezing the whole process makes sense when

> we need to be sure that we have a consistent view of memory and don't

> race with the inferior changing it at the same time as GDB is

> accessing it.  But I think that's a too-heavy hammer for the default

> behavior.  I think that ideally, whether to stop all threads or not

> should be policy decided by gdb core, probably best implemented by

> exposing something like gdbserver's prepare_to_access_memory /

> done_accessing_memory to gdb core.)

> 

> Currently, if we're accessing (reading/writing) just a few bytes, then

> the Linux native backend does not try accessing memory via

> /proc/<pid>/mem and goes straight to ptrace

> PTRACE_PEEKTEXT/PTRACE_POKETEXT.  However, ptrace always fails when

> the ptracee is running.  So the first step is to prefer

> /proc/<pid>/mem even for small accesses.  Without further changes

> however, that may cause a performance regression, due to constantly

> opening and closing /proc/<pid>/mem for each memory access.  So the

> next step is to keep the /proc/<pid>/mem file open across memory

> accesses.  If we have this, then it doesn't make sense anymore to even

> have the ptrace fallback, so the patch disables it.

> 

> I've made it such that GDB only ever has one /proc/<pid>/mem file open

> at any time.  As long as a memory access hits the same inferior

> process as the previous access, then we reuse the previously open

> file.  If however, we access memory of a different process, then we

> close the previous file and open a new one for the new process.

> 

> If we wanted, we could keep one /proc/<pid>/mem file open per

> inferior, and never close them (unless the inferior exits or execs).

> However, having seen bfd patches recently about hitting too many open

> file descriptors, I kept the logic to have only one file open tops.

> Also, we need to handle memory accesses for processes for which we

> don't have an inferior object, for when we need to detach a

> fork-child, and we'd probaly want to handle caching the open file for

> that scenario (no inferior for process) too, which would probably end

> up meaning caching for last non-inferior process, which is very much

> what I'm proposing anyhow.  So always having one file open likely ends

> up a smaller patch.

> 

> The next step is handling the case of GDB reading/writing memory

> through a thread that is running and exits.  The access should not

> result in a user-visible failure if the inferior/process is still

> alive.

> 

> Once we manage to open a /proc/<lwpid>/mem file, then that file is

> usable for memory accesses even if the corresponding lwp exits and is

> reaped.  I double checked that trying to open the same

> /proc/<lwpid>/mem path again fails because the lwp is really gone so

> there's no /proc/<lwpid>/ entry on the filesystem anymore, but the

> previously open file remains usable.  It's only when the whole process

> execs that we need to reopen a new file.

> 

> When the kernel destroys the whole address space, i.e., when the

> process exits or execs, the reads/writes fail with 0 aka EOF, in which

> case there's nothing else to do than returning a memory access

> failure.  Note this means that when we get an exec event, we need to

> reopen the file, to access the process's new address space.

> 

> If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP

> we're opening it for exits before we open it and before we reap the

> LWP (i.e., the LWP is zombie), the open fails with EACCES.  The patch

> handles this by just looking for another thread until it finds one

> that we can open a /proc/<pid>/mem successfully for.

> 

> If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP

> we're opening it had exited and we already reaped it, which is the


had -> has

> case if the selected thread is in THREAD_EXIT state, the open fails

> with ENOENT.  The patch handles this the same way as a zombie race

> (EACESS), instead of checking upfront whether we're accessing a


EACESS -> EACCES

> known-exited thread, because that would result in more complicated

> code, because we also need to handle accessing lwps that are not

> listed in the core thread list, and it's the core thread list that

> records the THREAD_EXIT state.

> 

> The patch includes two testcases:

> 

> #1 - gdb.base/access-mem-running.exp

> 

>   This is the conceptually simplest - it is single-threaded, and has

>   GDB read and write memory while the program is running.  It also

>   tests setting a breakpoint while the program is running, and checks

>   that the breakpoint is hit immediately.

> 

> #2 - gdb.threads/access-mem-running-thread-exit.exp

> 

>   This one is more elaborate, as it continuously spawns short-lived

>   threads in order to exercise accessing memory just while threads are

>   exiting.  It also spawns two different processes and alternates

>   accessing memory between the two processes to exercise the reopening

>   the /proc file frequently.  This also ends up exercising GDB reading

>   from an exited thread frequently.  I confirmed by putting abort()

>   calls in the EACESS/ENOENT paths added by the patch that we do hit


EACESS -> EACCES

> +/* Keep the /proc/<pid>/mem file open between memory accesses, as a

> +   cache to avoid constantly closing/opening the file in the common

> +   case of multiple memory reads/writes from/to the same inferior.

> +   Note we don't keep a file open per inferior to avoid keeping too

> +   many file descriptors open, which can run into resource limits.  */

> +static struct

> +{

> +  /* The LWP this open file is for.  Note that after opening the file,

> +     even if the thread subsequently exits, the open file is still

> +     usable for accessing memory.  It's only when the whole process

> +     exits or execs that the file becomes invalid (at which point

> +     reads/writes return EOF).  */

> +  ptid_t ptid;


I don't think there is, but just to be sure: is there a problem with
tid reuse here?  If we open the mem file for a given ptid, that thread
exits, and another subsequent thread uses the same ptid.  As long as the
process doesn't exit/exec, the file handle will stay valid, as you
explained.

>  

> -static enum target_xfer_status

> -linux_proc_xfer_partial (enum target_object object,

> -			 const char *annex, gdb_byte *readbuf,

> -			 const gdb_byte *writebuf,

> -			 ULONGEST offset, LONGEST len, ULONGEST *xfered_len)

> +  /* The file descriptor.  -1 if file is not open.  */

> +  int fd = -1;

> +

> +  /* Close FD and clear it to -1.  */

> +  void close ()

> +  {

> +    linux_nat_debug_printf ("closing fd %d for pid %ld\n",

> +			    fd, ptid.lwp ());


To be unambiguous and to be able to relate this to other debug messages,
I think it would be good to print both pid and lwp here.

> +static ssize_t

> +linux_proc_xfer_memory_partial_pid (ptid_t ptid,

> +				    gdb_byte *readbuf, const gdb_byte *writebuf,

> +				    ULONGEST offset, LONGEST len)

> +{

> +  ssize_t ret;

>  

> -  /* We could keep this file open and cache it - possibly one per

> -     thread.  That requires some juggling, but is even faster.  */

> -  xsnprintf (filename, sizeof filename, "/proc/%ld/mem",

> -	     inferior_ptid.lwp ());

> -  fd = gdb_open_cloexec (filename, ((readbuf ? O_RDONLY : O_WRONLY)

> -				    | O_LARGEFILE), 0);

> -  if (fd == -1)

> -    return TARGET_XFER_EOF;

> +  /* As long as we're hitting the same inferior, the previously open

> +     file is good, even if the thread it was open for exits.  */

> +  if (last_proc_mem_file.fd != -1

> +      && last_proc_mem_file.ptid.pid () != ptid.pid ())

> +    last_proc_mem_file.close ();

> +

> +  if (last_proc_mem_file.fd == -1)

> +    {

> +      /* Actually use /proc/<pid>/task/<lwp>/mem instead of

> +	 /proc/<lwp>/mem to avoid PID-reuse races, as we may be trying

> +	 to read memory via a thread which we've already reaped.  If

> +	 the PID is reused for the same process it's OK.  */

> +      char filename[64];

> +      xsnprintf (filename, sizeof filename,

> +		 "/proc/%d/task/%ld/mem", ptid.pid (), ptid.lwp ());

> +

> +      last_proc_mem_file.fd

> +	= gdb_open_cloexec (filename, O_RDWR | O_LARGEFILE, 0);

> +

> +      if (last_proc_mem_file.fd == -1)

> +	{

> +	  linux_nat_debug_printf ("opening fd for lwp %ld failed: %s (%d)\n",

> +				  ptid.lwp (),

> +				  safe_strerror (errno), errno);


Here, you might as well print the path we tried to open (`filename`):

  opening %s failed: ...

> +proc test { non_stop } {

> +    global srcfile binfile

> +    global gdb_prompt

> +    global GDBFLAGS

> +    global decimal

> +

> +    save_vars { GDBFLAGS } {

> +      append GDBFLAGS " -ex \"set non-stop $non_stop\""

> +      clean_restart ${binfile}

> +    }


I'm starting to think that clean_restart should have a switch:

  clean_restart ${binfile} -append-gdbflags "..."

Simon
Pedro Alves July 1, 2021, 1:07 p.m. | #2
Hi Simon,

Thanks for the review.

On 2021-06-25 5:48 p.m., Simon Marchi wrote:

>> If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP

>> we're opening it had exited and we already reaped it, which is the

> 

> had -> has


I changed, though I think past perfect also worked.  I don't know why
I put the "it" in there though, I removed it now:

    If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
    we're opening has exited and we already reaped it, which is the case

> 

>> case if the selected thread is in THREAD_EXIT state, the open fails

>> with ENOENT.  The patch handles this the same way as a zombie race

>> (EACESS), instead of checking upfront whether we're accessing a

> 

> EACESS -> EACCES


Thanks, fixed throughout.

> I don't think there is, but just to be sure: is there a problem with

> tid reuse here?  If we open the mem file for a given ptid, that thread

> exits, and another subsequent thread uses the same ptid.  As long as the

> process doesn't exit/exec, the file handle will stay valid, as you

> explained.


Right, the pid is only used by the kernel when the file is opened.  Internally in the
kernel, the file holds a reference to the mm (address space).  At  read/lseek time, the
pid is not used at all.  This is all in fs/proc/base.c in the kernel, see the
proc_mem_operations callbacks.

> 

>>  

>> -static enum target_xfer_status

>> -linux_proc_xfer_partial (enum target_object object,

>> -			 const char *annex, gdb_byte *readbuf,

>> -			 const gdb_byte *writebuf,

>> -			 ULONGEST offset, LONGEST len, ULONGEST *xfered_len)

>> +  /* The file descriptor.  -1 if file is not open.  */

>> +  int fd = -1;

>> +

>> +  /* Close FD and clear it to -1.  */

>> +  void close ()

>> +  {

>> +    linux_nat_debug_printf ("closing fd %d for pid %ld\n",

>> +			    fd, ptid.lwp ());

> 

> To be unambiguous and to be able to relate this to other debug messages,

> I think it would be good to print both pid and lwp here.


I've made it print the full path here:

+    linux_nat_debug_printf ("closing fd %d for /proc/%d/task/%ld/mem\n",
+                           fd, ptid.pid (), ptid.lwp ());

>> +      if (last_proc_mem_file.fd == -1)

>> +	{

>> +	  linux_nat_debug_printf ("opening fd for lwp %ld failed: %s (%d)\n",

>> +				  ptid.lwp (),

>> +				  safe_strerror (errno), errno);

> 

> Here, you might as well print the path we tried to open (`filename`):

> 

>   opening %s failed: ...


Indeed, I've done that now, thanks:

+         linux_nat_debug_printf ("opening %s failed: %s (%d)\n",
+                                 filename, safe_strerror (errno), errno);


I've now merged the patch, as below.

BTW, I forgot to mention earlier, but the first thing I tried was to use
process_vm_readv/process_vm_writev but those turn out useless for us,
because they don't bypass memory protections, i.e., you can't write
breakpoints with them, for example.

From 05c06f318fd9a112529dfc313e6512b399a645e4 Mon Sep 17 00:00:00 2001
From: Pedro Alves <pedro@palves.net>

Date: Fri, 11 Jun 2021 17:56:32 +0100
Subject: [PATCH] Linux: Access memory even if threads are running

Currently, on GNU/Linux, if you try to access memory and you have a
running thread selected, GDB fails the memory accesses, like:

 (gdb) c&
 Continuing.
 (gdb) p global_var
 Cannot access memory at address 0x555555558010

Or:

 (gdb) b main
 Breakpoint 2 at 0x55555555524d: file access-mem-running.c, line 59.
 Warning:
 Cannot insert breakpoint 2.
 Cannot access memory at address 0x55555555524d

This patch removes this limitation.  It teaches the native Linux
target to read/write memory even if the target is running.  And it
does this without temporarily stopping threads.  We now get:

 (gdb) c&
 Continuing.
 (gdb) p global_var
 $1 = 123
 (gdb) b main
 Breakpoint 2 at 0x555555555259: file access-mem-running.c, line 62.

(The scenarios above work correctly with current GDBserver, because
GDBserver temporarily stops all threads in the process whenever GDB
wants to access memory (see prepare_to_access_memory /
done_accessing_memory).  Freezing the whole process makes sense when
we need to be sure that we have a consistent view of memory and don't
race with the inferior changing it at the same time as GDB is
accessing it.  But I think that's a too-heavy hammer for the default
behavior.  I think that ideally, whether to stop all threads or not
should be policy decided by gdb core, probably best implemented by
exposing something like gdbserver's prepare_to_access_memory /
done_accessing_memory to gdb core.)

Currently, if we're accessing (reading/writing) just a few bytes, then
the Linux native backend does not try accessing memory via
/proc/<pid>/mem and goes straight to ptrace
PTRACE_PEEKTEXT/PTRACE_POKETEXT.  However, ptrace always fails when
the ptracee is running.  So the first step is to prefer
/proc/<pid>/mem even for small accesses.  Without further changes
however, that may cause a performance regression, due to constantly
opening and closing /proc/<pid>/mem for each memory access.  So the
next step is to keep the /proc/<pid>/mem file open across memory
accesses.  If we have this, then it doesn't make sense anymore to even
have the ptrace fallback, so the patch disables it.

I've made it such that GDB only ever has one /proc/<pid>/mem file open
at any time.  As long as a memory access hits the same inferior
process as the previous access, then we reuse the previously open
file.  If however, we access memory of a different process, then we
close the previous file and open a new one for the new process.

If we wanted, we could keep one /proc/<pid>/mem file open per
inferior, and never close them (unless the inferior exits or execs).
However, having seen bfd patches recently about hitting too many open
file descriptors, I kept the logic to have only one file open tops.
Also, we need to handle memory accesses for processes for which we
don't have an inferior object, for when we need to detach a
fork-child, and we'd probaly want to handle caching the open file for
that scenario (no inferior for process) too, which would probably end
up meaning caching for last non-inferior process, which is very much
what I'm proposing anyhow.  So always having one file open likely ends
up a smaller patch.

The next step is handling the case of GDB reading/writing memory
through a thread that is running and exits.  The access should not
result in a user-visible failure if the inferior/process is still
alive.

Once we manage to open a /proc/<lwpid>/mem file, then that file is
usable for memory accesses even if the corresponding lwp exits and is
reaped.  I double checked that trying to open the same
/proc/<lwpid>/mem path again fails because the lwp is really gone so
there's no /proc/<lwpid>/ entry on the filesystem anymore, but the
previously open file remains usable.  It's only when the whole process
execs that we need to reopen a new file.

When the kernel destroys the whole address space, i.e., when the
process exits or execs, the reads/writes fail with 0 aka EOF, in which
case there's nothing else to do than returning a memory access
failure.  Note this means that when we get an exec event, we need to
reopen the file, to access the process's new address space.

If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening it for exits before we open it and before we reap the
LWP (i.e., the LWP is zombie), the open fails with EACCES.  The patch
handles this by just looking for another thread until it finds one
that we can open a /proc/<pid>/mem successfully for.

If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening has exited and we already reaped it, which is the case
if the selected thread is in THREAD_EXIT state, the open fails with
ENOENT.  The patch handles this the same way as a zombie race
(EACCES), instead of checking upfront whether we're accessing a
known-exited thread, because that would result in more complicated
code, because we also need to handle accessing lwps that are not
listed in the core thread list, and it's the core thread list that
records the THREAD_EXIT state.

The patch includes two testcases:

#1 - gdb.base/access-mem-running.exp

  This is the conceptually simplest - it is single-threaded, and has
  GDB read and write memory while the program is running.  It also
  tests setting a breakpoint while the program is running, and checks
  that the breakpoint is hit immediately.

#2 - gdb.threads/access-mem-running-thread-exit.exp

  This one is more elaborate, as it continuously spawns short-lived
  threads in order to exercise accessing memory just while threads are
  exiting.  It also spawns two different processes and alternates
  accessing memory between the two processes to exercise the reopening
  the /proc file frequently.  This also ends up exercising GDB reading
  from an exited thread frequently.  I confirmed by putting abort()
  calls in the EACCES/ENOENT paths added by the patch that we do hit
  all of them frequently with the testcase.  It also exits the
  process's main thread (i.e., the main thread becomes zombie), to
  make sure accessing memory in such a corner-case scenario works now
  and in the future.

The tests fail on GNU/Linux native before the code changes, and pass
after.  They pass against current GDBserver, again because GDBserver
supports memory access even if all threads are running, by
transparently pausing the whole process.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <pedro@palves.net>

	PR mi/15729
	PR gdb/13463
	* linux-nat.c (linux_nat_target::detach): Close the
	/proc/<pid>/mem file if it was open for this process.
	(linux_handle_extended_wait) <PTRACE_EVENT_EXEC>: Close the
	/proc/<pid>/mem file if it was open for this process.
	(linux_nat_target::mourn_inferior): Close the /proc/<pid>/mem file
	if it was open for this process.
	(linux_nat_target::xfer_partial): Adjust.  Do not fall back to
	inf_ptrace_target::xfer_partial for memory accesses.
	(last_proc_mem_file): New.
	(maybe_close_proc_mem_file): New.
	(linux_proc_xfer_memory_partial_pid): New, with bits factored out
	from linux_proc_xfer_partial.
	(linux_proc_xfer_partial): Delete.
	(linux_proc_xfer_memory_partial): New.

gdb/testsuite/ChangeLog
yyyy-mm-dd  Pedro Alves  <pedro@palves.net>

	PR mi/15729
	PR gdb/13463
	* gdb.base/access-mem-running.c: New.
	* gdb.base/access-mem-running.exp: New.
	* gdb.threads/access-mem-running-thread-exit.c: New.
	* gdb.threads/access-mem-running-thread-exit.exp: New.

Change-Id: Ib3c082528872662a3fc0ca9b31c34d4876c874c9
---
 gdb/ChangeLog                                 |  19 ++
 gdb/linux-nat.c                               | 265 ++++++++++++++----
 gdb/testsuite/ChangeLog                       |   9 +
 gdb/testsuite/gdb.base/access-mem-running.c   |  47 ++++
 gdb/testsuite/gdb.base/access-mem-running.exp | 124 ++++++++
 .../access-mem-running-thread-exit.c          | 123 ++++++++
 .../access-mem-running-thread-exit.exp        | 166 +++++++++++
 7 files changed, 706 insertions(+), 47 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/access-mem-running.c
 create mode 100644 gdb/testsuite/gdb.base/access-mem-running.exp
 create mode 100644 gdb/testsuite/gdb.threads/access-mem-running-thread-exit.c
 create mode 100644 gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 699bf6fe568..d1091792b26 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,22 @@
+2021-07-01  Pedro Alves  <pedro@palves.net>
+
+	PR mi/15729
+	PR gdb/13463
+	* linux-nat.c (linux_nat_target::detach): Close the
+	/proc/<pid>/mem file if it was open for this process.
+	(linux_handle_extended_wait) <PTRACE_EVENT_EXEC>: Close the
+	/proc/<pid>/mem file if it was open for this process.
+	(linux_nat_target::mourn_inferior): Close the /proc/<pid>/mem file
+	if it was open for this process.
+	(linux_nat_target::xfer_partial): Adjust.  Do not fall back to
+	inf_ptrace_target::xfer_partial for memory accesses.
+	(last_proc_mem_file): New.
+	(maybe_close_proc_mem_file): New.
+	(linux_proc_xfer_memory_partial_pid): New, with bits factored out
+	from linux_proc_xfer_partial.
+	(linux_proc_xfer_partial): Delete.
+	(linux_proc_xfer_memory_partial): New.
+
 2021-06-29  Simon Marchi  <simon.marchi@polymtl.ca>
 
 	* frame.h (FRAME_SCOPED_DEBUG_ENTER_EXIT): New.
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 34a2aee41d7..f206b874929 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -280,6 +280,8 @@ static int lwp_status_pending_p (struct lwp_info *lp);
 
 static void save_stop_reason (struct lwp_info *lp);
 
+static void maybe_close_proc_mem_file (pid_t pid);
+
 
 /* LWP accessors.  */
 
@@ -1486,6 +1488,8 @@ linux_nat_target::detach (inferior *inf, int from_tty)
 
       detach_success (inf);
     }
+
+  maybe_close_proc_mem_file (pid);
 }
 
 /* Resume execution of the inferior process.  If STEP is nonzero,
@@ -2025,6 +2029,10 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
     {
       linux_nat_debug_printf ("Got exec event from LWP %ld", lp->ptid.lwp ());
 
+      /* Close the /proc/<pid>/mem file if it was open for this
+	 inferior.  */
+      maybe_close_proc_mem_file (lp->ptid.pid ());
+
       ourstatus->kind = TARGET_WAITKIND_EXECD;
       ourstatus->value.execd_pathname
 	= xstrdup (linux_proc_pid_to_exec_file (pid));
@@ -3589,6 +3597,10 @@ linux_nat_target::mourn_inferior ()
 
   purge_lwp_list (pid);
 
+  /* Close the /proc/<pid>/mem file if it was open for this
+     inferior.  */
+  maybe_close_proc_mem_file (pid);
+
   if (! forks_exist_p ())
     /* Normal case, no other forks available.  */
     inf_ptrace_target::mourn_inferior ();
@@ -3681,10 +3693,8 @@ linux_nat_xfer_osdata (enum target_object object,
 		       ULONGEST *xfered_len);
 
 static enum target_xfer_status
-linux_proc_xfer_partial (enum target_object object,
-			 const char *annex, gdb_byte *readbuf,
-			 const gdb_byte *writebuf,
-			 ULONGEST offset, LONGEST len, ULONGEST *xfered_len);
+linux_proc_xfer_memory_partial (gdb_byte *readbuf, const gdb_byte *writebuf,
+				ULONGEST offset, LONGEST len, ULONGEST *xfered_len);
 
 enum target_xfer_status
 linux_nat_target::xfer_partial (enum target_object object,
@@ -3692,8 +3702,6 @@ linux_nat_target::xfer_partial (enum target_object object,
 				const gdb_byte *writebuf,
 				ULONGEST offset, ULONGEST len, ULONGEST *xfered_len)
 {
-  enum target_xfer_status xfer;
-
   if (object == TARGET_OBJECT_SIGNAL_INFO)
     return linux_xfer_siginfo (object, annex, readbuf, writebuf,
 			       offset, len, xfered_len);
@@ -3712,25 +3720,21 @@ linux_nat_target::xfer_partial (enum target_object object,
     return linux_nat_xfer_osdata (object, annex, readbuf, writebuf,
 				  offset, len, xfered_len);
 
-  /* GDB calculates all addresses in the largest possible address
-     width.
-     The address width must be masked before its final use - either by
-     linux_proc_xfer_partial or inf_ptrace_target::xfer_partial.
-
-     Compare ADDR_BIT first to avoid a compiler warning on shift overflow.  */
-
   if (object == TARGET_OBJECT_MEMORY)
     {
+      /* GDB calculates all addresses in the largest possible address
+	 width.  The address width must be masked before its final use
+	 by linux_proc_xfer_partial.
+
+	 Compare ADDR_BIT first to avoid a compiler warning on shift overflow.  */
       int addr_bit = gdbarch_addr_bit (target_gdbarch ());
 
       if (addr_bit < (sizeof (ULONGEST) * HOST_CHAR_BIT))
 	offset &= ((ULONGEST) 1 << addr_bit) - 1;
-    }
 
-  xfer = linux_proc_xfer_partial (object, annex, readbuf, writebuf,
-				  offset, len, xfered_len);
-  if (xfer != TARGET_XFER_EOF)
-    return xfer;
+      return linux_proc_xfer_memory_partial (readbuf, writebuf,
+					     offset, len, xfered_len);
+    }
 
   return inf_ptrace_target::xfer_partial (object, annex, readbuf, writebuf,
 					  offset, len, xfered_len);
@@ -3794,35 +3798,91 @@ linux_nat_target::pid_to_exec_file (int pid)
   return linux_proc_pid_to_exec_file (pid);
 }
 
-/* Implement the to_xfer_partial target method using /proc/<pid>/mem.
-   Because we can use a single read/write call, this can be much more
-   efficient than banging away at PTRACE_PEEKTEXT.  */
+/* Keep the /proc/<pid>/mem file open between memory accesses, as a
+   cache to avoid constantly closing/opening the file in the common
+   case of multiple memory reads/writes from/to the same inferior.
+   Note we don't keep a file open per inferior to avoid keeping too
+   many file descriptors open, which can run into resource limits.  */
+static struct
+{
+  /* The LWP this open file is for.  Note that after opening the file,
+     even if the thread subsequently exits, the open file is still
+     usable for accessing memory.  It's only when the whole process
+     exits or execs that the file becomes invalid (at which point
+     reads/writes return EOF).  */
+  ptid_t ptid;
 
-static enum target_xfer_status
-linux_proc_xfer_partial (enum target_object object,
-			 const char *annex, gdb_byte *readbuf,
-			 const gdb_byte *writebuf,
-			 ULONGEST offset, LONGEST len, ULONGEST *xfered_len)
+  /* The file descriptor.  -1 if file is not open.  */
+  int fd = -1;
+
+  /* Close FD and clear it to -1.  */
+  void close ()
+  {
+    linux_nat_debug_printf ("closing fd %d for /proc/%d/task/%ld/mem\n",
+			    fd, ptid.pid (), ptid.lwp ());
+    ::close (fd);
+    fd = -1;
+  }
+} last_proc_mem_file;
+
+/* Close the /proc/<pid>/mem file if its LWP matches PTID.  */
+
+static void
+maybe_close_proc_mem_file (pid_t pid)
 {
-  LONGEST ret;
-  int fd;
-  char filename[64];
+  if (last_proc_mem_file.ptid.pid () == pid)
+    last_proc_mem_file.close ();
+}
 
-  if (object != TARGET_OBJECT_MEMORY)
-    return TARGET_XFER_EOF;
+/* Helper for linux_proc_xfer_memory_partial.  Accesses /proc via
+   PTID.  Returns -1 on error, with errno set.  Returns number of
+   read/written bytes otherwise.  Returns 0 on EOF, which indicates
+   the address space is gone (because the process exited or
+   execed).  */
 
-  /* Don't bother for one word.  */
-  if (len < 3 * sizeof (long))
-    return TARGET_XFER_EOF;
+static ssize_t
+linux_proc_xfer_memory_partial_pid (ptid_t ptid,
+				    gdb_byte *readbuf, const gdb_byte *writebuf,
+				    ULONGEST offset, LONGEST len)
+{
+  ssize_t ret;
 
-  /* We could keep this file open and cache it - possibly one per
-     thread.  That requires some juggling, but is even faster.  */
-  xsnprintf (filename, sizeof filename, "/proc/%ld/mem",
-	     inferior_ptid.lwp ());
-  fd = gdb_open_cloexec (filename, ((readbuf ? O_RDONLY : O_WRONLY)
-				    | O_LARGEFILE), 0);
-  if (fd == -1)
-    return TARGET_XFER_EOF;
+  /* As long as we're hitting the same inferior, the previously open
+     file is good, even if the thread it was open for exits.  */
+  if (last_proc_mem_file.fd != -1
+      && last_proc_mem_file.ptid.pid () != ptid.pid ())
+    last_proc_mem_file.close ();
+
+  if (last_proc_mem_file.fd == -1)
+    {
+      /* Actually use /proc/<pid>/task/<lwp>/mem instead of
+	 /proc/<lwp>/mem to avoid PID-reuse races, as we may be trying
+	 to read memory via a thread which we've already reaped.
+	 /proc/<lwp>/mem could open a file for the wrong process.  If
+	 the LWPID is reused for the same process it's OK, we can read
+	 memory through it just fine.  If the LWPID is reused for a
+	 different process, then the open will fail because the path
+	 won't exist.  */
+      char filename[64];
+      xsnprintf (filename, sizeof filename,
+		 "/proc/%d/task/%ld/mem", ptid.pid (), ptid.lwp ());
+
+      last_proc_mem_file.fd
+	= gdb_open_cloexec (filename, O_RDWR | O_LARGEFILE, 0);
+
+      if (last_proc_mem_file.fd == -1)
+	{
+	  linux_nat_debug_printf ("opening %s failed: %s (%d)\n",
+				  filename, safe_strerror (errno), errno);
+	  return -1;
+	}
+      last_proc_mem_file.ptid = ptid;
+
+      linux_nat_debug_printf ("opened fd %d for %s\n",
+			      last_proc_mem_file.fd, filename);
+    }
+
+  int fd = last_proc_mem_file.fd;
 
   /* Use pread64/pwrite64 if available, since they save a syscall and can
      handle 64-bit offsets even on 32-bit platforms (for instance, SPARC
@@ -3837,17 +3897,128 @@ linux_proc_xfer_partial (enum target_object object,
 	   : write (fd, writebuf, len));
 #endif
 
-  close (fd);
+  if (ret == -1)
+    {
+      linux_nat_debug_printf ("accessing fd %d for pid %ld failed: %s (%d)\n",
+			      fd, ptid.lwp (),
+			      safe_strerror (errno), errno);
+    }
+  else if (ret == 0)
+    {
+      linux_nat_debug_printf ("accessing fd %d for pid %ld got EOF\n",
+			      fd, ptid.lwp ());
+    }
+
+  return ret;
+}
 
-  if (ret == -1 || ret == 0)
-    return TARGET_XFER_EOF;
+/* Implement the to_xfer_partial target method using /proc/<pid>/mem.
+   Because we can use a single read/write call, this can be much more
+   efficient than banging away at PTRACE_PEEKTEXT.  Also, unlike
+   PTRACE_PEEKTEXT/PTRACE_POKETEXT, this works with running
+   threads.  */
+
+static enum target_xfer_status
+linux_proc_xfer_memory_partial (gdb_byte *readbuf, const gdb_byte *writebuf,
+				ULONGEST offset, LONGEST len,
+				ULONGEST *xfered_len)
+{
+  /* Unlike PTRACE_PEEKTEXT/PTRACE_POKETEXT, reading/writing from/to
+     /proc/<pid>/mem works with running threads, and even exited
+     threads if the file was already open.  If we need to open or
+     reopen the /proc file though, we may get an EACCES error
+     ("Permission denied"), meaning the thread is gone but its exit
+     status isn't reaped yet, or ENOENT if the thread is gone and
+     already reaped.  In that case, just try opening the file for
+     another thread in the process.  If all threads fail, then it must
+     mean the whole process exited, in which case there's nothing else
+     to do and we just fail the memory access.
+
+     Note we don't simply always access via the leader thread because
+     the leader may exit without exiting the whole process.  See
+     gdb.threads/leader-exit.exp, for example.  */
+
+  /* It's frequently the case that the selected thread is stopped, and
+     is thus not likely to exit (unless something kills the process
+     outside our control, with e.g., SIGKILL).  Give that one a try
+     first.
+
+     Also, inferior_ptid may actually point at an LWP not in lwp_list.
+     This happens when we're detaching from a fork child that we don't
+     want to debug ("set detach-on-fork on"), and the breakpoints
+     module uninstalls breakpoints from the fork child.  Which process
+     to access is given by inferior_ptid.  */
+  int res = linux_proc_xfer_memory_partial_pid (inferior_ptid,
+						readbuf, writebuf,
+						offset, len);
+  if (res == 0)
+    {
+      /* EOF means the address space is gone, the whole
+	 process exited or execed.  */
+      return TARGET_XFER_EOF;
+    }
+  else if (res != -1)
+    {
+      *xfered_len = res;
+      return TARGET_XFER_OK;
+    }
   else
     {
-      *xfered_len = ret;
+      /* If we simply raced with the thread exiting (EACCES), or the
+	 current thread is THREAD_EXITED (ENOENT), try some other
+	 thread.  It's easier to handle an ENOENT failure than check
+	 for THREAD_EXIT upfront because this function is called
+	 before a thread for inferior_ptid is added to the thread
+	 list.  */
+      if (errno != EACCES && errno != ENOENT)
+	return TARGET_XFER_EOF;
+    }
+
+  int cur_pid = current_inferior ()->pid;
+
+  if (inferior_ptid.pid () != cur_pid)
+    {
+      /* We're accessing a fork child, and the access above failed.
+	 Don't bother iterating the LWP list, since there's no other
+	 LWP for this process.  */
+      return TARGET_XFER_EOF;
+    }
+
+  /* Iterate over LWPs of the current inferior, trying to access
+     memory through one of them.  */
+  for (lwp_info *lp = lwp_list; lp != nullptr; lp = lp->next)
+    {
+      if (lp->ptid.pid () != cur_pid)
+	continue;
+
+      res = linux_proc_xfer_memory_partial_pid (lp->ptid,
+						readbuf, writebuf,
+						offset, len);
+
+      if (res == 0)
+	{
+	  /* EOF means the address space is gone, the whole process
+	     exited or execed.  */
+	  return TARGET_XFER_EOF;
+	}
+      else if (res == -1)
+	{
+	  if (errno == EACCES)
+	    {
+	      /* This LWP is gone, try another one.  */
+	      continue;
+	    }
+
+	  return TARGET_XFER_EOF;
+	}
+
+      *xfered_len = res;
       return TARGET_XFER_OK;
     }
-}
 
+  /* No luck.  */
+  return TARGET_XFER_EOF;
+}
 
 /* Parse LINE as a signal set and add its set bits to SIGS.  */
 
diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index 38d8bbccd6c..67fa42cbe3b 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,12 @@
+2021-07-01  Pedro Alves  <pedro@palves.net>
+
+	PR mi/15729
+	PR gdb/13463
+	* gdb.base/access-mem-running.c: New.
+	* gdb.base/access-mem-running.exp: New.
+	* gdb.threads/access-mem-running-thread-exit.c: New.
+	* gdb.threads/access-mem-running-thread-exit.exp: New.
+
 2021-06-29  Simon Marchi  <simon.marchi@polymtl.ca>
 
 	* gdb.dwarf2/dw2-reg-undefined.exp: Update regexp.
diff --git a/gdb/testsuite/gdb.base/access-mem-running.c b/gdb/testsuite/gdb.base/access-mem-running.c
new file mode 100644
index 00000000000..2bb4f9766b5
--- /dev/null
+++ b/gdb/testsuite/gdb.base/access-mem-running.c
@@ -0,0 +1,47 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+
+static unsigned int global_counter = 1;
+
+static volatile unsigned int global_var = 123;
+
+static void
+maybe_stop_here ()
+{
+}
+
+int
+main (void)
+{
+  global_counter = 1;
+
+  while (global_counter > 0)
+    {
+      global_counter++;
+
+      /* Less than 1s, so the counter increments at least once while
+	 the .exp sleep 1s, but slow enough that the counter doesn't
+	 wrap in 1s.  */
+      usleep (5000);
+
+      maybe_stop_here ();
+    }
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/access-mem-running.exp b/gdb/testsuite/gdb.base/access-mem-running.exp
new file mode 100644
index 00000000000..6990d906da2
--- /dev/null
+++ b/gdb/testsuite/gdb.base/access-mem-running.exp
@@ -0,0 +1,124 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that we can access memory while the inferior is running.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug}] == -1} {
+    return -1
+}
+
+# The test proper.  NON_STOP indicates whether we're testing in
+# non-stop, or all-stop mode.
+
+proc test { non_stop } {
+    global srcfile binfile
+    global gdb_prompt
+    global GDBFLAGS
+    global decimal
+
+    save_vars { GDBFLAGS } {
+      append GDBFLAGS " -ex \"set non-stop $non_stop\""
+      clean_restart ${binfile}
+    }
+
+    if ![runto_main] {
+	return -1
+    }
+
+    # If debugging with target remote, check whether the all-stop variant
+    # of the RSP is being used.  If so, we can't run the background tests.
+    if {!$non_stop
+	&& [target_info exists gdb_protocol]
+	&& ([target_info gdb_protocol] == "remote"
+	    || [target_info gdb_protocol] == "extended-remote")} {
+
+	gdb_test_multiple "maint show target-non-stop" "" {
+	    -wrap -re "(is|currently) on.*" {
+	    }
+	    -wrap -re "(is|currently) off.*" {
+		unsupported "can't issue commands while target is running"
+		return 0
+	    }
+	}
+    }
+
+    delete_breakpoints
+
+    if {$non_stop == "off"} {
+	set cmd "continue &"
+    } else {
+	set cmd "continue -a &"
+    }
+    gdb_test_multiple $cmd "continuing" {
+	-re "Continuing\.\r\n$gdb_prompt " {
+	    pass $gdb_test_name
+	}
+    }
+
+    # Check we can read/write variables.
+
+    # Check that we can read the counter variable, and that the
+    # counter is increasing, meaning the process really is running.
+
+    sleep 1
+    set global_counter1 \
+	[get_integer_valueof "global_counter" 0 \
+	     "get global_counter once"]
+
+    sleep 1
+    set global_counter2 \
+	[get_integer_valueof "global_counter" 0 \
+	     "get global_counter twice"]
+
+    gdb_assert {$global_counter1 != 0 \
+		    && $global_counter2 != 0 \
+		    && $global_counter1 != $global_counter2} \
+	"value changed"
+
+    # Check that we can write variables.
+
+    gdb_test "print global_var" " = 123" \
+	"print global_var before writing"
+    gdb_test "print global_var = 321" " = 321" \
+	"write to global_var"
+    gdb_test "print global_var" " = 321" \
+	"print global_var after writing"
+    gdb_test "print global_var = 123" " = 123" \
+	"write to global_var again"
+
+    # Check we can set a breakpoint while the process is running.  The
+    # breakpoint should hit immediately.
+    set any "\[^\r\n\]*"
+
+    gdb_test_multiple "b maybe_stop_here" "" {
+	-re "Breakpoint $decimal at $any: file $any${srcfile}, line $decimal.\r\n$gdb_prompt " {
+	    pass $gdb_test_name
+	}
+    }
+    gdb_test_multiple "" "breakpoint hits" {
+	-re "Breakpoint $decimal, maybe_stop_here \\(\\) at $any${srcfile}:$decimal\r\n" {
+	    pass "$gdb_test_name"
+	}
+    }
+}
+
+foreach non_stop { "off" "on" } {
+    set stop_mode [expr ($non_stop=="off")?"all-stop":"non-stop"]
+    with_test_prefix "$stop_mode" {
+	test $non_stop
+    }
+}
diff --git a/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.c b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.c
new file mode 100644
index 00000000000..5f89d223065
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.c
@@ -0,0 +1,123 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+
+#define THREADS 20
+
+static volatile unsigned int global_var = 123;
+
+/* Wrapper around pthread_create.   */
+
+static void
+create_thread (pthread_t *child,
+	       void *(*start_routine) (void *), void *arg)
+{
+  int rc;
+
+  while ((rc = pthread_create (child, NULL, start_routine, arg)) != 0)
+    {
+      fprintf (stderr, "unexpected error from pthread_create: %s (%d)\n",
+	       strerror (rc), rc);
+      sleep (1);
+    }
+}
+
+/* Data passed to threads on creation.  This is allocated on the heap
+   and ownership transferred from parent to child.  */
+
+struct thread_arg
+{
+  /* The thread's parent.  */
+  pthread_t parent;
+
+  /* Whether to call pthread_join on the parent.  */
+  int join_parent;
+};
+
+/* Entry point for threads.  */
+
+static void *
+thread_fn (void *arg)
+{
+  struct thread_arg *p = arg;
+
+  /* Passing no argument makes the thread exit immediately.  */
+  if (p == NULL)
+    return NULL;
+
+  if (p->join_parent)
+    assert (pthread_join (p->parent, NULL) == 0);
+
+  /* Spawn a number of threads that exit immediately, and then join
+     them.  The idea is to maximize the time window when we mostly
+     have threads exiting.  */
+  {
+    pthread_t child[THREADS];
+    int i;
+
+    /* Passing no argument makes the thread exit immediately.  */
+    for (i = 0; i < THREADS; i++)
+      create_thread (&child[i], thread_fn, NULL);
+
+    for (i = 0; i < THREADS; i++)
+      pthread_join (child[i], NULL);
+  }
+
+  /* Spawn a new thread that joins us, and exit.  The idea here is to
+     not have any thread that stays around forever.  */
+  {
+    pthread_t child;
+
+    p->parent = pthread_self ();
+    p->join_parent = 1;
+    create_thread (&child, thread_fn, p);
+  }
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+    {
+      struct thread_arg *p;
+      pthread_t child;
+
+      p = malloc (sizeof *p);
+      p->parent = pthread_self ();
+      /* Only join the parent once.  */
+      if (i == 0)
+	p->join_parent = 1;
+      else
+	p->join_parent = 0;
+      create_thread (&child, thread_fn, p);
+    }
+
+  /* Exit the leader to make sure that we can access memory with the
+     leader gone.  */
+  pthread_exit (NULL);
+}
diff --git a/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp
new file mode 100644
index 00000000000..ea228e4ba13
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp
@@ -0,0 +1,166 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that we can access memory while all the threads of the inferior
+# are running, and even if:
+#
+# - the leader thread exits
+# - the selected thread exits
+#
+# This test constantly spawns short lived threads to make sure that on
+# systems with debug APIs that require passing down a specific thread
+# to work with (e.g., GNU/Linux ptrace and /proc filesystem), GDB
+# copes with accessing memory just while the thread it is accessing
+# memory through exits.
+#
+# The test spawns two processes and alternates memory accesses between
+# them to force flushing per-process caches.  At the time of writing,
+# the Linux backend accesses inferior memory via /proc/<pid>/mem, and
+# keeps one such file open, as a cache.  Alternating inferiors forces
+# opening such file for a different process, which fails if GDB tries
+# to open the file for a thread that exited.  The test does ensures
+# those reopen/fail code paths are exercised.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] == -1} {
+    return -1
+}
+
+# The test proper.  NON_STOP indicates whether we're testing in
+# non-stop, or all-stop mode.
+
+proc test { non_stop } {
+    global binfile
+    global gdb_prompt
+    global GDBFLAGS
+
+    save_vars { GDBFLAGS } {
+      append GDBFLAGS " -ex \"set non-stop $non_stop\""
+      clean_restart ${binfile}
+    }
+
+    if ![runto_main] {
+	fail "cannot run to main"
+	return -1
+    }
+
+    # If debugging with target remote, check whether the all-stop variant
+    # of the RSP is being used.  If so, we can't run the background tests.
+    if {!$non_stop
+	&& [target_info exists gdb_protocol]
+	&& ([target_info gdb_protocol] == "remote"
+	    || [target_info gdb_protocol] == "extended-remote")} {
+
+	gdb_test_multiple "maint show target-non-stop" "" {
+	    -wrap -re "(is|currently) on.*" {
+	    }
+	    -wrap -re "(is|currently) off.*" {
+		unsupported "can't issue commands while target is running"
+		return 0
+	    }
+	}
+    }
+
+    delete_breakpoints
+
+    # Start the second inferior.
+    with_test_prefix "second inferior" {
+	gdb_test "add-inferior -no-connection" "New inferior 2.*"
+	gdb_test "inferior 2" "Switching to inferior 2 .*"
+
+	gdb_load $binfile
+
+	if ![runto_main] {
+	    fail "cannot run to main"
+	    return -1
+	}
+    }
+
+    delete_breakpoints
+
+    # These put too much noise in the logs.
+    gdb_test_no_output "set print thread-events off"
+
+    # Continue all threads of both processes.
+    gdb_test_no_output "set schedule-multiple on"
+    if {$non_stop == "off"} {
+	set cmd "continue &"
+    } else {
+	set cmd "continue -a &"
+    }
+    gdb_test_multiple $cmd "continuing" {
+	-re "Continuing\.\r\n$gdb_prompt " {
+	    pass $gdb_test_name
+	}
+    }
+
+    # Like gdb_test, but:
+    # - don't issue a pass on success.
+    # - on failure, clear the ok variable in the calling context, and
+    #   break it.
+    proc my_gdb_test {cmd pattern message} {
+	upvar inf inf
+	upvar iter iter
+	if {[gdb_test_multiple $cmd "access mem ($message, inf=$inf, iter=$iter)" {
+	    -wrap -re $pattern {
+	    }
+	}] != 0} {
+	    uplevel 1 {set ok 0}
+	    return -code break
+	}
+    }
+
+    # Hammer away for 5 seconds, alternating between inferiors.
+    set ::done 0
+    after 5000 { set ::done 1 }
+
+    set inf 1
+    set ok 1
+    set iter 0
+    while {!$::done && $ok} {
+	incr iter
+	verbose -log "xxxxx: iteration $iter"
+	gdb_test "info threads" ".*" ""
+
+	if {$inf == 1} {
+	    set inf 2
+	} else {
+	    set inf 1
+	}
+
+	my_gdb_test "inferior $inf" ".*" "inferior $inf"
+
+	my_gdb_test "print global_var = 555" " = 555" \
+	    "write to global_var"
+	my_gdb_test "print global_var" " = 555" \
+	    "print global_var after writing"
+	my_gdb_test "print global_var = 333" " = 333" \
+	    "write to global_var again"
+	my_gdb_test "print global_var" " = 333" \
+	    "print global_var after writing again"
+    }
+
+   if {$ok} {
+       pass "access mem"
+   }
+}
+
+foreach non_stop { "off" "on" } {
+    set stop_mode [expr ($non_stop=="off")?"all-stop":"non-stop"]
+    with_test_prefix "$stop_mode" {
+	test $non_stop
+    }
+}

base-commit: 75a2da57a1bbff8686f56a43aabe1d7e55147894
-- 
2.26.2
Tom Tromey July 1, 2021, 1:25 p.m. | #3
Pedro> BTW, I forgot to mention earlier, but the first thing I tried was to use
Pedro> process_vm_readv/process_vm_writev but those turn out useless for us,
Pedro> because they don't bypass memory protections, i.e., you can't write
Pedro> breakpoints with them, for example.

Thanks for looking into that.  I've thought about changing gdb to use
these, in the past, but now we can officially not bother.

Tom

Patch

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 34a2aee41d7..0b4107af523 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -280,6 +280,8 @@  static int lwp_status_pending_p (struct lwp_info *lp);
 
 static void save_stop_reason (struct lwp_info *lp);
 
+static void maybe_close_proc_mem_file (pid_t pid);
+
 
 /* LWP accessors.  */
 
@@ -1486,6 +1488,8 @@  linux_nat_target::detach (inferior *inf, int from_tty)
 
       detach_success (inf);
     }
+
+  maybe_close_proc_mem_file (pid);
 }
 
 /* Resume execution of the inferior process.  If STEP is nonzero,
@@ -2025,6 +2029,10 @@  linux_handle_extended_wait (struct lwp_info *lp, int status)
     {
       linux_nat_debug_printf ("Got exec event from LWP %ld", lp->ptid.lwp ());
 
+      /* Close the /proc/<pid>/mem file if it was open for this
+	 inferior.  */
+      maybe_close_proc_mem_file (lp->ptid.pid ());
+
       ourstatus->kind = TARGET_WAITKIND_EXECD;
       ourstatus->value.execd_pathname
 	= xstrdup (linux_proc_pid_to_exec_file (pid));
@@ -3589,6 +3597,10 @@  linux_nat_target::mourn_inferior ()
 
   purge_lwp_list (pid);
 
+  /* Close the /proc/<pid>/mem file if it was open for this
+     inferior.  */
+  maybe_close_proc_mem_file (pid);
+
   if (! forks_exist_p ())
     /* Normal case, no other forks available.  */
     inf_ptrace_target::mourn_inferior ();
@@ -3681,10 +3693,8 @@  linux_nat_xfer_osdata (enum target_object object,
 		       ULONGEST *xfered_len);
 
 static enum target_xfer_status
-linux_proc_xfer_partial (enum target_object object,
-			 const char *annex, gdb_byte *readbuf,
-			 const gdb_byte *writebuf,
-			 ULONGEST offset, LONGEST len, ULONGEST *xfered_len);
+linux_proc_xfer_memory_partial (gdb_byte *readbuf, const gdb_byte *writebuf,
+				ULONGEST offset, LONGEST len, ULONGEST *xfered_len);
 
 enum target_xfer_status
 linux_nat_target::xfer_partial (enum target_object object,
@@ -3692,8 +3702,6 @@  linux_nat_target::xfer_partial (enum target_object object,
 				const gdb_byte *writebuf,
 				ULONGEST offset, ULONGEST len, ULONGEST *xfered_len)
 {
-  enum target_xfer_status xfer;
-
   if (object == TARGET_OBJECT_SIGNAL_INFO)
     return linux_xfer_siginfo (object, annex, readbuf, writebuf,
 			       offset, len, xfered_len);
@@ -3712,25 +3720,21 @@  linux_nat_target::xfer_partial (enum target_object object,
     return linux_nat_xfer_osdata (object, annex, readbuf, writebuf,
 				  offset, len, xfered_len);
 
-  /* GDB calculates all addresses in the largest possible address
-     width.
-     The address width must be masked before its final use - either by
-     linux_proc_xfer_partial or inf_ptrace_target::xfer_partial.
-
-     Compare ADDR_BIT first to avoid a compiler warning on shift overflow.  */
-
   if (object == TARGET_OBJECT_MEMORY)
     {
+      /* GDB calculates all addresses in the largest possible address
+	 width.  The address width must be masked before its final use
+	 by linux_proc_xfer_partial.
+
+	 Compare ADDR_BIT first to avoid a compiler warning on shift overflow.  */
       int addr_bit = gdbarch_addr_bit (target_gdbarch ());
 
       if (addr_bit < (sizeof (ULONGEST) * HOST_CHAR_BIT))
 	offset &= ((ULONGEST) 1 << addr_bit) - 1;
-    }
 
-  xfer = linux_proc_xfer_partial (object, annex, readbuf, writebuf,
-				  offset, len, xfered_len);
-  if (xfer != TARGET_XFER_EOF)
-    return xfer;
+      return linux_proc_xfer_memory_partial (readbuf, writebuf,
+					     offset, len, xfered_len);
+    }
 
   return inf_ptrace_target::xfer_partial (object, annex, readbuf, writebuf,
 					  offset, len, xfered_len);
@@ -3794,35 +3798,88 @@  linux_nat_target::pid_to_exec_file (int pid)
   return linux_proc_pid_to_exec_file (pid);
 }
 
-/* Implement the to_xfer_partial target method using /proc/<pid>/mem.
-   Because we can use a single read/write call, this can be much more
-   efficient than banging away at PTRACE_PEEKTEXT.  */
+/* Keep the /proc/<pid>/mem file open between memory accesses, as a
+   cache to avoid constantly closing/opening the file in the common
+   case of multiple memory reads/writes from/to the same inferior.
+   Note we don't keep a file open per inferior to avoid keeping too
+   many file descriptors open, which can run into resource limits.  */
+static struct
+{
+  /* The LWP this open file is for.  Note that after opening the file,
+     even if the thread subsequently exits, the open file is still
+     usable for accessing memory.  It's only when the whole process
+     exits or execs that the file becomes invalid (at which point
+     reads/writes return EOF).  */
+  ptid_t ptid;
 
-static enum target_xfer_status
-linux_proc_xfer_partial (enum target_object object,
-			 const char *annex, gdb_byte *readbuf,
-			 const gdb_byte *writebuf,
-			 ULONGEST offset, LONGEST len, ULONGEST *xfered_len)
+  /* The file descriptor.  -1 if file is not open.  */
+  int fd = -1;
+
+  /* Close FD and clear it to -1.  */
+  void close ()
+  {
+    linux_nat_debug_printf ("closing fd %d for pid %ld\n",
+			    fd, ptid.lwp ());
+    ::close (fd);
+    fd = -1;
+  }
+} last_proc_mem_file;
+
+/* Close the /proc/<pid>/mem file if its LWP matches PTID.  */
+
+static void
+maybe_close_proc_mem_file (pid_t pid)
 {
-  LONGEST ret;
-  int fd;
-  char filename[64];
+  if (last_proc_mem_file.ptid.pid () == pid)
+    last_proc_mem_file.close ();
+}
 
-  if (object != TARGET_OBJECT_MEMORY)
-    return TARGET_XFER_EOF;
+/* Helper for linux_proc_xfer_memory_partial.  Accesses /proc via
+   PTID.  Returns -1 on error, with errno set.  Returns number of
+   read/written bytes otherwise.  Returns 0 on EOF, which indicates
+   the address space is gone (because the process exited or
+   execed).  */
 
-  /* Don't bother for one word.  */
-  if (len < 3 * sizeof (long))
-    return TARGET_XFER_EOF;
+static ssize_t
+linux_proc_xfer_memory_partial_pid (ptid_t ptid,
+				    gdb_byte *readbuf, const gdb_byte *writebuf,
+				    ULONGEST offset, LONGEST len)
+{
+  ssize_t ret;
 
-  /* We could keep this file open and cache it - possibly one per
-     thread.  That requires some juggling, but is even faster.  */
-  xsnprintf (filename, sizeof filename, "/proc/%ld/mem",
-	     inferior_ptid.lwp ());
-  fd = gdb_open_cloexec (filename, ((readbuf ? O_RDONLY : O_WRONLY)
-				    | O_LARGEFILE), 0);
-  if (fd == -1)
-    return TARGET_XFER_EOF;
+  /* As long as we're hitting the same inferior, the previously open
+     file is good, even if the thread it was open for exits.  */
+  if (last_proc_mem_file.fd != -1
+      && last_proc_mem_file.ptid.pid () != ptid.pid ())
+    last_proc_mem_file.close ();
+
+  if (last_proc_mem_file.fd == -1)
+    {
+      /* Actually use /proc/<pid>/task/<lwp>/mem instead of
+	 /proc/<lwp>/mem to avoid PID-reuse races, as we may be trying
+	 to read memory via a thread which we've already reaped.  If
+	 the PID is reused for the same process it's OK.  */
+      char filename[64];
+      xsnprintf (filename, sizeof filename,
+		 "/proc/%d/task/%ld/mem", ptid.pid (), ptid.lwp ());
+
+      last_proc_mem_file.fd
+	= gdb_open_cloexec (filename, O_RDWR | O_LARGEFILE, 0);
+
+      if (last_proc_mem_file.fd == -1)
+	{
+	  linux_nat_debug_printf ("opening fd for lwp %ld failed: %s (%d)\n",
+				  ptid.lwp (),
+				  safe_strerror (errno), errno);
+	  return -1;
+	}
+      last_proc_mem_file.ptid = ptid;
+
+      linux_nat_debug_printf ("opened fd %d for lwp %ld\n",
+			      last_proc_mem_file.fd, ptid.lwp ());
+    }
+
+  int fd = last_proc_mem_file.fd;
 
   /* Use pread64/pwrite64 if available, since they save a syscall and can
      handle 64-bit offsets even on 32-bit platforms (for instance, SPARC
@@ -3837,17 +3894,128 @@  linux_proc_xfer_partial (enum target_object object,
 	   : write (fd, writebuf, len));
 #endif
 
-  close (fd);
+  if (ret == -1)
+    {
+      linux_nat_debug_printf ("accessing fd %d for pid %ld failed: %s (%d)\n",
+			      fd, ptid.lwp (),
+			      safe_strerror (errno), errno);
+    }
+  else if (ret == 0)
+    {
+      linux_nat_debug_printf ("accessing fd %d for pid %ld got EOF\n",
+			      fd, ptid.lwp ());
+    }
+
+  return ret;
+}
 
-  if (ret == -1 || ret == 0)
-    return TARGET_XFER_EOF;
+/* Implement the to_xfer_partial target method using /proc/<pid>/mem.
+   Because we can use a single read/write call, this can be much more
+   efficient than banging away at PTRACE_PEEKTEXT.  Also, unlike
+   PTRACE_PEEKTEXT/PTRACE_POKETEXT, this works with running
+   threads.  */
+
+static enum target_xfer_status
+linux_proc_xfer_memory_partial (gdb_byte *readbuf, const gdb_byte *writebuf,
+				ULONGEST offset, LONGEST len,
+				ULONGEST *xfered_len)
+{
+  /* Unlike PTRACE_PEEKTEXT/PTRACE_POKETEXT, reading/writing from/to
+     /proc/<pid>/mem works with running threads, and even exited
+     threads if the file was already open.  If we need to open or
+     reopen the /proc file though, we may get an EACCES error
+     ("Permission denied"), meaning the thread is gone but its exit
+     status isn't reaped yet, or ENOENT if the thread is gone and
+     already reaped.  In that case, just try opening the file for
+     another thread in the process.  If all threads fail, then it must
+     mean the whole process exited, in which case there's nothing else
+     to do and we just fail the memory access.
+
+     Note we don't simply always access via the leader thread because
+     the leader may exit without exiting the whole process.  See
+     gdb.threads/leader-exit.exp, for example.  */
+
+  /* It's frequently the case that the selected thread is stopped, and
+     is thus not likely to exit (unless something kills the process
+     outside our control, with e.g., SIGKILL).  Give that one a try
+     first.
+
+     Also, inferior_ptid may actually point at an LWP not in lwp_list.
+     This happens when we're detaching from a fork child that we don't
+     want to debug ("set detach-on-fork on"), and the breakpoints
+     module uninstalls breakpoints from the fork child.  Which process
+     to access is given by inferior_ptid.  */
+  int res = linux_proc_xfer_memory_partial_pid (inferior_ptid,
+						readbuf, writebuf,
+						offset, len);
+  if (res == 0)
+    {
+      /* EOF means the address space is gone, the whole
+	 process exited or execed.  */
+      return TARGET_XFER_EOF;
+    }
+  else if (res != -1)
+    {
+      *xfered_len = res;
+      return TARGET_XFER_OK;
+    }
   else
     {
-      *xfered_len = ret;
+      /* If we simply raced with the thread exiting (EACCES), or the
+	 current thread is THREAD_EXITED (ENOENT), try some other
+	 thread.  It's easier to handle an ENOENT failure than check
+	 for THREAD_EXIT upfront because this function is called
+	 before a thread for inferior_ptid is added to the thread
+	 list.  */
+      if (errno != EACCES && errno != ENOENT)
+	return TARGET_XFER_EOF;
+    }
+
+  int cur_pid = current_inferior ()->pid;
+
+  if (inferior_ptid.pid () != cur_pid)
+    {
+      /* We're accessing a fork child, and the access above failed.
+	 Don't bother iterating the LWP list, since there's no other
+	 LWP for this process.  */
+      return TARGET_XFER_EOF;
+    }
+
+  /* Iterate over LWPs of the current inferior, trying to access
+     memory through one of them.  */
+  for (lwp_info *lp = lwp_list; lp != nullptr; lp = lp->next)
+    {
+      if (lp->ptid.pid () != cur_pid)
+	continue;
+
+      res = linux_proc_xfer_memory_partial_pid (lp->ptid,
+						readbuf, writebuf,
+						offset, len);
+
+      if (res == 0)
+	{
+	  /* EOF means the address space is gone, the whole process
+	     exited or execed.  */
+	  return TARGET_XFER_EOF;
+	}
+      else if (res == -1)
+	{
+	  if (errno == EACCES)
+	    {
+	      /* This LWP is gone, try another one.  */
+	      continue;
+	    }
+
+	  return TARGET_XFER_EOF;
+	}
+
+      *xfered_len = res;
       return TARGET_XFER_OK;
     }
-}
 
+  /* No luck.  */
+  return TARGET_XFER_EOF;
+}
 
 /* Parse LINE as a signal set and add its set bits to SIGS.  */
 
diff --git a/gdb/testsuite/gdb.base/access-mem-running.c b/gdb/testsuite/gdb.base/access-mem-running.c
new file mode 100644
index 00000000000..2bb4f9766b5
--- /dev/null
+++ b/gdb/testsuite/gdb.base/access-mem-running.c
@@ -0,0 +1,47 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+
+static unsigned int global_counter = 1;
+
+static volatile unsigned int global_var = 123;
+
+static void
+maybe_stop_here ()
+{
+}
+
+int
+main (void)
+{
+  global_counter = 1;
+
+  while (global_counter > 0)
+    {
+      global_counter++;
+
+      /* Less than 1s, so the counter increments at least once while
+	 the .exp sleep 1s, but slow enough that the counter doesn't
+	 wrap in 1s.  */
+      usleep (5000);
+
+      maybe_stop_here ();
+    }
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/access-mem-running.exp b/gdb/testsuite/gdb.base/access-mem-running.exp
new file mode 100644
index 00000000000..6990d906da2
--- /dev/null
+++ b/gdb/testsuite/gdb.base/access-mem-running.exp
@@ -0,0 +1,124 @@ 
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that we can access memory while the inferior is running.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug}] == -1} {
+    return -1
+}
+
+# The test proper.  NON_STOP indicates whether we're testing in
+# non-stop, or all-stop mode.
+
+proc test { non_stop } {
+    global srcfile binfile
+    global gdb_prompt
+    global GDBFLAGS
+    global decimal
+
+    save_vars { GDBFLAGS } {
+      append GDBFLAGS " -ex \"set non-stop $non_stop\""
+      clean_restart ${binfile}
+    }
+
+    if ![runto_main] {
+	return -1
+    }
+
+    # If debugging with target remote, check whether the all-stop variant
+    # of the RSP is being used.  If so, we can't run the background tests.
+    if {!$non_stop
+	&& [target_info exists gdb_protocol]
+	&& ([target_info gdb_protocol] == "remote"
+	    || [target_info gdb_protocol] == "extended-remote")} {
+
+	gdb_test_multiple "maint show target-non-stop" "" {
+	    -wrap -re "(is|currently) on.*" {
+	    }
+	    -wrap -re "(is|currently) off.*" {
+		unsupported "can't issue commands while target is running"
+		return 0
+	    }
+	}
+    }
+
+    delete_breakpoints
+
+    if {$non_stop == "off"} {
+	set cmd "continue &"
+    } else {
+	set cmd "continue -a &"
+    }
+    gdb_test_multiple $cmd "continuing" {
+	-re "Continuing\.\r\n$gdb_prompt " {
+	    pass $gdb_test_name
+	}
+    }
+
+    # Check we can read/write variables.
+
+    # Check that we can read the counter variable, and that the
+    # counter is increasing, meaning the process really is running.
+
+    sleep 1
+    set global_counter1 \
+	[get_integer_valueof "global_counter" 0 \
+	     "get global_counter once"]
+
+    sleep 1
+    set global_counter2 \
+	[get_integer_valueof "global_counter" 0 \
+	     "get global_counter twice"]
+
+    gdb_assert {$global_counter1 != 0 \
+		    && $global_counter2 != 0 \
+		    && $global_counter1 != $global_counter2} \
+	"value changed"
+
+    # Check that we can write variables.
+
+    gdb_test "print global_var" " = 123" \
+	"print global_var before writing"
+    gdb_test "print global_var = 321" " = 321" \
+	"write to global_var"
+    gdb_test "print global_var" " = 321" \
+	"print global_var after writing"
+    gdb_test "print global_var = 123" " = 123" \
+	"write to global_var again"
+
+    # Check we can set a breakpoint while the process is running.  The
+    # breakpoint should hit immediately.
+    set any "\[^\r\n\]*"
+
+    gdb_test_multiple "b maybe_stop_here" "" {
+	-re "Breakpoint $decimal at $any: file $any${srcfile}, line $decimal.\r\n$gdb_prompt " {
+	    pass $gdb_test_name
+	}
+    }
+    gdb_test_multiple "" "breakpoint hits" {
+	-re "Breakpoint $decimal, maybe_stop_here \\(\\) at $any${srcfile}:$decimal\r\n" {
+	    pass "$gdb_test_name"
+	}
+    }
+}
+
+foreach non_stop { "off" "on" } {
+    set stop_mode [expr ($non_stop=="off")?"all-stop":"non-stop"]
+    with_test_prefix "$stop_mode" {
+	test $non_stop
+    }
+}
diff --git a/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.c b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.c
new file mode 100644
index 00000000000..5f89d223065
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.c
@@ -0,0 +1,123 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+
+#define THREADS 20
+
+static volatile unsigned int global_var = 123;
+
+/* Wrapper around pthread_create.   */
+
+static void
+create_thread (pthread_t *child,
+	       void *(*start_routine) (void *), void *arg)
+{
+  int rc;
+
+  while ((rc = pthread_create (child, NULL, start_routine, arg)) != 0)
+    {
+      fprintf (stderr, "unexpected error from pthread_create: %s (%d)\n",
+	       strerror (rc), rc);
+      sleep (1);
+    }
+}
+
+/* Data passed to threads on creation.  This is allocated on the heap
+   and ownership transferred from parent to child.  */
+
+struct thread_arg
+{
+  /* The thread's parent.  */
+  pthread_t parent;
+
+  /* Whether to call pthread_join on the parent.  */
+  int join_parent;
+};
+
+/* Entry point for threads.  */
+
+static void *
+thread_fn (void *arg)
+{
+  struct thread_arg *p = arg;
+
+  /* Passing no argument makes the thread exit immediately.  */
+  if (p == NULL)
+    return NULL;
+
+  if (p->join_parent)
+    assert (pthread_join (p->parent, NULL) == 0);
+
+  /* Spawn a number of threads that exit immediately, and then join
+     them.  The idea is to maximize the time window when we mostly
+     have threads exiting.  */
+  {
+    pthread_t child[THREADS];
+    int i;
+
+    /* Passing no argument makes the thread exit immediately.  */
+    for (i = 0; i < THREADS; i++)
+      create_thread (&child[i], thread_fn, NULL);
+
+    for (i = 0; i < THREADS; i++)
+      pthread_join (child[i], NULL);
+  }
+
+  /* Spawn a new thread that joins us, and exit.  The idea here is to
+     not have any thread that stays around forever.  */
+  {
+    pthread_t child;
+
+    p->parent = pthread_self ();
+    p->join_parent = 1;
+    create_thread (&child, thread_fn, p);
+  }
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+    {
+      struct thread_arg *p;
+      pthread_t child;
+
+      p = malloc (sizeof *p);
+      p->parent = pthread_self ();
+      /* Only join the parent once.  */
+      if (i == 0)
+	p->join_parent = 1;
+      else
+	p->join_parent = 0;
+      create_thread (&child, thread_fn, p);
+    }
+
+  /* Exit the leader to make sure that we can access memory with the
+     leader gone.  */
+  pthread_exit (NULL);
+}
diff --git a/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp
new file mode 100644
index 00000000000..ea228e4ba13
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp
@@ -0,0 +1,166 @@ 
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that we can access memory while all the threads of the inferior
+# are running, and even if:
+#
+# - the leader thread exits
+# - the selected thread exits
+#
+# This test constantly spawns short lived threads to make sure that on
+# systems with debug APIs that require passing down a specific thread
+# to work with (e.g., GNU/Linux ptrace and /proc filesystem), GDB
+# copes with accessing memory just while the thread it is accessing
+# memory through exits.
+#
+# The test spawns two processes and alternates memory accesses between
+# them to force flushing per-process caches.  At the time of writing,
+# the Linux backend accesses inferior memory via /proc/<pid>/mem, and
+# keeps one such file open, as a cache.  Alternating inferiors forces
+# opening such file for a different process, which fails if GDB tries
+# to open the file for a thread that exited.  The test does ensures
+# those reopen/fail code paths are exercised.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] == -1} {
+    return -1
+}
+
+# The test proper.  NON_STOP indicates whether we're testing in
+# non-stop, or all-stop mode.
+
+proc test { non_stop } {
+    global binfile
+    global gdb_prompt
+    global GDBFLAGS
+
+    save_vars { GDBFLAGS } {
+      append GDBFLAGS " -ex \"set non-stop $non_stop\""
+      clean_restart ${binfile}
+    }
+
+    if ![runto_main] {
+	fail "cannot run to main"
+	return -1
+    }
+
+    # If debugging with target remote, check whether the all-stop variant
+    # of the RSP is being used.  If so, we can't run the background tests.
+    if {!$non_stop
+	&& [target_info exists gdb_protocol]
+	&& ([target_info gdb_protocol] == "remote"
+	    || [target_info gdb_protocol] == "extended-remote")} {
+
+	gdb_test_multiple "maint show target-non-stop" "" {
+	    -wrap -re "(is|currently) on.*" {
+	    }
+	    -wrap -re "(is|currently) off.*" {
+		unsupported "can't issue commands while target is running"
+		return 0
+	    }
+	}
+    }
+
+    delete_breakpoints
+
+    # Start the second inferior.
+    with_test_prefix "second inferior" {
+	gdb_test "add-inferior -no-connection" "New inferior 2.*"
+	gdb_test "inferior 2" "Switching to inferior 2 .*"
+
+	gdb_load $binfile
+
+	if ![runto_main] {
+	    fail "cannot run to main"
+	    return -1
+	}
+    }
+
+    delete_breakpoints
+
+    # These put too much noise in the logs.
+    gdb_test_no_output "set print thread-events off"
+
+    # Continue all threads of both processes.
+    gdb_test_no_output "set schedule-multiple on"
+    if {$non_stop == "off"} {
+	set cmd "continue &"
+    } else {
+	set cmd "continue -a &"
+    }
+    gdb_test_multiple $cmd "continuing" {
+	-re "Continuing\.\r\n$gdb_prompt " {
+	    pass $gdb_test_name
+	}
+    }
+
+    # Like gdb_test, but:
+    # - don't issue a pass on success.
+    # - on failure, clear the ok variable in the calling context, and
+    #   break it.
+    proc my_gdb_test {cmd pattern message} {
+	upvar inf inf
+	upvar iter iter
+	if {[gdb_test_multiple $cmd "access mem ($message, inf=$inf, iter=$iter)" {
+	    -wrap -re $pattern {
+	    }
+	}] != 0} {
+	    uplevel 1 {set ok 0}
+	    return -code break
+	}
+    }
+
+    # Hammer away for 5 seconds, alternating between inferiors.
+    set ::done 0
+    after 5000 { set ::done 1 }
+
+    set inf 1
+    set ok 1
+    set iter 0
+    while {!$::done && $ok} {
+	incr iter
+	verbose -log "xxxxx: iteration $iter"
+	gdb_test "info threads" ".*" ""
+
+	if {$inf == 1} {
+	    set inf 2
+	} else {
+	    set inf 1
+	}
+
+	my_gdb_test "inferior $inf" ".*" "inferior $inf"
+
+	my_gdb_test "print global_var = 555" " = 555" \
+	    "write to global_var"
+	my_gdb_test "print global_var" " = 555" \
+	    "print global_var after writing"
+	my_gdb_test "print global_var = 333" " = 333" \
+	    "write to global_var again"
+	my_gdb_test "print global_var" " = 333" \
+	    "print global_var after writing again"
+    }
+
+   if {$ok} {
+       pass "access mem"
+   }
+}
+
+foreach non_stop { "off" "on" } {
+    set stop_mode [expr ($non_stop=="off")?"all-stop":"non-stop"]
+    with_test_prefix "$stop_mode" {
+	test $non_stop
+    }
+}