[00/28] Decouple inferior_ptid/inferior_thread(); dup ptids in thread list (PR/25412)

Message ID 20200414175434.8047-1-palves@redhat.com
Headers show
Series
  • Decouple inferior_ptid/inferior_thread(); dup ptids in thread list (PR/25412)
Related show

Message

Simon Marchi via Gdb-patches April 14, 2020, 5:54 p.m.
In PR/25412, Simon noticed that after the multi-target series, the
tid-reuse.exp testcase manages to create a duplicate thread in the
thread list.  Or rather, two threads with the same PTID.

This in turn exposes a design problem in GDB.  The inferior_thread()
function looks up the current thread based on inferior_ptid:

 struct thread_info*
 inferior_thread (void)
 {
   struct thread_info *tp = find_thread_ptid (current_inferior (), inferior_ptid);
   gdb_assert (tp);
   return tp;
 }

But if there can be more than one thread in the thread list with the
same ptid_t, inferior_thread() may well return the wrong thread.

This series fixes this by making the current thread be a global
thread_info pointer that is written to directly by switch_to_thread,
etc., and making inferior_thread() return that pointer, instead of
having inferior_thread() lookup up the inferior_ptid thread, by
ptid_t.  You can look at this as a continuation of the effort of using
more thread_info pointers instead of ptids when possible.

This change required auditing the whole codebase for places where we
were writing to inferior_ptid directly to change the current thread,
and change them to use switch_to_thread instead or one of its
siblings, because otherwise inferior_thread() and inferior_ptid would
get out of sync and inferior_thread() would return a thread unrelated
to the new inferior_ptid we wanted to switch to.  That was all
(hopefully) done in all the patches leading up to the last one.

After this, inferior_ptid still exists, but it is mostly read-only and
mainly used by target backend code.  It is also relied on by a number
of target methods as a global input argument.  E.g., the target_resume
interface and the memory reading routines -- we still need it there
because we need to be able to access memory off of processes for which
we don't have a corresponding inferior/thread object, like when
handling forks.  Maybe we could pass down a context explicitly to
target_read_memory, etc.

Most of the host-/native-specific code here is untested.  I did my
best, but I won't be surprised if more tweaking is necessary.

Testing on native x86_64 GNU/Linux is regression free for me.  Testing
against gdbserver has regressed significantly in the past months and
is becoming difficult to run with a high number of long timeout
sequences; really looks like people aren't paying much attention to
that.  I think this series doesn't regress gdbserver, but it's getting
hard to tell.  :-/

Pedro Alves (28):
  Don't write to inferior_ptid in linux_get_siginfo_data
  gcore, handle exited threads better
  Refactor delete_program_space as a destructor
  Don't write to inferior_ptid in gdbarch-selftests.c, mock
    address_space too
  Don't write to inferior_ptid in inf-ptrace.c
  Don't write to inferior_ptid in target.c
  Don't write to inferior_ptid in infrun.c
  Don't write to inferior_ptid in procfs.c
  Don't write to inferior_ptid in tracefile-tfile.c
  Don't write to inferior_ptid in tracectf.c
  Don't write to inferior_ptid in remote.c
  Don't write to inferior_ptid in remote-sim.c
  Don't write to inferior_ptid in nto-procfs.c
  Don't write to inferior_ptid in go32-nat.c
  Don't write to inferior_ptid in gnu-nat.c
  Don't write to inferior_ptid in darwin-nat.c
  Don't write to inferior_ptid in corelow.c
  Don't write to inferior_ptid in bsd-kvm.c
  Don't write to inferior_ptid in btrace_fetch
  Don't write to inferior_ptid in bsd-kvm.c
  Don't write to inferior_ptid in fork-child.c
  Don't write to inferior_ptid in go32-nat.c
  Don't write to inferior_ptid in remote-sim.c
  Don't write to inferior_ptid in windows-nat.c, part I
  Don't write to inferior_ptid in windows-nat.c, part II
  Don't write to inferior_ptid in ravenscar-thread.c
  Don't write to inferior_ptid in aix-thread.c
  Decouple inferior_ptid/inferior_thread(); dup ptids in thread list
    (PR/25412)

 gdb/aix-thread.c        |   2 +-
 gdb/bsd-kvm.c           |   6 +--
 gdb/btrace.c            |   6 ---
 gdb/corelow.c           |  20 +++++-----
 gdb/darwin-nat.c        |  16 ++++----
 gdb/fork-child.c        |   3 --
 gdb/gdbarch-selftests.c |  17 ++++----
 gdb/gdbthread.h         |  17 ++++----
 gdb/gnu-nat.c           |  14 +++----
 gdb/go32-nat.c          |   8 +---
 gdb/inf-ptrace.c        |  19 +++++----
 gdb/inferior.c          |   2 +-
 gdb/infrun.c            | 101 +++++++++++++++++++-----------------------------
 gdb/linux-tdep.c        |  64 ++++++++++++++++--------------
 gdb/nat/windows-nat.c   |   1 -
 gdb/nat/windows-nat.h   |   3 --
 gdb/nto-procfs.c        |  24 ++++++------
 gdb/procfs.c            |  18 ++++-----
 gdb/progspace.c         |  79 ++++++++++++++++++++-----------------
 gdb/progspace.h         |   4 --
 gdb/ravenscar-thread.c  |  49 ++++++++++++-----------
 gdb/remote-sim.c        |  10 ++---
 gdb/remote.c            |  34 ++++++++--------
 gdb/target.c            |   2 +-
 gdb/thread.c            |  97 ++++++++++++++++------------------------------
 gdb/tracectf.c          |   7 ++--
 gdb/tracefile-tfile.c   |   7 ++--
 gdb/windows-nat.c       |  68 +++++++++++++++-----------------
 28 files changed, 315 insertions(+), 383 deletions(-)


base-commit: dd1cab0694592099854e66467319253954c93764
-- 
2.14.5

Comments

Simon Marchi via Gdb-patches April 14, 2020, 6:46 p.m. | #1
Am Dienstag, 14. April 2020, 19:54:45 MESZ hat Pedro Alves via Gdb-patches <gdb-patches@sourceware.org> Folgendes geschrieben:

> In PR/25412, Simon noticed that after the multi-target series, the

> tid-reuse.exp testcase manages to create a duplicate thread in the

> thread list.  Or rather, two threads with the same PTID.


I'm sorry to hijack this thread, but I just wanted to apply this patch series
for testing, but I'm already stuck on the first step.
How do you get patches from these mails that I then can use with "git am"
(or some other git command)?

For single patches I would just use the raw email, that seems to work fine,
but I thought there probably exists a simpler way for larger series that I
don't know about.


Regards
Hannes Domani
Simon Marchi via Gdb-patches April 14, 2020, 7:24 p.m. | #2
On 4/14/20 7:46 PM, Hannes Domani via Gdb-patches wrote:
>  Am Dienstag, 14. April 2020, 19:54:45 MESZ hat Pedro Alves via Gdb-patches <gdb-patches@sourceware.org> Folgendes geschrieben:

> 

>> In PR/25412, Simon noticed that after the multi-target series, the

>> tid-reuse.exp testcase manages to create a duplicate thread in the

>> thread list.  Or rather, two threads with the same PTID.

> 

> I'm sorry to hijack this thread, but I just wanted to apply this patch series

> for testing, but I'm already stuck on the first step.


Thanks much for testing!

> How do you get patches from these mails that I then can use with "git am"

> (or some other git command)?

> 

> For single patches I would just use the raw email, that seems to work fine,

> but I thought there probably exists a simpler way for larger series that I

> don't know about.


In Thunderbird, I select all the emails, right click, and then select
"save selected messages" -> "as Mbox file".  Them I use git am.
I'm not sure whether that's native functionality, or whether it
comes from the "ImportExportTools" add-on.

I have 

[format]
        useAutoBase = true

in my .gitconfig, which is what puts that 

 base-commit: dd1cab0694592099854e66467319253954c93764

line at the bottom of the cover letter.  That's the commit
on top of which the series applies.

But, to make it simpler, I've now pushed the series to the
users/palves/current-thread branch on sourceware.org, so you
can simply pick it from there.

Thanks,
Pedro Alves
Simon Marchi April 15, 2020, 2:46 p.m. | #3
On 2020-04-14 1:54 p.m., Pedro Alves via Gdb-patches wrote:
> In PR/25412, Simon noticed that after the multi-target series, the

> tid-reuse.exp testcase manages to create a duplicate thread in the

> thread list.  Or rather, two threads with the same PTID.

> 

> This in turn exposes a design problem in GDB.  The inferior_thread()

> function looks up the current thread based on inferior_ptid:

> 

>  struct thread_info*

>  inferior_thread (void)

>  {

>    struct thread_info *tp = find_thread_ptid (current_inferior (), inferior_ptid);

>    gdb_assert (tp);

>    return tp;

>  }

> 

> But if there can be more than one thread in the thread list with the

> same ptid_t, inferior_thread() may well return the wrong thread.


I don't quite understand this part of the explanation.  Thread lists are
per-inferior, and an inferior is specific to one target, and ptids are
unique per-target.  So it's not possible at the moment (or at least not
expected) to have two threads with the same ptid in one specific list.

I am having trouble connecting the dots between this explanation and the
problem originally reported, which IIRC correctly involves a single
inferior.

> This series fixes this by making the current thread be a global

> thread_info pointer that is written to directly by switch_to_thread,

> etc., and making inferior_thread() return that pointer, instead of

> having inferior_thread() lookup up the inferior_ptid thread, by

> ptid_t.  You can look at this as a continuation of the effort of using

> more thread_info pointers instead of ptids when possible.


Even though I don't understand the explanation above, I think this is
a good step forward.

> This change required auditing the whole codebase for places where we

> were writing to inferior_ptid directly to change the current thread,

> and change them to use switch_to_thread instead or one of its

> siblings, because otherwise inferior_thread() and inferior_ptid would

> get out of sync and inferior_thread() would return a thread unrelated

> to the new inferior_ptid we wanted to switch to.  That was all

> (hopefully) done in all the patches leading up to the last one.

> 

> After this, inferior_ptid still exists, but it is mostly read-only and

> mainly used by target backend code.  It is also relied on by a number

> of target methods as a global input argument.  E.g., the target_resume

> interface and the memory reading routines -- we still need it there

> because we need to be able to access memory off of processes for which

> we don't have a corresponding inferior/thread object, like when

> handling forks.  Maybe we could pass down a context explicitly to

> target_read_memory, etc.


I would like to do that eventually.

> Most of the host-/native-specific code here is untested.  I did my

> best, but I won't be surprised if more tweaking is necessary.

> 

> Testing on native x86_64 GNU/Linux is regression free for me.  Testing

> against gdbserver has regressed significantly in the past months and

> is becoming difficult to run with a high number of long timeout

> sequences; really looks like people aren't paying much attention to

> that.  I think this series doesn't regress gdbserver, but it's getting

> hard to tell.  :-/


Oops... we'll have to do some bisecting.

Simon
Simon Marchi April 15, 2020, 3:04 p.m. | #4
On 2020-04-14 3:24 p.m., Pedro Alves via Gdb-patches wrote:
> On 4/14/20 7:46 PM, Hannes Domani via Gdb-patches wrote:

>>  Am Dienstag, 14. April 2020, 19:54:45 MESZ hat Pedro Alves via Gdb-patches <gdb-patches@sourceware.org> Folgendes geschrieben:

>>

>>> In PR/25412, Simon noticed that after the multi-target series, the

>>> tid-reuse.exp testcase manages to create a duplicate thread in the

>>> thread list.  Or rather, two threads with the same PTID.

>>

>> I'm sorry to hijack this thread, but I just wanted to apply this patch series

>> for testing, but I'm already stuck on the first step.

> 

> Thanks much for testing!

> 

>> How do you get patches from these mails that I then can use with "git am"

>> (or some other git command)?

>>

>> For single patches I would just use the raw email, that seems to work fine,

>> but I thought there probably exists a simpler way for larger series that I

>> don't know about.

> 

> In Thunderbird, I select all the emails, right click, and then select

> "save selected messages" -> "as Mbox file".  Them I use git am.

> I'm not sure whether that's native functionality, or whether it

> comes from the "ImportExportTools" add-on.


I don't see it in my Thunderbird, so it must be the add-on.

I do almost the same, but using some built-in features: select the messages and
hit "ctrl-s" to save the messages in some directory.  In this case, they are
saved as one message per file.  I then do `git am path/to/that/directory/*`.

For a single patch, I sometimes display the source (ctrl-u) and copy it all.  I
then do `git am`, which makes it read on stdin.  Paste, then ctrl-d to send
EOF.  The caveat here is that if the patch contains some weird ascii character
(like the "horizontal tab" characters that are sometimes in the GDB source),
it looks like the paste messes it up and the patch doesn't apply.  Saving the
file as described previously works fine.

This works well when the patches were send with git-send-email, and is almost
always a pain otherwise.

> 

> I have 

> 

> [format]

>         useAutoBase = true

> 

> in my .gitconfig, which is what puts that 

> 

>  base-commit: dd1cab0694592099854e66467319253954c93764

> 

> line at the bottom of the cover letter.  That's the commit

> on top of which the series applies.


This means you also save the cover letter in your mbox file?

Simon
Simon Marchi via Gdb-patches April 15, 2020, 3:33 p.m. | #5
On 4/15/20 3:46 PM, Simon Marchi wrote:
> On 2020-04-14 1:54 p.m., Pedro Alves via Gdb-patches wrote:

>> In PR/25412, Simon noticed that after the multi-target series, the

>> tid-reuse.exp testcase manages to create a duplicate thread in the

>> thread list.  Or rather, two threads with the same PTID.

>>

>> This in turn exposes a design problem in GDB.  The inferior_thread()

>> function looks up the current thread based on inferior_ptid:

>>

>>  struct thread_info*

>>  inferior_thread (void)

>>  {

>>    struct thread_info *tp = find_thread_ptid (current_inferior (), inferior_ptid);

>>    gdb_assert (tp);

>>    return tp;

>>  }

>>

>> But if there can be more than one thread in the thread list with the

>> same ptid_t, inferior_thread() may well return the wrong thread.

> 

> I don't quite understand this part of the explanation.  Thread lists are

> per-inferior, and an inferior is specific to one target, and ptids are

> unique per-target.  So it's not possible at the moment (or at least not

> expected) to have two threads with the same ptid in one specific list.


The tid-reuse.exp testcase is about:

 # Test running a program that spawns enough threads that the tid of an
 # exited thread is reused.  GDB should not crash when this happens.

I.e., the testcase spawns enough short-lived threads that we end up
in add_thread_silent adding a new thread with the same ptid as
a preexisting thread, because the kernel tids eventually wrap around.
In this scenario, you know the preexisting thread with that ptid must
be gone, since otherwise the kernel would not be reusing the thread id.

But if that thread that is gone was the currently selected
thread, or somewhere throughout gdb something holds a reference
to it, then we can't delete that exiting thread immediately.  That's
how you end up with more than one thread with the same ptid in the
thread list.  Only one of those threads will be live, all other
"dups" will be exited threads.

So given that currently inferior_thread() does a look up by
ptid, it can happen that GDB meant to have the live thread
selected temporarily, say, but inferior_thread() returns the
exited thread. Like, this could fail:

switch_to_thread (exited_thread);
{
 scoped_restore_current_thread restore_thread;
 switch_to_thread (live_thread);
 gdb_assert (inferior_thread () == live_thread);
}

Note this happens with a single inferior.

Thanks,
Pedro Alves
Simon Marchi April 15, 2020, 3:42 p.m. | #6
On 2020-04-15 11:33 a.m., Pedro Alves wrote:
> On 4/15/20 3:46 PM, Simon Marchi wrote:

>> On 2020-04-14 1:54 p.m., Pedro Alves via Gdb-patches wrote:

>>> In PR/25412, Simon noticed that after the multi-target series, the

>>> tid-reuse.exp testcase manages to create a duplicate thread in the

>>> thread list.  Or rather, two threads with the same PTID.

>>>

>>> This in turn exposes a design problem in GDB.  The inferior_thread()

>>> function looks up the current thread based on inferior_ptid:

>>>

>>>  struct thread_info*

>>>  inferior_thread (void)

>>>  {

>>>    struct thread_info *tp = find_thread_ptid (current_inferior (), inferior_ptid);

>>>    gdb_assert (tp);

>>>    return tp;

>>>  }

>>>

>>> But if there can be more than one thread in the thread list with the

>>> same ptid_t, inferior_thread() may well return the wrong thread.

>>

>> I don't quite understand this part of the explanation.  Thread lists are

>> per-inferior, and an inferior is specific to one target, and ptids are

>> unique per-target.  So it's not possible at the moment (or at least not

>> expected) to have two threads with the same ptid in one specific list.

> 

> The tid-reuse.exp testcase is about:

> 

>  # Test running a program that spawns enough threads that the tid of an

>  # exited thread is reused.  GDB should not crash when this happens.

> 

> I.e., the testcase spawns enough short-lived threads that we end up

> in add_thread_silent adding a new thread with the same ptid as

> a preexisting thread, because the kernel tids eventually wrap around.

> In this scenario, you know the preexisting thread with that ptid must

> be gone, since otherwise the kernel would not be reusing the thread id.

> 

> But if that thread that is gone was the currently selected

> thread, or somewhere throughout gdb something holds a reference

> to it, then we can't delete that exiting thread immediately.  That's

> how you end up with more than one thread with the same ptid in the

> thread list.  Only one of those threads will be live, all other

> "dups" will be exited threads.

> 

> So given that currently inferior_thread() does a look up by

> ptid, it can happen that GDB meant to have the live thread

> selected temporarily, say, but inferior_thread() returns the

> exited thread. Like, this could fail:

> 

> switch_to_thread (exited_thread);

> {

>  scoped_restore_current_thread restore_thread;

>  switch_to_thread (live_thread);

>  gdb_assert (inferior_thread () == live_thread);

> }

> 

> Note this happens with a single inferior.

> 

> Thanks,

> Pedro Alves

> 


Ok thanks, that clarifies it.

Simon
Simon Marchi via Gdb-patches April 16, 2020, 1:41 p.m. | #7
On 4/15/20 4:04 PM, Simon Marchi wrote:

>>

>> I have 

>>

>> [format]

>>         useAutoBase = true

>>

>> in my .gitconfig, which is what puts that 

>>

>>  base-commit: dd1cab0694592099854e66467319253954c93764

>>

>> line at the bottom of the cover letter.  That's the commit

>> on top of which the series applies.

> 

> This means you also save the cover letter in your mbox file?

Nope, I don't think "git am" understands that base-commit
line.  I tried it now, and "git am" complained like so:

 $ git am /tmp/test.mbox
 Patch is empty.
 When you have resolved this problem, run "git am --continue".
 If you prefer to skip this patch, run "git am --skip" instead.
 To restore the original branch and stop patching, run "git am --abort".

No patch was applied, so this was the cover letter.

That line in the cover letter is put there when I do "git format-patch"
(or "git send-email"), so it's in the opposite direction of previously
described procedure of getting patches from Thunderbird into "git am".

Thanks,
Pedro Alves
Tom Tromey April 17, 2020, 8:20 p.m. | #8
>>>>> "Pedro" == Pedro Alves via Gdb-patches <gdb-patches@sourceware.org> writes:


I meant to mention, I skimmed through this series and didn't see
anything objectionable.  It's hard to know about this kind of thing
without trying it (which I didn't do); but IMO it's fine to land this
and then iron out anything that pops up.

Pedro> After this, inferior_ptid still exists, but it is mostly read-only and
Pedro> mainly used by target backend code.

Could it be made completely read-only?

Pedro> It is also relied on by a number
Pedro> of target methods as a global input argument.  E.g., the target_resume
Pedro> interface and the memory reading routines -- we still need it there
Pedro> because we need to be able to access memory off of processes for which
Pedro> we don't have a corresponding inferior/thread object, like when
Pedro> handling forks.  Maybe we could pass down a context explicitly to
Pedro> target_read_memory, etc.

This sounds like a good direction.

Tom
Simon Marchi via Gdb-patches June 18, 2020, 8 p.m. | #9
On 4/17/20 9:20 PM, Tom Tromey wrote:
>>>>>> "Pedro" == Pedro Alves via Gdb-patches <gdb-patches@sourceware.org> writes:

> 

> I meant to mention, I skimmed through this series and didn't see

> anything objectionable.  It's hard to know about this kind of thing

> without trying it (which I didn't do); but IMO it's fine to land this

> and then iron out anything that pops up.

> 

> Pedro> After this, inferior_ptid still exists, but it is mostly read-only and

> Pedro> mainly used by target backend code.

> 

> Could it be made completely read-only?


We could, with e.g. a hack like in the attached patch.

That patch is sufficient to make GDB build on x86-64 GNU/Linux, but
certainly it would require more changes across all the native backends.
I'm not sure it really buys us much, since the remaining writes are just
as easily found by grepping for "inferior_ptid =".

> 

> Pedro> It is also relied on by a number

> Pedro> of target methods as a global input argument.  E.g., the target_resume

> Pedro> interface and the memory reading routines -- we still need it there

> Pedro> because we need to be able to access memory off of processes for which

> Pedro> we don't have a corresponding inferior/thread object, like when

> Pedro> handling forks.  Maybe we could pass down a context explicitly to

> Pedro> target_read_memory, etc.

> 

> This sounds like a good direction.


Thanks,
Pedro Alves
From dd45a29433d527344f0c89bed01dfa645d845351 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>

Date: Thu, 18 Jun 2020 20:35:10 +0100
Subject: [PATCH] inferior_ptid_t

---
 gdb/breakpoint.c   |  2 +-
 gdb/infcmd.c       |  2 +-
 gdb/inferior.h     | 33 ++++++++++++++++++++++++++++++++-
 gdb/infrun.c       |  5 ++---
 gdb/linux-nat.c    |  2 +-
 gdb/proc-service.c |  2 +-
 gdb/regcache.c     |  5 +++--
 gdb/remote.c       |  2 +-
 gdb/thread.c       |  4 ++--
 9 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index aead882acd..dc080122bf 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -3584,7 +3584,7 @@ detach_breakpoints (ptid_t ptid)
     error (_("Cannot detach breakpoints of inferior_ptid"));
 
   /* Set inferior_ptid; remove_breakpoint_1 uses this global.  */
-  inferior_ptid = ptid;
+  inferior_ptid.set (ptid);
   ALL_BP_LOCATIONS (bl, blp_tmp)
   {
     if (bl->pspace != inf->pspace)
diff --git a/gdb/infcmd.c b/gdb/infcmd.c
index 891da91c80..9b06f03433 100644
--- a/gdb/infcmd.c
+++ b/gdb/infcmd.c
@@ -88,7 +88,7 @@ static char *inferior_io_terminal_scratch;
    being debugged it should be nonzero (currently 3 is used) for remote
    debugging.  */
 
-ptid_t inferior_ptid;
+inferior_ptid_t inferior_ptid;
 
 /* Nonzero if stopped due to completion of a stack dummy routine.  */
 
diff --git a/gdb/inferior.h b/gdb/inferior.h
index 95af474eed..79bd9cb351 100644
--- a/gdb/inferior.h
+++ b/gdb/inferior.h
@@ -122,10 +122,41 @@ extern void clear_sigint_trap (void);
 extern void set_inferior_io_terminal (const char *terminal_name);
 extern const char *get_inferior_io_terminal (void);
 
+class inferior_ptid_t
+{
+public:
+  int pid () const { return m_ptid.pid (); }
+  bool lwp_p () const { return m_ptid.lwp_p (); }
+  long lwp () const { return m_ptid.lwp (); }
+  bool tid_p () const { return m_ptid.tid_p (); }
+  long tid () const { return m_ptid.tid (); }
+  long is_tid () const { return m_ptid.tid (); }
+  bool is_pid () const { return m_ptid.is_pid (); }
+
+  bool matches (const ptid_t &filter) const
+  { return m_ptid.matches (filter); }
+
+  void set (ptid_t ptid) { m_ptid = ptid; }
+
+  operator ptid_t () { return m_ptid; }
+
+  friend bool operator== (const inferior_ptid_t &lhs, const ptid_t &rhs)
+  { return lhs.m_ptid == rhs; }
+  friend bool operator== (const ptid_t &lhs, const inferior_ptid_t &rhs)
+  { return lhs == rhs.m_ptid; }
+  friend bool operator!= (const inferior_ptid_t &lhs, const ptid_t &rhs)
+  { return lhs.m_ptid != rhs; }
+  friend bool operator!= (const ptid_t &lhs, const inferior_ptid_t &rhs)
+  { return lhs != rhs.m_ptid; }
+
+private:
+  ptid_t m_ptid = null_ptid;
+};
+
 /* Collected pid, tid, etc. of the debugged inferior.  When there's
    no inferior, inferior_ptid.pid () will be 0.  */
 
-extern ptid_t inferior_ptid;
+extern inferior_ptid_t inferior_ptid;
 
 extern void generic_mourn_inferior (void);
 
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 7bc405f103..0f38485536 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1835,7 +1835,7 @@ write_memory_ptid (ptid_t ptid, CORE_ADDR memaddr,
 {
   scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
 
-  inferior_ptid = ptid;
+  inferior_ptid.set (ptid);
   write_memory (memaddr, myaddr, len);
 }
 
@@ -2071,7 +2071,7 @@ static void
 infrun_thread_ptid_changed (ptid_t old_ptid, ptid_t new_ptid)
 {
   if (inferior_ptid == old_ptid)
-    inferior_ptid = new_ptid;
+    inferior_ptid.set (new_ptid);
 }
 
 
@@ -9700,7 +9700,6 @@ enabled by default on some platforms."),
 			   &setlist, &showlist);
 
   /* ptid initializations */
-  inferior_ptid = null_ptid;
   target_last_wait_ptid = minus_one_ptid;
 
   gdb::observers::thread_ptid_changed.attach (infrun_thread_ptid_changed);
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 0a2bfdc57d..65438743d1 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2444,7 +2444,7 @@ static int
 check_stopped_by_watchpoint (struct lwp_info *lp)
 {
   scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
-  inferior_ptid = lp->ptid;
+  inferior_ptid.set (lp->ptid);
 
   if (linux_target->low_stopped_by_watchpoint ())
     {
diff --git a/gdb/proc-service.c b/gdb/proc-service.c
index e0383700a1..2d4749bd21 100644
--- a/gdb/proc-service.c
+++ b/gdb/proc-service.c
@@ -77,7 +77,7 @@ ps_xfer_memory (const struct ps_prochandle *ph, psaddr_t addr,
   set_current_program_space (ph->thread->inf->pspace);
 
   scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
-  inferior_ptid = ph->thread->ptid;
+  inferior_ptid.set (ph->thread->ptid);
 
   CORE_ADDR core_addr = ps_addr_to_core_addr (addr);
 
diff --git a/gdb/regcache.c b/gdb/regcache.c
index 6a4359d0f3..756ea16c03 100644
--- a/gdb/regcache.c
+++ b/gdb/regcache.c
@@ -1637,8 +1637,9 @@ cooked_read_test (struct gdbarch *gdbarch)
   } pop_targets;
 
   /* Switch to the mock thread.  */
-  scoped_restore restore_inferior_ptid
-    = make_scoped_restore (&inferior_ptid, mock_ptid);
+  ptid_t save_ptid = inferior_ptid;
+  inferior_ptid.set (mock_ptid);
+  SCOPE_EXIT { inferior_ptid.set (save_ptid); };
 
   /* Test that read one raw register from regcache_no_target will go
      to the target layer.  */
diff --git a/gdb/remote.c b/gdb/remote.c
index fd89f2c084..8259831662 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -5910,7 +5910,7 @@ extended_remote_target::attach (const char *args, int from_tty)
 
   switch_to_inferior_no_thread (remote_add_inferior (false, pid, 1, 0));
 
-  inferior_ptid = ptid_t (pid);
+  inferior_ptid.set (ptid_t (pid));
 
   if (target_is_non_stop_p ())
     {
diff --git a/gdb/thread.c b/gdb/thread.c
index f0722d3588..73d8f23b5d 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1285,7 +1285,7 @@ switch_to_thread_no_regs (struct thread_info *thread)
   set_current_inferior (inf);
 
   current_thread_ = thread;
-  inferior_ptid = current_thread_->ptid;
+  inferior_ptid.set (current_thread_->ptid);
 }
 
 /* See gdbthread.h.  */
@@ -1297,7 +1297,7 @@ switch_to_no_thread ()
     return;
 
   current_thread_ = nullptr;
-  inferior_ptid = null_ptid;
+  inferior_ptid.set (null_ptid);
   reinit_frame_cache ();
 }
 

base-commit: 2c074f49026acbe0597e0d2d2f7385195dcac565
prerequisite-patch-id: cf75804d57b7783ba470ada8e827d46c73c0d9e1
prerequisite-patch-id: 2f376c3e9de1bb5426723bc3081d27dc0221340b
prerequisite-patch-id: 7532dda900933a23405793d23231b7201b13c9e5
prerequisite-patch-id: b8e62093defdd58b9320d2d0cabef5797e962be8
prerequisite-patch-id: 060947c675effc4648dfc5f9929b084ee2b644a5
prerequisite-patch-id: e7c21d96301996d2e937aab3642fcda95c64c2c6
prerequisite-patch-id: 87412331bbd9689c4c93fc21fc70bc886230f57d
prerequisite-patch-id: 89d26483bf1c3300911e98f8a969fe97114dc671
prerequisite-patch-id: 59e116edd554952f722a8a741bf29d2080fdad3c
prerequisite-patch-id: 963bb9047825cf80900913b7c0ca4d975a00eb9d
prerequisite-patch-id: dbee563bfde550440a76eb4efa9441a84c19d366
prerequisite-patch-id: 1699f61662e53f0cb5d907d52d21480fdbab912c
prerequisite-patch-id: 7b0edf3f61a48e06ff1468be1234fa307e978629
prerequisite-patch-id: 259ed6608d652dc47d90539bce36c3002d3a75e4
prerequisite-patch-id: 6fb426c105967a6dcc0778c15a9881520537bb17
prerequisite-patch-id: c50eef1048b695c5d2305c38aaa510ccd30c671d
prerequisite-patch-id: 807aa4cdc04dac6c477d415115cf43283708fcb6
prerequisite-patch-id: 47b63abf5b4378c3d7c86774c302ccf29eda8182
prerequisite-patch-id: 00f4830b5d5dcea8d12ec42615ebf46df55909cf
prerequisite-patch-id: c06aebbe76e78f013baffeb2fcd3d6858b696b8a
prerequisite-patch-id: 84773b2ba6ae77d03da98678e2cf4ad966ad84fb
prerequisite-patch-id: 7a79ad19e2848b4b8ddc0acc7462955c327afe55
prerequisite-patch-id: b119df65172817eaccc0032178bb4173348d4ea4
prerequisite-patch-id: 0611a6654ffd0287a490debede469daea32c656b
prerequisite-patch-id: 729699cf50d1057fa11919434fcac6590a32e2eb
prerequisite-patch-id: 082610a38a64e718d229eb4484ef87dcba547bda
prerequisite-patch-id: 3e123af7bab734e29247f210a29a3d7c8946b533
prerequisite-patch-id: da39a3ee5e6b4b0d3255bfef95601890afd80709
-- 
2.14.5
Simon Marchi via Gdb-patches June 18, 2020, 10:30 p.m. | #10
On 4/14/20 6:54 PM, Pedro Alves via Gdb-patches wrote:
> In PR/25412, Simon noticed that after the multi-target series, the

> tid-reuse.exp testcase manages to create a duplicate thread in the

> thread list.  Or rather, two threads with the same PTID.

> 

> This in turn exposes a design problem in GDB.  The inferior_thread()

> function looks up the current thread based on inferior_ptid:

> 


...

> This series fixes this by making the current thread be a global

> thread_info pointer that is written to directly by switch_to_thread,

> etc., and making inferior_thread() return that pointer, instead of

> having inferior_thread() lookup up the inferior_ptid thread, by

> ptid_t.  You can look at this as a continuation of the effort of using

> more thread_info pointers instead of ptids when possible.


I've pushed this now.

Thanks,
Pedro Alves
John Baldwin July 7, 2020, 11:16 p.m. | #11
On 4/14/20 10:54 AM, Pedro Alves via Gdb-patches wrote:
> In PR/25412, Simon noticed that after the multi-target series, the

> tid-reuse.exp testcase manages to create a duplicate thread in the

> thread list.  Or rather, two threads with the same PTID.

> 

> This in turn exposes a design problem in GDB.  The inferior_thread()

> function looks up the current thread based on inferior_ptid:

> 

>  struct thread_info*

>  inferior_thread (void)

>  {

>    struct thread_info *tp = find_thread_ptid (current_inferior (), inferior_ptid);

>    gdb_assert (tp);

>    return tp;

>  }

> 

> But if there can be more than one thread in the thread list with the

> same ptid_t, inferior_thread() may well return the wrong thread.

> 

> This series fixes this by making the current thread be a global

> thread_info pointer that is written to directly by switch_to_thread,

> etc., and making inferior_thread() return that pointer, instead of

> having inferior_thread() lookup up the inferior_ptid thread, by

> ptid_t.  You can look at this as a continuation of the effort of using

> more thread_info pointers instead of ptids when possible.

> 

> This change required auditing the whole codebase for places where we

> were writing to inferior_ptid directly to change the current thread,

> and change them to use switch_to_thread instead or one of its

> siblings, because otherwise inferior_thread() and inferior_ptid would

> get out of sync and inferior_thread() would return a thread unrelated

> to the new inferior_ptid we wanted to switch to.  That was all

> (hopefully) done in all the patches leading up to the last one.

> 

> After this, inferior_ptid still exists, but it is mostly read-only and

> mainly used by target backend code.  It is also relied on by a number

> of target methods as a global input argument.  E.g., the target_resume

> interface and the memory reading routines -- we still need it there

> because we need to be able to access memory off of processes for which

> we don't have a corresponding inferior/thread object, like when

> handling forks.  Maybe we could pass down a context explicitly to

> target_read_memory, etc.

> 

> Most of the host-/native-specific code here is untested.  I did my

> best, but I won't be surprised if more tweaking is necessary.

> 

> Testing on native x86_64 GNU/Linux is regression free for me.  Testing

> against gdbserver has regressed significantly in the past months and

> is becoming difficult to run with a high number of long timeout

> sequences; really looks like people aren't paying much attention to

> that.  I think this series doesn't regress gdbserver, but it's getting

> hard to tell.  :-/


This appears to have broken native debugging on FreeBSD for me in that
when I run a process to completion it triggers an assertion failure:

(gdb) r
Starting program: /bin/echo 

Child process unexpectedly missing: No child processes.
/home/john/work/git/gdb/gdb/inferior.c:293: internal-error: struct inferior *find_inferior_pid(process_stratum_target *, int): Assertion `pid != 0' failed.

I tracked this down to this code in inf_ptrace::wait():

      /* Ignore terminated detached child processes.  */
      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
        pid = -1;

At this point, inferior_ptid() is all zeroes and the process
has reported a non-stopped exit status (WIFEXITED) so this
ignores the exit event and loops back around to call waitpid()
again which then fails with ECHILD.

Looks like we always clear the inferior thread (and thus
inferior_ptid) in do_target_wait_1:

  /* We know that we are looking for an event in the target of inferior
     INF, but we don't know which thread the event might come from.  As
     such we want to make sure that INFERIOR_PTID is reset so that none of
     the wait code relies on it - doing so is always a mistake.  */
  switch_to_inferior_no_thread (inf);

Commenting out the check in inf_ptrace::wait() "fixes" the issue for
me on FreeBSD, but I'm not sure that is the right fix.

It seems to me that multiprocess probably needs to return events for
not just the current inferior pid but for any valid pid for example,
and though multiprocess is still broken for me on FreeBSD if I comment
out the check against inferior_ptid, I can now see that I was getting
an event for the "wrong" inferior that was previously discarded but
with the check commented out now gets reported to the core.

(The reason I get an event for the wrong process is that for some
reason the core asks the native target to resume the wrong process:

> ./gdb /bin/ls

(gdb) set debug fbsd-nat 
(gdb) set debug fbsd-lwp 
(gdb) start
Temporary breakpoint 1 at 0x20430a: file /usr/src/bin/ls/ls.c, line 236.
Starting program: /bin/ls 
FNAT: stop for LWP 101518 event 1 flags 0x18
FLWP: using LWP 101518 for first thread
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101518 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101518 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101518 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101518 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101518 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101518 event 1 flags 0x18
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101518 event 1 flags 0x20
FNAT: si_signo 5 si_code 1
FNAT: sw breakpoint trap for LWP 101518
FLWP: fbsd_resume for ptid (70453, 101518, 0)
FNAT: stop for LWP 101518 event 1 flags 0x20
FNAT: si_signo 5 si_code 2
FNAT: trace trap for LWP 101518
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101518 event 1 flags 0x20
FNAT: si_signo 5 si_code 1
FNAT: sw breakpoint trap for LWP 101518

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe710)
    at /usr/src/bin/ls/ls.c:236
236		const char *errstr = NULL;
(gdb) add-inferior 
[New inferior 2]
Added inferior 2 on connection 1 (native)
(gdb) inferior 2
[Switching to inferior 2 [<null>] (<noexec>)]
(gdb) file /bin/ls
Reading symbols from /bin/ls...
Reading symbols from /usr/lib/debug//bin/ls.debug...
(gdb) start
Temporary breakpoint 2 at 0x20430a: -qualified main. (2 locations)
Starting program: /bin/ls 
FNAT: stop for LWP 101641 event 1 flags 0x18
FLWP: using LWP 101641 for first thread
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101641 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101641 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101641 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101641 event 1 flags 0x24
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101641 event 1 flags 0x24
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101641 event 1 flags 0x20
FNAT: si_signo 20 si_code 1
FLWP: fbsd_resume for ptid (-1, 0, 0)
FNAT: stop for LWP 101641 event 1 flags 0x18
FLWP: fbsd_resume for ptid (70453, 101518, 0)

Program terminated with signal SIGTRAP, Trace/breakpoint trap.
The program no longer exists.

Here you can see that the last call to fbsd_resume() used the ptid from
inferior 1 instead of inferior 2, and inferior 1 didn't discard it's
pending SIGTRAP but instead was killed by it.)

-- 
John Baldwin
Pedro Alves July 7, 2020, 11:53 p.m. | #12
On 7/8/20 12:16 AM, John Baldwin wrote:
> This appears to have broken native debugging on FreeBSD for me in that

> when I run a process to completion it triggers an assertion failure:

> 

> (gdb) r

> Starting program: /bin/echo 

> 

> Child process unexpectedly missing: No child processes.

> /home/john/work/git/gdb/gdb/inferior.c:293: internal-error: struct inferior *find_inferior_pid(process_stratum_target *, int): Assertion `pid != 0' failed.

> 

> I tracked this down to this code in inf_ptrace::wait():

> 

>       /* Ignore terminated detached child processes.  */

>       if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())

>         pid = -1;

> 

> At this point, inferior_ptid() is all zeroes and the process

> has reported a non-stopped exit status (WIFEXITED) so this

> ignores the exit event and loops back around to call waitpid()

> again which then fails with ECHILD.

> 

> Looks like we always clear the inferior thread (and thus

> inferior_ptid) in do_target_wait_1:

> 

>   /* We know that we are looking for an event in the target of inferior

>      INF, but we don't know which thread the event might come from.  As

>      such we want to make sure that INFERIOR_PTID is reset so that none of

>      the wait code relies on it - doing so is always a mistake.  */

>   switch_to_inferior_no_thread (inf);

> 

> Commenting out the check in inf_ptrace::wait() "fixes" the issue for

> me on FreeBSD, but I'm not sure that is the right fix.


linux-nat.c has something logic, here in linux_nat_filter_event:

  /* Make sure we don't report an event for the exit of an LWP not in
     our list, i.e. not part of the current process.  This can happen
     if we detach from a program we originally forked and then it
     exits.  */
  if (!WIFSTOPPED (status) && !lp)
    return NULL;

For inf-ptrace.c, the right fix should be something around this:

diff --git c/gdb/inf-ptrace.c w/gdb/inf-ptrace.c
index d25d226abb..ae0b0f7ff0 100644
--- c/gdb/inf-ptrace.c
+++ w/gdb/inf-ptrace.c
@@ -347,7 +347,7 @@ inf_ptrace_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
        }
 
       /* Ignore terminated detached child processes.  */
-      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
+      if (!WIFSTOPPED (status) && find_inferior_pid (this, pid) == nullptr)
        pid = -1;
     }
   while (pid == -1);


The inferior_ptid reference in the error path above that code, where it
reads:

  "Claim it exited with unknown signal"

is also wrong.  I'm not sure what to do about that one.
Not clear if that path can really happen.  linux-nat.c calls
perror_with_name in some places if waitpid returns -1,
and in the main waitpid call, just assumes that error -1 never
happens...

> 

> It seems to me that multiprocess probably needs to return events for

> not just the current inferior pid but for any valid pid for example,


Yes.

> and though multiprocess is still broken for me on FreeBSD if I comment

> out the check against inferior_ptid, I can now see that I was getting

> an event for the "wrong" inferior that was previously discarded but

> with the check commented out now gets reported to the core.

> 

> (The reason I get an event for the wrong process is that for some

> reason the core asks the native target to resume the wrong process:

> 

>> ./gdb /bin/ls

> (gdb) set debug fbsd-nat 

> (gdb) set debug fbsd-lwp 

> (gdb) start

> Temporary breakpoint 1 at 0x20430a: file /usr/src/bin/ls/ls.c, line 236.

> Starting program: /bin/ls 

> FNAT: stop for LWP 101518 event 1 flags 0x18

> FLWP: using LWP 101518 for first thread

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x18

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x20

> FNAT: si_signo 5 si_code 1

> FNAT: sw breakpoint trap for LWP 101518

> FLWP: fbsd_resume for ptid (70453, 101518, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x20

> FNAT: si_signo 5 si_code 2

> FNAT: trace trap for LWP 101518

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101518 event 1 flags 0x20

> FNAT: si_signo 5 si_code 1

> FNAT: sw breakpoint trap for LWP 101518

> 

> Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe710)

>     at /usr/src/bin/ls/ls.c:236

> 236		const char *errstr = NULL;

> (gdb) add-inferior 

> [New inferior 2]

> Added inferior 2 on connection 1 (native)

> (gdb) inferior 2

> [Switching to inferior 2 [<null>] (<noexec>)]

> (gdb) file /bin/ls

> Reading symbols from /bin/ls...

> Reading symbols from /usr/lib/debug//bin/ls.debug...

> (gdb) start

> Temporary breakpoint 2 at 0x20430a: -qualified main. (2 locations)

> Starting program: /bin/ls 

> FNAT: stop for LWP 101641 event 1 flags 0x18

> FLWP: using LWP 101641 for first thread

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101641 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101641 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101641 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101641 event 1 flags 0x24

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101641 event 1 flags 0x24

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101641 event 1 flags 0x20

> FNAT: si_signo 20 si_code 1

> FLWP: fbsd_resume for ptid (-1, 0, 0)

> FNAT: stop for LWP 101641 event 1 flags 0x18

> FLWP: fbsd_resume for ptid (70453, 101518, 0)

> 

> Program terminated with signal SIGTRAP, Trace/breakpoint trap.

> The program no longer exists.

> 

> Here you can see that the last call to fbsd_resume() used the ptid from

> inferior 1 instead of inferior 2, and inferior 1 didn't discard it's

> pending SIGTRAP but instead was killed by it.)

"set debug infrun 1" will probably reveal what is happening.

Thanks,
Pedro Alves
John Baldwin July 8, 2020, 12:10 a.m. | #13
On 7/7/20 4:16 PM, John Baldwin wrote:
> FNAT: stop for LWP 101641 event 1 flags 0x18

> FLWP: fbsd_resume for ptid (70453, 101518, 0)

> 

> Program terminated with signal SIGTRAP, Trace/breakpoint trap.

> The program no longer exists.

> 

> Here you can see that the last call to fbsd_resume() used the ptid from

> inferior 1 instead of inferior 2, and inferior 1 didn't discard it's

> pending SIGTRAP but instead was killed by it.)


On this part, I turned on 'set debug infrun 1' and can see that
the core is trying to step over the breakpoint in the first
inferior.  However, it is not clearing the SIGTRAP signal to
GDB_SIGNAL_0 when resuming the first inferior but leaving it as
GDB_SIGNAL_TRAP.  This appears to be because start_step_over()
calls keep_going_pass_signal() instead of keep_going(), so
tp->control.stop_signal doesn't get reset.  I'm not sure if
stop_signal should have been cleared earlier somehow?

Hmm, if I add a hack to ignore SIGTRAP in fbsd_nat_target::resume(),
then what happens is that the first inferior runs to completion.
I can run the second inferior after switching to it, but I think the
behavior I probably want is for the core to not resume threads from
the non-current inferior instead?

One last test I tried is to do this sequence instead in the first
inferior:

(gdb) file /bin/ls
(gdb) start
(gdb) n

This gets the first inferior off of the breakpoint and the thread is
stopped for a hardware stepping trap instead.

I can then add an inferior do start and have it work fine, and I
can switch back and forth between the two inferiors stepping each
one.

So I think the issue here is that for some reason when the 'start'
command for inferior 2 tries to resume the inferior after getting
across the initial events for exec, it ends up resuming the wrong
inferior due to confusion about the thread stopped at a breakpoint.
However, it doesn't get confused if the thread is stopped for some
other reason.

-- 
John Baldwin
John Baldwin July 8, 2020, 12:19 a.m. | #14
On 7/7/20 4:53 PM, Pedro Alves wrote:
> On 7/8/20 12:16 AM, John Baldwin wrote:

>> This appears to have broken native debugging on FreeBSD for me in that

>> when I run a process to completion it triggers an assertion failure:

>>

>> (gdb) r

>> Starting program: /bin/echo 

>>

>> Child process unexpectedly missing: No child processes.

>> /home/john/work/git/gdb/gdb/inferior.c:293: internal-error: struct inferior *find_inferior_pid(process_stratum_target *, int): Assertion `pid != 0' failed.

>>

>> I tracked this down to this code in inf_ptrace::wait():

>>

>>       /* Ignore terminated detached child processes.  */

>>       if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())

>>         pid = -1;

>>

>> At this point, inferior_ptid() is all zeroes and the process

>> has reported a non-stopped exit status (WIFEXITED) so this

>> ignores the exit event and loops back around to call waitpid()

>> again which then fails with ECHILD.

>>

>> Looks like we always clear the inferior thread (and thus

>> inferior_ptid) in do_target_wait_1:

>>

>>   /* We know that we are looking for an event in the target of inferior

>>      INF, but we don't know which thread the event might come from.  As

>>      such we want to make sure that INFERIOR_PTID is reset so that none of

>>      the wait code relies on it - doing so is always a mistake.  */

>>   switch_to_inferior_no_thread (inf);

>>

>> Commenting out the check in inf_ptrace::wait() "fixes" the issue for

>> me on FreeBSD, but I'm not sure that is the right fix.

> 

> linux-nat.c has something logic, here in linux_nat_filter_event:

> 

>   /* Make sure we don't report an event for the exit of an LWP not in

>      our list, i.e. not part of the current process.  This can happen

>      if we detach from a program we originally forked and then it

>      exits.  */

>   if (!WIFSTOPPED (status) && !lp)

>     return NULL;

> 

> For inf-ptrace.c, the right fix should be something around this:

> 

> diff --git c/gdb/inf-ptrace.c w/gdb/inf-ptrace.c

> index d25d226abb..ae0b0f7ff0 100644

> --- c/gdb/inf-ptrace.c

> +++ w/gdb/inf-ptrace.c

> @@ -347,7 +347,7 @@ inf_ptrace_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,

>         }

>  

>        /* Ignore terminated detached child processes.  */

> -      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())

> +      if (!WIFSTOPPED (status) && find_inferior_pid (this, pid) == nullptr)

>         pid = -1;

>      }

>    while (pid == -1);


Yes, this works.  I hadn't found that function and was thinking of
writing it from scratch, but this is better of course.
 
> The inferior_ptid reference in the error path above that code, where it

> reads:

> 

>   "Claim it exited with unknown signal"

> 

> is also wrong.  I'm not sure what to do about that one.

> Not clear if that path can really happen.  linux-nat.c calls

> perror_with_name in some places if waitpid returns -1,

> and in the main waitpid call, just assumes that error -1 never

> happens...


It happened for me in this case since wait failed with -1 when we discarded
the exit info incorrectly.  I can change it to return null_ptid which would
match what it does now, or perhaps we could just add an assertion here rather
than relying on the  null_ptid tripping over another assertion later?

>> Here you can see that the last call to fbsd_resume() used the ptid from

>> inferior 1 instead of inferior 2, and inferior 1 didn't discard it's

>> pending SIGTRAP but instead was killed by it.)

> "set debug infrun 1" will probably reveal what is happening.


Yes, I have some more details on this in my other followup where infrun
debugging was very helpful, though I realize I failed to include any of
the infrun debug output.

-- 
John Baldwin
John Baldwin July 8, 2020, 12:34 a.m. | #15
On 7/7/20 5:10 PM, John Baldwin wrote:
> On 7/7/20 4:16 PM, John Baldwin wrote:

>> FNAT: stop for LWP 101641 event 1 flags 0x18

>> FLWP: fbsd_resume for ptid (70453, 101518, 0)

>>

>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.

>> The program no longer exists.

>>

>> Here you can see that the last call to fbsd_resume() used the ptid from

>> inferior 1 instead of inferior 2, and inferior 1 didn't discard it's

>> pending SIGTRAP but instead was killed by it.)

> 

> On this part, I turned on 'set debug infrun 1' and can see that

> the core is trying to step over the breakpoint in the first

> inferior.  However, it is not clearing the SIGTRAP signal to

> GDB_SIGNAL_0 when resuming the first inferior but leaving it as

> GDB_SIGNAL_TRAP.  This appears to be because start_step_over()

> calls keep_going_pass_signal() instead of keep_going(), so

> tp->control.stop_signal doesn't get reset.  I'm not sure if

> stop_signal should have been cleared earlier somehow?

> 

> Hmm, if I add a hack to ignore SIGTRAP in fbsd_nat_target::resume(),

> then what happens is that the first inferior runs to completion.

> I can run the second inferior after switching to it, but I think the

> behavior I probably want is for the core to not resume threads from

> the non-current inferior instead?

> 

> One last test I tried is to do this sequence instead in the first

> inferior:

> 

> (gdb) file /bin/ls

> (gdb) start

> (gdb) n

> 

> This gets the first inferior off of the breakpoint and the thread is

> stopped for a hardware stepping trap instead.

> 

> I can then add an inferior do start and have it work fine, and I

> can switch back and forth between the two inferiors stepping each

> one.

> 

> So I think the issue here is that for some reason when the 'start'

> command for inferior 2 tries to resume the inferior after getting

> across the initial events for exec, it ends up resuming the wrong

> inferior due to confusion about the thread stopped at a breakpoint.

> However, it doesn't get confused if the thread is stopped for some

> other reason.


I had to add a 'fbsd_nat_target::supports_multi_process' method and
now it seems to work fine.

It does depend on the fix to use find_inferior_pid() to work.

-- 
John Baldwin