[v5,3/5] gdb/remote: do not delete a thread if it has a pending event

Message ID 04e28369e2e3486bbf4dfa8154c140be317ab1e9.1586187408.git.tankut.baris.aktemur@intel.com
State New
Headers show
Series
  • Handling already-exited threads in 'stop_all_threads'
Related show

Commit Message

Keith Seitz via Gdb-patches April 6, 2020, 3:45 p.m.
gdb/ChangeLog:
2020-04-06  Tankut Baris Aktemur  <tankut.baris.aktemur@intel.com>

	* remote.c (remote_target::update_thread_list): Do not delete
	a thread if it has a pending event.
---
 gdb/remote.c | 3 +++
 1 file changed, 3 insertions(+)

-- 
2.17.1

Comments

Keith Seitz via Gdb-patches April 16, 2020, 4:24 p.m. | #1
On 4/6/20 4:45 PM, Tankut Baris Aktemur via Gdb-patches wrote:
> gdb/ChangeLog:

> 2020-04-06  Tankut Baris Aktemur  <tankut.baris.aktemur@intel.com>

> 

> 	* remote.c (remote_target::update_thread_list): Do not delete

> 	a thread if it has a pending event.

> ---

>  gdb/remote.c | 3 +++

>  1 file changed, 3 insertions(+)

> 

> diff --git a/gdb/remote.c b/gdb/remote.c

> index bfbc0bc21d3..12ac7cb9862 100644

> --- a/gdb/remote.c

> +++ b/gdb/remote.c

> @@ -3821,6 +3821,9 @@ remote_target::update_thread_list ()

>  	  if (tp->inf->process_target () != this)

>  	    continue;

>  

> +	  if (tp->suspend.waitstatus_pending_p)

> +	    continue;

> +


This patch makes me nervous.  :-/

Why doesn't prune_threads, via remote_target::thread_alive end
up removing the thread anyway?

I think it'd be better to extend the check to also check for
TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED.  Let threads
with other kinds of events pending be deleted.

Other targets are likely to need something like this, but I'm
OK with going with remote-specific test to start with, so we
can all this in, including the testcase, and then work on
something more generic if we find a need.

In any case, please add some comments.

Thanks,
Pedro Alves
Keith Seitz via Gdb-patches April 20, 2020, 8:35 p.m. | #2
On Thursday, April 16, 2020 6:24 PM, Pedro Alves wrote:
> On 4/6/20 4:45 PM, Tankut Baris Aktemur via Gdb-patches wrote:

> > gdb/ChangeLog:

> > 2020-04-06  Tankut Baris Aktemur  <tankut.baris.aktemur@intel.com>

> >

> > 	* remote.c (remote_target::update_thread_list): Do not delete

> > 	a thread if it has a pending event.

> > ---

> >  gdb/remote.c | 3 +++

> >  1 file changed, 3 insertions(+)

> >

> > diff --git a/gdb/remote.c b/gdb/remote.c

> > index bfbc0bc21d3..12ac7cb9862 100644

> > --- a/gdb/remote.c

> > +++ b/gdb/remote.c

> > @@ -3821,6 +3821,9 @@ remote_target::update_thread_list ()

> >  	  if (tp->inf->process_target () != this)

> >  	    continue;

> >

> > +	  if (tp->suspend.waitstatus_pending_p)

> > +	    continue;

> > +

> 

> This patch makes me nervous.  :-/

> 

> Why doesn't prune_threads, via remote_target::thread_alive end

> up removing the thread anyway?


As far as I see, prune_threads is called only if gdbserver does not
send a thread list.  The comment in the function suggests that this
could be the case for older servers.  I'm not sure how old it should
be, though, for me to be able to test it.
 
> I think it'd be better to extend the check to also check for

> TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED.  Let threads

> with other kinds of events pending be deleted.

> 

> Other targets are likely to need something like this, but I'm

> OK with going with remote-specific test to start with, so we

> can all this in, including the testcase, and then work on

> something more generic if we find a need.

> 

> In any case, please add some comments.


First I did this change, but when I started working on Patch #5, interesting
things have come up.

Having both of the inferiors on a single extended-remote target is supposed
to trigger the same stop-all-threads scenario that this series attempts to fix.
But it turned out to be slightly different.

The 'update_thread_list' at the beginning of the stop_all_threads pass
removes all the threads because the remote target sends an empty list of threads
(well both inferiors exited and no thread remained).  So, before we even attempt
to stop threads, they are already gone when the list is updated.  The problem is,
inferior 2 (i.e. the inferior whose thread we would attempt to stop but would
receive an EXITED waitstatus for) is then left in a seemingly live state (pid != 0),
but with no threads.  Hence, we cannot even switch to it after inferior 1's
exit event is shown.

Therefore I updated this patch as below.  Although no regression was detected,
I don't feel much safe with this new version.  Comments welcome.

Thanks,
-Baris

From f9c9a7c1bb15cb99925744e25026bd2b09b74194 Mon Sep 17 00:00:00 2001
From: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>

Date: Mon, 6 Apr 2020 13:54:35 +0200
Subject: [PATCH 3/5] gdb/remote: do not delete a thread if this could leave
 its inferior empty

gdb/ChangeLog:
2020-04-06  Tankut Baris Aktemur  <tankut.baris.aktemur@intel.com>

        * remote.c (remote_target::update_thread_list): Do not delete
        a thread if this could leave its inferior empty.
---
 gdb/remote.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/gdb/remote.c b/gdb/remote.c
index 5db406e045c..681957c012b 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -3822,6 +3822,18 @@ remote_target::update_thread_list ()
          if (tp->inf->process_target () != this)
            continue;

+         /* Do not delete the thread if it belongs to another inferior
+            that has only one non-exited thread, because otherwise we
+            would end up with a seemingly live inferior with no threads.  */
+         if (tp->inf != current_inferior ())
+           {
+             int num_threads = 0;
+             for (thread_info *t ATTRIBUTE_UNUSED : tp->inf->non_exited_threads ())
+               num_threads++;
+             if (num_threads == 1)
+               continue;
+           }
+
          if (!context.contains_thread (tp->ptid))
            {
              /* Not found.  */
--
2.17.1

 


Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
Keith Seitz via Gdb-patches April 21, 2020, 6:54 p.m. | #3
On 4/20/20 9:35 PM, Aktemur, Tankut Baris wrote:
> On Thursday, April 16, 2020 6:24 PM, Pedro Alves wrote:

>> On 4/6/20 4:45 PM, Tankut Baris Aktemur via Gdb-patches wrote:

>>> gdb/ChangeLog:

>>> 2020-04-06  Tankut Baris Aktemur  <tankut.baris.aktemur@intel.com>

>>>

>>> 	* remote.c (remote_target::update_thread_list): Do not delete

>>> 	a thread if it has a pending event.

>>> ---

>>>  gdb/remote.c | 3 +++

>>>  1 file changed, 3 insertions(+)

>>>

>>> diff --git a/gdb/remote.c b/gdb/remote.c

>>> index bfbc0bc21d3..12ac7cb9862 100644

>>> --- a/gdb/remote.c

>>> +++ b/gdb/remote.c

>>> @@ -3821,6 +3821,9 @@ remote_target::update_thread_list ()

>>>  	  if (tp->inf->process_target () != this)

>>>  	    continue;

>>>

>>> +	  if (tp->suspend.waitstatus_pending_p)

>>> +	    continue;

>>> +

>>

>> This patch makes me nervous.  :-/

>>

>> Why doesn't prune_threads, via remote_target::thread_alive end

>> up removing the thread anyway?

> 

> As far as I see, prune_threads is called only if gdbserver does not

> send a thread list.  The comment in the function suggests that this

> could be the case for older servers.  I'm not sure how old it should

> be, though, for me to be able to test it.

>  

>> I think it'd be better to extend the check to also check for

>> TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED.  Let threads

>> with other kinds of events pending be deleted.

>>

>> Other targets are likely to need something like this, but I'm

>> OK with going with remote-specific test to start with, so we

>> can all this in, including the testcase, and then work on

>> something more generic if we find a need.

>>

>> In any case, please add some comments.

> 

> First I did this change, but when I started working on Patch #5, interesting

> things have come up.

> 

> Having both of the inferiors on a single extended-remote target is supposed

> to trigger the same stop-all-threads scenario that this series attempts to fix.

> But it turned out to be slightly different.

> 

> The 'update_thread_list' at the beginning of the stop_all_threads pass

> removes all the threads because the remote target sends an empty list of threads

> (well both inferiors exited and no thread remained).  So, before we even attempt

> to stop threads, they are already gone when the list is updated.  The problem is,

> inferior 2 (i.e. the inferior whose thread we would attempt to stop but would

> receive an EXITED waitstatus for) is then left in a seemingly live state (pid != 0),

> but with no threads.  Hence, we cannot even switch to it after inferior 1's

> exit event is shown.

> 

> Therefore I updated this patch as below.  Although no regression was detected,

> I don't feel much safe with this new version.  Comments welcome.


Hmm.  The current_inferior() check seems like a red flag, IMO.
Why is that needed?


How does it happen that all threads are deleted?  I've seen
this before, but I'll need to refresh my memory.  Let's see.

We have this in gdbserver/linux-low.cc:

/* Return true if LWP has exited already, and has a pending exit event
   to report to GDB.  */

static int
lwp_is_marked_dead (struct lwp_info *lwp)
{
  return (lwp->status_pending_p
	  && (WIFEXITED (lwp->status_pending)
	      || WIFSIGNALED (lwp->status_pending)));
}

/* Return true if the given thread is still alive.  */

bool
linux_process_target::thread_alive (ptid_t ptid)
{
  struct lwp_info *lwp = find_lwp_pid (ptid);

  /* We assume we always know if a thread exits.  If a whole process
     exited but we still haven't been able to report it to GDB, we'll
     hold on to the last lwp of the dead process.  */
  if (lwp != NULL)
    return !lwp_is_marked_dead (lwp);
  else
    return 0;
}

If gdbserver deleted all threads, it seems like that would
mean that the linux-low.cc backend has already reported the
pending exit event to gdbserver's core, and the process exit event
is thus guaranteed to already be in transit, somewhere, maybe
queued in gdbserver/server.cc's notif_stop queue, or maybe even
already in gdb, but queued in gdb/remote.c's stop_reply_queue queue,
or at whatever other level we queue events.

As I mentioned, I've ran into something like this before.
See infrun.c:handle_no_resumed where it reads:

  /* Note however that we may find no resumed thread because the whole
     process exited meanwhile (thus updating the thread list results
     in an empty thread list).  In this case we know we'll be getting
     a process exit event shortly.  */

Maybe we should make sure that stop_all_threads ends up in wait_one
if the inferior has a pid != 0, and no threads?

If we still need to keep one thread in the inferior's thread list,
it feels like that should be handled by common code, like e.g.,
from within delete_thread.  

But if we do that, then we don't need to change stop_all_threads.
And that bit I quoted in handle_no_resumed becomes dead, and
seems like could trick gdb into reporting a TARGET_WAITKIND_NO_RESUMED
to the user.  :-/

It also feels like we're missing a thread state in enum thread_state,
like a THREAD_ZOMBIE state, for a thread that has exited, but that
should still be listed, and only exists because it needs to be waited
on to consume the exit event.  We can probably do without it, though
I've been having this feeling for a long while.  Maybe
THREAD_EXITED + pending status would be the same thing.

Or maybe we should be able to select an inferior and resume it
even if it has no threads but still has pid != 0, which indicates
that the inferior has execution and thus has a pending exit wait
status to collect.  That would go along with storing that
pending exit waitstatus on a separate list, instead of storing
it on a thread.  That might be tougher to hack in.  Not sure it's
worth it if we can do without it.  All the continue/resume code
in infcmd.c/infrun.c assumes there's a current thread.

Thanks,
Pedro Alves
Keith Seitz via Gdb-patches April 22, 2020, 2:57 p.m. | #4
On Tuesday, April 21, 2020 8:54 PM, Pedro Alves wrote:
> On 4/20/20 9:35 PM, Aktemur, Tankut Baris wrote:

> > On Thursday, April 16, 2020 6:24 PM, Pedro Alves wrote:

> >> On 4/6/20 4:45 PM, Tankut Baris Aktemur via Gdb-patches wrote:

> >>> gdb/ChangeLog:

> >>> 2020-04-06  Tankut Baris Aktemur  <tankut.baris.aktemur@intel.com>

> >>>

> >>> 	* remote.c (remote_target::update_thread_list): Do not delete

> >>> 	a thread if it has a pending event.

> >>> ---

> >>>  gdb/remote.c | 3 +++

> >>>  1 file changed, 3 insertions(+)

> >>>

> >>> diff --git a/gdb/remote.c b/gdb/remote.c

> >>> index bfbc0bc21d3..12ac7cb9862 100644

> >>> --- a/gdb/remote.c

> >>> +++ b/gdb/remote.c

> >>> @@ -3821,6 +3821,9 @@ remote_target::update_thread_list ()

> >>>  	  if (tp->inf->process_target () != this)

> >>>  	    continue;

> >>>

> >>> +	  if (tp->suspend.waitstatus_pending_p)

> >>> +	    continue;

> >>> +

> >>

> >> This patch makes me nervous.  :-/

> >>

> >> Why doesn't prune_threads, via remote_target::thread_alive end

> >> up removing the thread anyway?

> >

> > As far as I see, prune_threads is called only if gdbserver does not

> > send a thread list.  The comment in the function suggests that this

> > could be the case for older servers.  I'm not sure how old it should

> > be, though, for me to be able to test it.

> >

> >> I think it'd be better to extend the check to also check for

> >> TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED.  Let threads

> >> with other kinds of events pending be deleted.

> >>

> >> Other targets are likely to need something like this, but I'm

> >> OK with going with remote-specific test to start with, so we

> >> can all this in, including the testcase, and then work on

> >> something more generic if we find a need.

> >>

> >> In any case, please add some comments.

> >

> > First I did this change, but when I started working on Patch #5, interesting

> > things have come up.


So, I've converted this patch to your previous suggestion of also checking
TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED.  Below are more details.
I'll submit the revised version shortly.

> >

> > Having both of the inferiors on a single extended-remote target is supposed

> > to trigger the same stop-all-threads scenario that this series attempts to fix.

> > But it turned out to be slightly different.

> >

> > The 'update_thread_list' at the beginning of the stop_all_threads pass

> > removes all the threads because the remote target sends an empty list of threads

> > (well both inferiors exited and no thread remained).  So, before we even attempt

> > to stop threads, they are already gone when the list is updated.  The problem is,

> > inferior 2 (i.e. the inferior whose thread we would attempt to stop but would

> > receive an EXITED waitstatus for) is then left in a seemingly live state (pid != 0),

> > but with no threads.  Hence, we cannot even switch to it after inferior 1's

> > exit event is shown.

> >

> > Therefore I updated this patch as below.  Although no regression was detected,

> > I don't feel much safe with this new version.  Comments welcome.

> 

> Hmm.  The current_inferior() check seems like a red flag, IMO.

> Why is that needed?


That was an attempt to constrain the potential side effects, but I've reverted
back to checking a pending exit event.
 
> How does it happen that all threads are deleted?  I've seen

> this before, but I'll need to refresh my memory.  Let's see.

> 

> We have this in gdbserver/linux-low.cc:

> 

> /* Return true if LWP has exited already, and has a pending exit event

>    to report to GDB.  */

> 

> static int

> lwp_is_marked_dead (struct lwp_info *lwp)

> {

>   return (lwp->status_pending_p

> 	  && (WIFEXITED (lwp->status_pending)

> 	      || WIFSIGNALED (lwp->status_pending)));

> }

> 

> /* Return true if the given thread is still alive.  */

> 

> bool

> linux_process_target::thread_alive (ptid_t ptid)

> {

>   struct lwp_info *lwp = find_lwp_pid (ptid);

> 

>   /* We assume we always know if a thread exits.  If a whole process

>      exited but we still haven't been able to report it to GDB, we'll

>      hold on to the last lwp of the dead process.  */

>   if (lwp != NULL)

>     return !lwp_is_marked_dead (lwp);

>   else

>     return 0;

> }

> 

> If gdbserver deleted all threads, it seems like that would

> mean that the linux-low.cc backend has already reported the

> pending exit event to gdbserver's core, and the process exit event

> is thus guaranteed to already be in transit, somewhere, maybe

> queued in gdbserver/server.cc's notif_stop queue, or maybe even

> already in gdb, but queued in gdb/remote.c's stop_reply_queue queue,

> or at whatever other level we queue events.


Based on the tests I made, I can see that gdbserver has sent two exit events
(one per each process) to GDB.  GDB is processing the first one.  So, yes,
the second must be somewhere in the queue.

> As I mentioned, I've ran into something like this before.

> See infrun.c:handle_no_resumed where it reads:

> 

>   /* Note however that we may find no resumed thread because the whole

>      process exited meanwhile (thus updating the thread list results

>      in an empty thread list).  In this case we know we'll be getting

>      a process exit event shortly.  */

> 

> Maybe we should make sure that stop_all_threads ends up in wait_one

> if the inferior has a pid != 0, and no threads?


This will be my proposal.  Here is the flow:

The exit event is somewhere in the event queue.  Thus, we end up with
an inferior with pid != 0 and no threads after update_thread_list.  We 
silently add a thread to this inferior in stop_all_threads, so that the
iteration over threads calls wait_one, picks the exit event, and attaches
it to the thread.

Now that the exit event has been picked from the event queue and attached to
a thread as a pending event, the check in remote.c's update_thread_list
does not delete it the next time we update threads (i.e. the next pass).

When "adding a new thread silently", there are two cases to consider:
delete_thread() may have not physically deleted the thread object, but only
marked is as THREAD_EXITED.  In this case, we "resurrect" the thread and make
it non-exited to be picked up by the loop that iterates over threads.
If it's gone completely, we simply add_thread_silently().

A side note:
update_thread_list() updates the list for the current target.  I think
it should have updated the list for all targets.  In v7, I've made this
change.  In fact, the different behavior that I had seen before between
using a single remote target vs. two remote targets was due to this.  When
using two targets, GDB was updating only the target of the first-exiting
inferior.  So, the initial call to update_thread_list was not deleting the
thread of the second inferior.  When using a single target, the thread of
the second inferior was being deleted, because it shares the same target with
the first inferior, and this had led to the pid != 0 and no threads case.
 
> If we still need to keep one thread in the inferior's thread list,

> it feels like that should be handled by common code, like e.g.,

> from within delete_thread.

> 

> But if we do that, then we don't need to change stop_all_threads.

> And that bit I quoted in handle_no_resumed becomes dead, and

> seems like could trick gdb into reporting a TARGET_WAITKIND_NO_RESUMED

> to the user.  :-/

> 

> It also feels like we're missing a thread state in enum thread_state,

> like a THREAD_ZOMBIE state, for a thread that has exited, but that

> should still be listed, and only exists because it needs to be waited

> on to consume the exit event.  We can probably do without it, though

> I've been having this feeling for a long while.  Maybe

> THREAD_EXITED + pending status would be the same thing.

> 

> Or maybe we should be able to select an inferior and resume it

> even if it has no threads but still has pid != 0, which indicates

> that the inferior has execution and thus has a pending exit wait

> status to collect.  That would go along with storing that

> pending exit waitstatus on a separate list, instead of storing

> it on a thread.  That might be tougher to hack in.  Not sure it's

> worth it if we can do without it.  All the continue/resume code

> in infcmd.c/infrun.c assumes there's a current thread.


I'm afraid introducing a per-inferior event list or the THREAD_ZOMBIE
state will be intrusive.  I hope v7 will fulfill the "if we can do without
it" case.

> 

> Thanks,

> Pedro Alves


Regards.
-Baris


Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

Patch

diff --git a/gdb/remote.c b/gdb/remote.c
index bfbc0bc21d3..12ac7cb9862 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -3821,6 +3821,9 @@  remote_target::update_thread_list ()
 	  if (tp->inf->process_target () != this)
 	    continue;
 
+	  if (tp->suspend.waitstatus_pending_p)
+	    continue;
+
 	  if (!context.contains_thread (tp->ptid))
 	    {
 	      /* Not found.  */