[04/12] Fix a couple vStopped pending ack bugs

Message ID 20210113011543.2047449-5-pedro@palves.net
State New
Headers show
Series
  • Fix detach + displaced-step regression + N bugs more
Related show

Commit Message

Pedro Alves Jan. 13, 2021, 1:15 a.m.
A following patch will add a testcase that has two processes with
threads stepping over a breakpoint continuously, and then detaches
from one of the processes while threads are running.  The other
process continues stepping over its breakpoint.  And then the testcase
sends a SIGUSR1, expecting that GDB reports it.  That would sometimes
hang against gdbserver, due to the bugs fixed here.  Both bugs are
related, in that they're about remote protocol asynchronous Stop
notifications.  There's a bug in GDB, and another in GDBserver.

The GDB bug:

- when we detach from a process, the remote target discards any
  pending RSP notification related to that process, including the
  in-flight, yet-unacked notification.  Discarding the in-flight
  notification is the problem.  Until the in-flight notification is
  acked with a vStopped packet, the server won't send another %Stop
  notification.  As a result, the debug session gets messed up.  In
  the new testcase's case, GDB would hang inside stop_all_threads,
  waiting for a stop for one of the process'es threads, which never
  arrived -- its stop reply was permanently stuck in the stop reply
  queue, waiting for a vStopped packet that never arrived.

The GDBserver bug:

  GDBserver has the opposite bug.  It also discards notifications for
  the process being detached.  If that discards the head of the
  notification queue, when gdb sends an ack, it ends up acking the
  _next_ notification.  Meaning, gdb loses one notification.  In the
  testcase, this results in a similar hang in stop_all_threads.

So we have two very similar bugs in GDB and GDBserver, both resulting
in a similar symptom.  That's why I'm fixing them both at the same
time.

gdb/ChangeLog:

	* remote.c (remote_notif_stop_ack): Don't error out on
	TARGET_WAITKIND_IGNORE; instead, just ignore the notification.
	(remote_target::discard_pending_stop_replies): Don't delete
	in-flight notification; instead, clear its contents.

gdbserver/ChangeLog:

	* server.cc (discard_queued_stop_replies): Don't ever discard the
	notification at the head of the list.
---
 gdb/remote.c        | 22 +++++++++++++---------
 gdbserver/server.cc |  9 +++++++++
 2 files changed, 22 insertions(+), 9 deletions(-)

-- 
2.26.2

Comments

Lancelot SIX via Gdb-patches Jan. 13, 2021, 5:29 a.m. | #1
On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> A following patch will add a testcase that has two processes with

> threads stepping over a breakpoint continuously, and then detaches

> from one of the processes while threads are running.  The other

> process continues stepping over its breakpoint.  And then the testcase

> sends a SIGUSR1, expecting that GDB reports it.  That would sometimes

> hang against gdbserver, due to the bugs fixed here.  Both bugs are

> related, in that they're about remote protocol asynchronous Stop

> notifications.  There's a bug in GDB, and another in GDBserver.

> 

> The GDB bug:

> 

> - when we detach from a process, the remote target discards any

>   pending RSP notification related to that process, including the

>   in-flight, yet-unacked notification.  Discarding the in-flight

>   notification is the problem.  Until the in-flight notification is

>   acked with a vStopped packet, the server won't send another %Stop

>   notification.  As a result, the debug session gets messed up.  In

>   the new testcase's case, GDB would hang inside stop_all_threads,

>   waiting for a stop for one of the process'es threads, which never

>   arrived -- its stop reply was permanently stuck in the stop reply

>   queue, waiting for a vStopped packet that never arrived.


Just to be sure I understand does that sum it up?

1. GDBserver sends stop notification about thread X, the remote target
   receives it and stores it
2. At the same time, GDB detaches thread X's inferior
3. The remote target discards the received stop notification
4. GDBserver waits forever for the ack


> The GDBserver bug:

> 

>   GDBserver has the opposite bug.  It also discards notifications for

>   the process being detached.  If that discards the head of the

>   notification queue, when gdb sends an ack, it ends up acking the

>   _next_ notification.  Meaning, gdb loses one notification.  In the

>   testcase, this results in a similar hang in stop_all_threads.

> 

> So we have two very similar bugs in GDB and GDBserver, both resulting

> in a similar symptom.  That's why I'm fixing them both at the same

> time.

> 

> gdb/ChangeLog:

> 

> 	* remote.c (remote_notif_stop_ack): Don't error out on

> 	TARGET_WAITKIND_IGNORE; instead, just ignore the notification.

> 	(remote_target::discard_pending_stop_replies): Don't delete

> 	in-flight notification; instead, clear its contents.

> 

> gdbserver/ChangeLog:

> 

> 	* server.cc (discard_queued_stop_replies): Don't ever discard the

> 	notification at the head of the list.

> ---

>  gdb/remote.c        | 22 +++++++++++++---------

>  gdbserver/server.cc |  9 +++++++++

>  2 files changed, 22 insertions(+), 9 deletions(-)

> 

> diff --git a/gdb/remote.c b/gdb/remote.c

> index 2f15c4c1f32..7199e0ac422 100644

> --- a/gdb/remote.c

> +++ b/gdb/remote.c

> @@ -6935,13 +6935,11 @@ remote_notif_stop_ack (remote_target *remote,

>    /* acknowledge */

>    putpkt (remote, self->ack_command);

>  

> -  if (stop_reply->ws.kind == TARGET_WAITKIND_IGNORE)

> -    {

> -      /* We got an unknown stop reply.  */

> -      error (_("Unknown stop reply"));

> -    }

> -

> -  remote->push_stop_reply (stop_reply);

> +  /* Kind can be TARGET_WAITKIND_IGNORE if we have meanwhile discarded

> +     the notification.  It was left in the queue because we need to

> +     acknowledge it and pull the rest of the notifications out.  */

> +  if (stop_reply->ws.kind != TARGET_WAITKIND_IGNORE)

> +    remote->push_stop_reply (stop_reply);

>  }

>  

>  static int

> @@ -7110,8 +7108,14 @@ remote_target::discard_pending_stop_replies (struct inferior *inf)

>    /* Discard the in-flight notification.  */

>    if (reply != NULL && reply->ptid.pid () == inf->pid)

>      {

> -      delete reply;

> -      rns->pending_event[notif_client_stop.id] = NULL;

> +      /* Leave the notification pending, since the server expects that

> +	 we acknowledge it with vStopped.  But clear its contents, so

> +	 that later on when we acknowledge it, we also discard it.  */

> +      reply->ws.kind = TARGET_WAITKIND_IGNORE;

> +

> +      if (remote_debug)

> +	fprintf_unfiltered (gdb_stdlog,

> +			    "discarded in-flight notification\n");

>      }

>  

>    /* Discard the stop replies we have already pulled with

> diff --git a/gdbserver/server.cc b/gdbserver/server.cc

> index 77e89fe6ed0..a5497e93cee 100644

> --- a/gdbserver/server.cc

> +++ b/gdbserver/server.cc

> @@ -203,6 +203,15 @@ discard_queued_stop_replies (ptid_t ptid)

>        next = iter;

>        ++next;

>  

> +      if (iter == notif_stop.queue.begin ())

> +	{

> +	  /* The head of the list contains the notification that was

> +	     already sent to GDB.  So we can't remove it, otherwise

> +	     when GDB sends the vStopped, it would ack the _next_

> +	     notification, which hadn't been sent yet!  */

> +	  continue;

> +	}

> +


Since the first item in the list has a special status (it is the in
flight one), perhaps having a separate field in notif_server for it
would make things clearer.

I mean having an additional

  notif_event *in_flight;

field in there.  When sending a notification, it gets dequeu-ed from
the queue and put there.

Either way, this LGTM.

Simon
Pedro Alves Feb. 3, 2021, 1:25 a.m. | #2
On 13/01/21 05:29, Simon Marchi wrote:
> On 2021-01-12 8:15 p.m., Pedro Alves wrote:


> Just to be sure I understand does that sum it up?

> 

> 1. GDBserver sends stop notification about thread X, the remote target

>    receives it and stores it

> 2. At the same time, GDB detaches thread X's inferior

> 3. The remote target discards the received stop notification

> 4. GDBserver waits forever for the ack


Yup.  Thanks for that summary.  I've borrowed it and put it in the commit log.

> Since the first item in the list has a special status (it is the in

> flight one), perhaps having a separate field in notif_server for it

> would make things clearer.

> 

> I mean having an additional

> 

>   notif_event *in_flight;

> 

> field in there.  When sending a notification, it gets dequeu-ed from

> the queue and put there.


It's likely yes, though I'd prefer that was done separately, as it has
risk of messing up something else.

> 

> Either way, this LGTM.


Thanks.  I've merged it.

Patch

diff --git a/gdb/remote.c b/gdb/remote.c
index 2f15c4c1f32..7199e0ac422 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6935,13 +6935,11 @@  remote_notif_stop_ack (remote_target *remote,
   /* acknowledge */
   putpkt (remote, self->ack_command);
 
-  if (stop_reply->ws.kind == TARGET_WAITKIND_IGNORE)
-    {
-      /* We got an unknown stop reply.  */
-      error (_("Unknown stop reply"));
-    }
-
-  remote->push_stop_reply (stop_reply);
+  /* Kind can be TARGET_WAITKIND_IGNORE if we have meanwhile discarded
+     the notification.  It was left in the queue because we need to
+     acknowledge it and pull the rest of the notifications out.  */
+  if (stop_reply->ws.kind != TARGET_WAITKIND_IGNORE)
+    remote->push_stop_reply (stop_reply);
 }
 
 static int
@@ -7110,8 +7108,14 @@  remote_target::discard_pending_stop_replies (struct inferior *inf)
   /* Discard the in-flight notification.  */
   if (reply != NULL && reply->ptid.pid () == inf->pid)
     {
-      delete reply;
-      rns->pending_event[notif_client_stop.id] = NULL;
+      /* Leave the notification pending, since the server expects that
+	 we acknowledge it with vStopped.  But clear its contents, so
+	 that later on when we acknowledge it, we also discard it.  */
+      reply->ws.kind = TARGET_WAITKIND_IGNORE;
+
+      if (remote_debug)
+	fprintf_unfiltered (gdb_stdlog,
+			    "discarded in-flight notification\n");
     }
 
   /* Discard the stop replies we have already pulled with
diff --git a/gdbserver/server.cc b/gdbserver/server.cc
index 77e89fe6ed0..a5497e93cee 100644
--- a/gdbserver/server.cc
+++ b/gdbserver/server.cc
@@ -203,6 +203,15 @@  discard_queued_stop_replies (ptid_t ptid)
       next = iter;
       ++next;
 
+      if (iter == notif_stop.queue.begin ())
+	{
+	  /* The head of the list contains the notification that was
+	     already sent to GDB.  So we can't remove it, otherwise
+	     when GDB sends the vStopped, it would ack the _next_
+	     notification, which hadn't been sent yet!  */
+	  continue;
+	}
+
       if (remove_all_on_match_ptid (*iter, ptid))
 	{
 	  delete *iter;