[01/12] Fix attaching in non-stop mode

Message ID 20210113011543.2047449-2-pedro@palves.net
State New
Headers show
Series
  • Fix detach + displaced-step regression + N bugs more
Related show

Commit Message

Pedro Alves Jan. 13, 2021, 1:15 a.m.
Attaching in non-stop mode currently misbehaves, like so:

 (gdb) attach 1244450
 Attaching to process 1244450
 [New LWP 1244453]
 [New LWP 1244454]
 [New LWP 1244455]
 [New LWP 1244456]
 [New LWP 1244457]
 [New LWP 1244458]
 [New LWP 1244459]
 [New LWP 1244461]
 [New LWP 1244462]
 [New LWP 1244463]
 No unwaited-for children left.

At this point, GDB's stopped/running thread state is out of sync with
the inferior:

(gdb) info threads
  Id   Target Id                     Frame
* 1    LWP 1244450 "attach-non-stop" 0xf1b443bf in ?? ()
  2    LWP 1244453 "attach-non-stop" (running)
  3    LWP 1244454 "attach-non-stop" (running)
  4    LWP 1244455 "attach-non-stop" (running)
  5    LWP 1244456 "attach-non-stop" (running)
  6    LWP 1244457 "attach-non-stop" (running)
  7    LWP 1244458 "attach-non-stop" (running)
  8    LWP 1244459 "attach-non-stop" (running)
  9    LWP 1244461 "attach-non-stop" (running)
  10   LWP 1244462 "attach-non-stop" (running)
  11   LWP 1244463 "attach-non-stop" (running)
(gdb)
(gdb) interrupt -a
(gdb)
*nothing*

The problem is that attaching installs an inferior continuation,
called when the target reports the initial attach stop, here, in
inf-loop.c:inferior_event_handler:

      /* Do all continuations associated with the whole inferior (not
	 a particular thread).  */
      if (inferior_ptid != null_ptid)
	do_all_inferior_continuations (0);

However, currently in non-stop mode, inferior_ptid is still null_ptid
when we get here.

If you try to do "set debug infrun 1" to debug the problem, however,
then the attach completes correctly, with GDB reporting a stop for
each thread.

The bug is that we're missing a switch_to_thread/context_switch call
when handling the initial stop, here:

  if (stop_soon == STOP_QUIETLY_NO_SIGSTOP
      && (ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_STOP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_TRAP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_0))
    {
      stop_print_frame = true;
      stop_waiting (ecs);
      ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
      return;
    }

Note how the STOP_QUIETLY / STOP_QUIETLY_REMOTE case above that does
call context_switch.

And the reason "set debug infrun 1" "fixes" it, is that the debug path
has a switch_to_thread call.

This patch fixes it by moving the main context_switch call earlier.

A testcase exercising this will be added in a following patch.

gdb/ChangeLog:

	* infrun.c (handle_signal_stop): Move main context_switch call
	earlier, before STOP_QUIETLY_NO_SIGSTOP.
---
 gdb/infrun.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

-- 
2.26.2

Comments

Carl Love via Gdb-patches Jan. 13, 2021, 3:11 a.m. | #1
On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> Attaching in non-stop mode currently misbehaves, like so:

> 

>  (gdb) attach 1244450

>  Attaching to process 1244450

>  [New LWP 1244453]

>  [New LWP 1244454]

>  [New LWP 1244455]

>  [New LWP 1244456]

>  [New LWP 1244457]

>  [New LWP 1244458]

>  [New LWP 1244459]

>  [New LWP 1244461]

>  [New LWP 1244462]

>  [New LWP 1244463]

>  No unwaited-for children left.

> 

> At this point, GDB's stopped/running thread state is out of sync with

> the inferior:

> 

> (gdb) info threads

>   Id   Target Id                     Frame

> * 1    LWP 1244450 "attach-non-stop" 0xf1b443bf in ?? ()

>   2    LWP 1244453 "attach-non-stop" (running)

>   3    LWP 1244454 "attach-non-stop" (running)

>   4    LWP 1244455 "attach-non-stop" (running)

>   5    LWP 1244456 "attach-non-stop" (running)

>   6    LWP 1244457 "attach-non-stop" (running)

>   7    LWP 1244458 "attach-non-stop" (running)

>   8    LWP 1244459 "attach-non-stop" (running)

>   9    LWP 1244461 "attach-non-stop" (running)

>   10   LWP 1244462 "attach-non-stop" (running)

>   11   LWP 1244463 "attach-non-stop" (running)

> (gdb)

> (gdb) interrupt -a

> (gdb)

> *nothing*

> 

> The problem is that attaching installs an inferior continuation,

> called when the target reports the initial attach stop, here, in

> inf-loop.c:inferior_event_handler:

> 

>       /* Do all continuations associated with the whole inferior (not

> 	 a particular thread).  */

>       if (inferior_ptid != null_ptid)

> 	do_all_inferior_continuations (0);

> 

> However, currently in non-stop mode, inferior_ptid is still null_ptid

> when we get here.

> 

> If you try to do "set debug infrun 1" to debug the problem, however,

> then the attach completes correctly, with GDB reporting a stop for

> each thread.

> 

> The bug is that we're missing a switch_to_thread/context_switch call

> when handling the initial stop, here:

> 

>   if (stop_soon == STOP_QUIETLY_NO_SIGSTOP

>       && (ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_STOP

> 	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_TRAP

> 	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_0))

>     {

>       stop_print_frame = true;

>       stop_waiting (ecs);

>       ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;

>       return;

>     }

> 

> Note how the STOP_QUIETLY / STOP_QUIETLY_REMOTE case above that does

> call context_switch.

> 

> And the reason "set debug infrun 1" "fixes" it, is that the debug path

> has a switch_to_thread call.

> 

> This patch fixes it by moving the main context_switch call earlier.

> 

> A testcase exercising this will be added in a following patch.

> 

> gdb/ChangeLog:

> 

> 	* infrun.c (handle_signal_stop): Move main context_switch call

> 	earlier, before STOP_QUIETLY_NO_SIGSTOP.


This looks like this bug, if you want to annotate the ChangeLog entry:

https://sourceware.org/bugzilla/show_bug.cgi?id=27055

> ---

>  gdb/infrun.c | 27 ++++++++++++---------------

>  1 file changed, 12 insertions(+), 15 deletions(-)

> 

> diff --git a/gdb/infrun.c b/gdb/infrun.c

> index 45bedf89641..b54baf70994 100644

> --- a/gdb/infrun.c

> +++ b/gdb/infrun.c

> @@ -5735,13 +5735,23 @@ handle_signal_stop (struct execution_control_state *ecs)

>    ecs->event_thread->suspend.stop_pc

>      = regcache_read_pc (get_thread_regcache (ecs->event_thread));

>  

> +  /* See if something interesting happened to the non-current thread.

> +     If so, then switch to that thread.  */

> +  if (ecs->ptid != inferior_ptid)

> +    {

> +      infrun_debug_printf ("context switch");

> +

> +      context_switch (ecs);

> +

> +      if (deprecated_context_hook)

> +	deprecated_context_hook (ecs->event_thread->global_num);

> +    }


This condition (ecs->ptid != inferior_ptid) is half the condition that is
already in context_switch.  Could we just call context_switch unconditionally
here?  The way I understand it is that context_switch makes sure the event
thread is the current one, so it's idempotent.  Each condition we can remove
from this code makes it slightly easier to understand.

I checked deprecated_context_hook, it's only used in insight.  And all it does
is set an int:

  https://sourceware.org/git/?p=insight.git;a=blob;f=gdbtk/generic/gdbtk-hooks.c;h=0aaf404995a3e08aca1e2921d455afa7c8a438ea;hb=HEAD#l783

So it won't care if we call it when the current thread hasn't actually changed,
it's idempotent.

For fun I checked a but further, that variable is then used to drive the
"gdb_context_id" tcl variable.  When I grep in insight, I find no use of that
variable.  Is it maybe a variable that people could use in tcl scripting?  If
not, it just seems that all this could be removed...

Anyway, that LGTM in any case.

Simon
Pedro Alves Feb. 3, 2021, 1:21 a.m. | #2
On 13/01/21 03:11, Simon Marchi wrote:

> This condition (ecs->ptid != inferior_ptid) is half the condition that is

> already in context_switch.  Could we just call context_switch unconditionally

> here?  The way I understand it is that context_switch makes sure the event

> thread is the current one, so it's idempotent.  Each condition we can remove

> from this code makes it slightly easier to understand.

> 

> I checked deprecated_context_hook, it's only used in insight.  And all it does

> is set an int:

> 

>   https://sourceware.org/git/?p=insight.git;a=blob;f=gdbtk/generic/gdbtk-hooks.c;h=0aaf404995a3e08aca1e2921d455afa7c8a438ea;hb=HEAD#l783

> 

> So it won't care if we call it when the current thread hasn't actually changed,

> it's idempotent.

> 

> For fun I checked a but further, that variable is then used to drive the

> "gdb_context_id" tcl variable.  When I grep in insight, I find no use of that

> variable.  Is it maybe a variable that people could use in tcl scripting?  If

> not, it just seems that all this could be removed...


Alright, I was thinking of the patch in terms of just moving the code around,
but I don't mind doing that at the same time.  The patch as merged is below.

> 

> Anyway, that LGTM in any case.


From 2ab76a181f3db93f051aaae66d65ff2733884d96 Mon Sep 17 00:00:00 2001
From: Pedro Alves <pedro@palves.net>

Date: Wed, 23 Dec 2020 00:34:54 +0000
Subject: [PATCH 01/12] Fix attaching in non-stop mode (PR gdb/27055)

Attaching in non-stop mode currently misbehaves, like so:

 (gdb) attach 1244450
 Attaching to process 1244450
 [New LWP 1244453]
 [New LWP 1244454]
 [New LWP 1244455]
 [New LWP 1244456]
 [New LWP 1244457]
 [New LWP 1244458]
 [New LWP 1244459]
 [New LWP 1244461]
 [New LWP 1244462]
 [New LWP 1244463]
 No unwaited-for children left.

At this point, GDB's stopped/running thread state is out of sync with
the inferior:

(gdb) info threads
  Id   Target Id                     Frame
* 1    LWP 1244450 "attach-non-stop" 0xf1b443bf in ?? ()
  2    LWP 1244453 "attach-non-stop" (running)
  3    LWP 1244454 "attach-non-stop" (running)
  4    LWP 1244455 "attach-non-stop" (running)
  5    LWP 1244456 "attach-non-stop" (running)
  6    LWP 1244457 "attach-non-stop" (running)
  7    LWP 1244458 "attach-non-stop" (running)
  8    LWP 1244459 "attach-non-stop" (running)
  9    LWP 1244461 "attach-non-stop" (running)
  10   LWP 1244462 "attach-non-stop" (running)
  11   LWP 1244463 "attach-non-stop" (running)
(gdb)
(gdb) interrupt -a
(gdb)
*nothing*

The problem is that attaching installs an inferior continuation,
called when the target reports the initial attach stop, here, in
inf-loop.c:inferior_event_handler:

      /* Do all continuations associated with the whole inferior (not
	 a particular thread).  */
      if (inferior_ptid != null_ptid)
	do_all_inferior_continuations (0);

However, currently in non-stop mode, inferior_ptid is still null_ptid
when we get here.

If you try to do "set debug infrun 1" to debug the problem, however,
then the attach completes correctly, with GDB reporting a stop for
each thread.

The bug is that we're missing a switch_to_thread/context_switch call
when handling the initial stop, here:

  if (stop_soon == STOP_QUIETLY_NO_SIGSTOP
      && (ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_STOP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_TRAP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_0))
    {
      stop_print_frame = true;
      stop_waiting (ecs);
      ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
      return;
    }

Note how the STOP_QUIETLY / STOP_QUIETLY_REMOTE case above that does
call context_switch.

And the reason "set debug infrun 1" "fixes" it, is that the debug path
has a switch_to_thread call.

This patch fixes it by moving the main context_switch call earlier.
It also removes the:

   if (ecs->ptid != inferior_ptid)

check at the same time because:

 #1 - that is half of what context_switch already does

 #2 - deprecated_context_hook is only used in Insight, and all it does
      is set an int.  It won't care if we call it when the current
      thread hasn't actually changed.

A testcase exercising this will be added in a following patch.

gdb/ChangeLog:

	PR gdb/27055
	* infrun.c (handle_signal_stop): Move main context_switch call
	earlier, before STOP_QUIETLY_NO_SIGSTOP.
---
 gdb/ChangeLog |  6 ++++++
 gdb/infrun.c  | 20 +++++---------------
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 209834a15fc..9b3b64dcf93 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,9 @@
+2021-02-03  Pedro Alves  <pedro@palves.net>
+
+	PR gdb/27055
+	* infrun.c (handle_signal_stop): Move main context_switch call
+	earlier, before STOP_QUIETLY_NO_SIGSTOP.
+
 2021-02-02  Lancelot SIX  <lsix@lancelotsix.com>
 
 	* NEWS (Changed commands): Add entry for the behavior change of
diff --git a/gdb/infrun.c b/gdb/infrun.c
index e070eff33d7..405b907856a 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5735,13 +5735,16 @@ handle_signal_stop (struct execution_control_state *ecs)
   ecs->event_thread->suspend.stop_pc
     = regcache_read_pc (get_thread_regcache (ecs->event_thread));
 
+  context_switch (ecs);
+
+  if (deprecated_context_hook)
+    deprecated_context_hook (ecs->event_thread->global_num);
+
   if (debug_infrun)
     {
       struct regcache *regcache = get_thread_regcache (ecs->event_thread);
       struct gdbarch *reg_gdbarch = regcache->arch ();
 
-      switch_to_thread (ecs->event_thread);
-
       infrun_debug_printf ("stop_pc=%s",
 			   paddress (reg_gdbarch,
 				     ecs->event_thread->suspend.stop_pc));
@@ -5764,7 +5767,6 @@ handle_signal_stop (struct execution_control_state *ecs)
   stop_soon = get_inferior_stop_soon (ecs);
   if (stop_soon == STOP_QUIETLY || stop_soon == STOP_QUIETLY_REMOTE)
     {
-      context_switch (ecs);
       infrun_debug_printf ("quietly stopped");
       stop_print_frame = true;
       stop_waiting (ecs);
@@ -5802,18 +5804,6 @@ handle_signal_stop (struct execution_control_state *ecs)
       return;
     }
 
-  /* See if something interesting happened to the non-current thread.  If
-     so, then switch to that thread.  */
-  if (ecs->ptid != inferior_ptid)
-    {
-      infrun_debug_printf ("context switch");
-
-      context_switch (ecs);
-
-      if (deprecated_context_hook)
-	deprecated_context_hook (ecs->event_thread->global_num);
-    }
-
   /* At this point, get hold of the now-current thread's frame.  */
   frame = get_current_frame ();
   gdbarch = get_frame_arch (frame);

base-commit: 0e33957abf8878a16283bee68dbd3899c2bcba09
-- 
2.26.2

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 45bedf89641..b54baf70994 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5735,13 +5735,23 @@  handle_signal_stop (struct execution_control_state *ecs)
   ecs->event_thread->suspend.stop_pc
     = regcache_read_pc (get_thread_regcache (ecs->event_thread));
 
+  /* See if something interesting happened to the non-current thread.
+     If so, then switch to that thread.  */
+  if (ecs->ptid != inferior_ptid)
+    {
+      infrun_debug_printf ("context switch");
+
+      context_switch (ecs);
+
+      if (deprecated_context_hook)
+	deprecated_context_hook (ecs->event_thread->global_num);
+    }
+
   if (debug_infrun)
     {
       struct regcache *regcache = get_thread_regcache (ecs->event_thread);
       struct gdbarch *reg_gdbarch = regcache->arch ();
 
-      switch_to_thread (ecs->event_thread);
-
       infrun_debug_printf ("stop_pc=%s",
 			   paddress (reg_gdbarch,
 				     ecs->event_thread->suspend.stop_pc));
@@ -5764,7 +5774,6 @@  handle_signal_stop (struct execution_control_state *ecs)
   stop_soon = get_inferior_stop_soon (ecs);
   if (stop_soon == STOP_QUIETLY || stop_soon == STOP_QUIETLY_REMOTE)
     {
-      context_switch (ecs);
       infrun_debug_printf ("quietly stopped");
       stop_print_frame = true;
       stop_waiting (ecs);
@@ -5802,18 +5811,6 @@  handle_signal_stop (struct execution_control_state *ecs)
       return;
     }
 
-  /* See if something interesting happened to the non-current thread.  If
-     so, then switch to that thread.  */
-  if (ecs->ptid != inferior_ptid)
-    {
-      infrun_debug_printf ("context switch");
-
-      context_switch (ecs);
-
-      if (deprecated_context_hook)
-	deprecated_context_hook (ecs->event_thread->global_num);
-    }
-
   /* At this point, get hold of the now-current thread's frame.  */
   frame = get_current_frame ();
   gdbarch = get_frame_arch (frame);