[05/12] gdbserver: spurious SIGTRAP w/ detach while step-over in progress

Message ID 20210113011543.2047449-6-pedro@palves.net
State New
Series
  • Fix detach + displaced-step regression + N bugs more

Commit Message

Pedro Alves Jan. 13, 2021, 1:15 a.m.
A following patch will add a new testcase that has two processes, each
with a number of threads constantly tripping a breakpoint and stepping
over it, because the breakpoint has a condition that evals false.
Then GDB detaches from one of the processes, while both processes are
running.  And then the testcase sends a SIGUSR1 to the other process.

When run against gdbserver, that would occasionally fail like this:

 (gdb) PASS: gdb.threads/detach-step-over.exp: iter 1: detach
 Executing on target: kill -SIGUSR1 208303    (timeout = 300)
 spawn -ignore SIGHUP kill -SIGUSR1 208303

 Thread 2.5 "detach-step-ove" received signal SIGTRAP, Trace/breakpoint trap.
 [Switching to Thread 208303.208305]
 0x000055555555522a in thread_func (arg=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c:54
 54            counter++; /* Set breakpoint here.  */

Note that it's gdbserver itself that steps over the breakpoint.

The gdbserver logs reveal what happened:

 - GDB manages to detach while a step over is in progress.  That reaches
   linux_process_target::complete_ongoing_step_over(), which does:

      /* Passing NULL_PTID as filter indicates we want all events to
	 be left pending.  Eventually this returns when there are no
	 unwaited-for children left.  */
      ret = wait_for_event_filtered (minus_one_ptid, null_ptid, &wstat,
				     __WALL);

   As the comment says, this leaves all events pending, _including_ the
   just finished step SIGTRAP.  We never discard that SIGTRAP.  So
   GDBserver reports the SIGTRAP to GDB.  GDB can't explain the
   SIGTRAP, so it reports it to the user.
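
The effect of the null filter can be illustrated with a small model.  This
is only a sketch, not gdbserver code: Lwp, Event, kNoFilter and
drain_events are hypothetical names, and the real wait_for_event_filtered
works on waitpid statuses rather than on a vector of events.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for gdbserver's per-LWP state; not the real types.
struct Lwp {
  bool status_pending_p = false;
  int status_pending = 0;  // in this model: just the stop signal number
};

struct Event {
  int lwpid;
  int signal;
};

constexpr int kNoFilter = 0;  // plays the role of null_ptid: matches no LWP
constexpr int kSigTrap = 5;   // SIGTRAP on Linux

// Pull every queued event.  An event whose LWP matches FILTER is handed
// back to the caller; every other event is stashed on its LWP as a pending
// status.  With kNoFilter nothing matches, so everything is left pending.
std::vector<Event> drain_events(std::map<int, Lwp> &lwps,
                                const std::vector<Event> &queue, int filter)
{
  std::vector<Event> reported;
  for (const Event &ev : queue)
    {
      assert(ev.lwpid > 0);
      if (filter != kNoFilter && ev.lwpid == filter)
        reported.push_back(ev);
      else
        {
          lwps[ev.lwpid].status_pending_p = true;
          lwps[ev.lwpid].status_pending = ev.signal;
        }
    }
  return reported;
}
```

In this model, draining a queue that holds the step SIGTRAP for LWP 208305
with kNoFilter returns nothing and leaves that SIGTRAP pending on the LWP,
which is the state that later gets reported to GDB.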

The gdbserver log looks like this.  The LWP of interest is 208305:

 Need step over [LWP 208305]? yes, found breakpoint at 0x555555555227
 proceed_all_lwps: found thread 208305 needing a step-over
 Starting step-over on LWP 208305.  Stopping all threads

208305 starts a step-over.

 >>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)

 stop_all_lwps (stop-and-suspend, except=LWP 208303.208305)
 Sending sigstop to lwp 208303
 Sending sigstop to lwp 207755
 wait_for_sigstop: pulling events
 LWFE: waitpid(-1, ...) returned 207755, ERRNO-OK
 LLW: waitpid 207755 received Stopped (signal) (stopped)
 pc is 0x7f7e045593bf
 Expected stop.
 LLW: SIGSTOP caught for LWP 207755.207755 while stopping threads.
 LWFE: waitpid(-1, ...) returned 208303, ERRNO-OK
 LLW: waitpid 208303 received Stopped (signal) (stopped)
 pc is 0x7ffff7e743bf
 Expected stop.
 LLW: SIGSTOP caught for LWP 208303.208303 while stopping threads.
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 LLW: exit (no unwaited-for LWP)
 stop_all_lwps done, setting stopping_threads back to !stopping
 <<<< exiting void linux_process_target::stop_all_lwps(int, lwp_info*)
 Done stopping all threads for step-over.
 pc is 0x555555555227
 Writing 8b to 0x555555555227 in process 208305
 Could not findsigchld_handler
  fast tracepoint jump at 0x555555555227 in list (uninserting).
   pending reinsert at 0x555555555227
   step from pc 0x555555555227
 Resuming lwp 208305 (step, signal 0, stop expected)
 <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
 handling possible serial event
 getpkt ("D;32b8b");  [no ack sent]

The detach request arrives.

 sigchld_handler
 Tracing is already off, ignoring
 detach: step over in progress, finish it first

gdbserver realizes a step over for 208305 was in progress, and lets it
finish.

 LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
 LLW: waitpid 208305 received Stopped (signal) (stopped)
 pc is 0x555555555227
 Expected stop.
 LLW: step LWP 208303.208305, 0, 0 (discard delayed SIGSTOP)
   pending reinsert at 0x555555555227
   step from pc 0x555555555227
 Resuming lwp 208305 (step, signal 0, stop not expected)
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 sigsuspend'ing
 LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
 LLW: waitpid 208305 received Trace/breakpoint trap (stopped)
 pc is 0x55555555522a
 CSBB: LWP 208303.208305 stopped by trace
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 LLW: exit (no unwaited-for LWP)
 Finished step over.

The step-over for 208305 finishes.

 Writing cc to 0x555555555227 in process 208305
 Could not find fast tracepoint jump at 0x555555555227 in list (reinserting).
 >>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)

 stop_all_lwps (stop, except=none)
 wait_for_sigstop: pulling events

The detach proceeds (snipped).

...

 proceed_one_lwp: lwp 208305
    LWP 208305 has pending status, leaving stopped

Later on, 208305 has a pending status (the step SIGTRAP from the
step-over), so GDBserver starts the process of reporting it.

...

 wait_1 ret = LWP 208303.208305, 1, 5
 <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)

...

and eventually GDB receives the stop notification (T05 == SIGTRAP):

 getpkt ("vStopped");  [no ack sent]
 sigchld_handler
 vStopped: acking 3
 Writing resume reply for LWP 208303.208305:1
 putpkt ("$T0506:f0ee58f7ff7f0* ;07:f0ee58f7ff7f0* ;10:2a525*"550* ;thread:p32daf.32db1;core:c;#37"); [noack mode]
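
The process and thread IDs in these packets are hexadecimal, which makes
it easy to lose track of which process is which.  The sketch below decodes
the two packets from the logs above.  StopReply, detach_pid and
parse_stop_reply are hypothetical helpers written for this illustration,
not GDB or gdbserver functions, and they handle only the fields needed
here (the packet body without the leading '$' and the trailing checksum).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not part of GDB or gdbserver.
struct StopReply {
  int signal;
  long pid;
  long tid;
};

// "D;PID" -> PID, for the multiprocess form of the detach packet.
long detach_pid(const std::string &packet)
{
  assert(packet.compare(0, 2, "D;") == 0);
  return std::stol(packet.substr(2), nullptr, 16);
}

// Extract the signal number and the "thread:pPID.TID" field from a
// "T" stop reply.  All numbers are hexadecimal.
StopReply parse_stop_reply(const std::string &packet)
{
  assert(packet.size() >= 3 && packet[0] == 'T');
  StopReply reply{};
  reply.signal = std::stoi(packet.substr(1, 2), nullptr, 16);

  const std::string key = "thread:p";
  std::size_t at = packet.find(key);
  assert(at != std::string::npos);

  std::string ids = packet.substr(at + key.size());
  std::size_t used = 0;
  reply.pid = std::stol(ids, &used, 16);  // stops at the '.'
  assert(used < ids.size() && ids[used] == '.');
  reply.tid = std::stol(ids.substr(used + 1), nullptr, 16);
  return reply;
}
```

Decoded this way, "D;32b8b" is a detach of process 207755, while the T05
stop reply is signal 5 (SIGTRAP) for thread 208305 of process 208303, the
process that receives the SIGUSR1 in the test.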

From the GDB side, we see:

 [infrun] fetch_inferior_event: enter
   [infrun] fetch_inferior_event: fetch_inferior_event enter
   [infrun] do_target_wait: Found 2 inferiors, starting at #1
   [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
   [infrun] print_target_wait_results:   208303.208305.0 [Thread 208303.208305],
   [infrun] print_target_wait_results:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
   [infrun] handle_inferior_event: status->kind = stopped, signal = GDB_SIGNAL_TRAP
   [infrun] start_step_over: enter
     [infrun] start_step_over: stealing global queue of threads to step, length = 6
     [infrun] operator(): putting back 6 threads to step in global queue
   [infrun] start_step_over: exit
   [infrun] handle_signal_stop: context switch
   [infrun] context_switch: Switching context from process 0 to Thread 208303.208305
   [infrun] handle_signal_stop: stop_pc=0x55555555522a
   [infrun] handle_signal_stop: random signal (GDB_SIGNAL_TRAP)
   [infrun] stop_waiting: stop_waiting
   [infrun] stop_all_threads: starting

The fix is to discard the step SIGTRAP, unless GDB wanted the thread
to step.
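
The decision can be sketched in isolation as follows.  This is a
simplified model under stated assumptions, not the patch itself:
PendingState and ResumeKind are hypothetical stand-ins for the relevant
lwp_info and thread_info fields, stopped_by assumes the Linux wait status
encoding, and the resume of the LWP that the real code performs after
discarding the event is left out.

```cpp
#include <cassert>
#include <csignal>
#include <sys/wait.h>

// Hypothetical, simplified stand-ins for gdbserver's per-thread state.
enum ResumeKind { resume_continue, resume_step, resume_stop };

struct PendingState {
  bool status_pending_p;
  int status_pending;           // raw waitpid status
  ResumeKind last_resume_kind;  // what GDB last asked this thread to do
};

// Build a "stopped by SIG" wait status.  Assumption: Linux encoding,
// where the low byte is 0x7f and the signal sits in the next byte.
int stopped_by(int sig)
{
  assert(sig > 0 && sig < 128);
  return (sig << 8) | 0x7f;
}

// Returns true, and clears the pending status, when the pending event is
// the step SIGTRAP of a step-over that gdbserver started on its own.
bool discard_step_over_sigtrap(PendingState &st)
{
  if (!st.status_pending_p)
    return false;
  if (!WIFSTOPPED(st.status_pending)
      || WSTOPSIG(st.status_pending) != SIGTRAP)
    return false;
  if (st.last_resume_kind == resume_step)
    return false;  // GDB asked for the step, so it expects this SIGTRAP

  st.status_pending_p = false;
  st.status_pending = 0;
  return true;
}
```

A thread that GDB last resumed with a plain continue has its step SIGTRAP
dropped, while a thread that GDB asked to step keeps it so the stop is
still reported.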

gdbserver/ChangeLog:

	* linux-low.cc (linux_process_target::complete_ongoing_step_over):
	Discard step SIGTRAP, unless GDB wanted the thread to step.
---
 gdbserver/linux-low.cc | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

-- 
2.26.2

Comments

Simon Marchi via Gdb-patches Jan. 13, 2021, 6 a.m. | #1
On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> A following patch will add a new testcase that has two processes, each
> with a number of threads constantly tripping a breakpoint and stepping
> over it, because the breakpoint has a condition that evals false.
> Then GDB detaches from one of the processes, while both processes are
> running.  And then the testcase sends a SIGUSR1 to the other process.
>
> [...]
>
> The fix is to discard the step SIGTRAP, unless GDB wanted the thread
> to step.

Maybe the only thing that wasn't clear from the start of the
explanation is that GDBserver is doing a step-over for process A
when a detach request for process B arrives.  And that generates
a spurious SIGTRAP report for process A.  It wasn't clear which
process was doing what.  But otherwise, it makes sense.

The fix LGTM, although I'm not much used to the GDBserver internals.

Simon
Pedro Alves Feb. 3, 2021, 1:26 a.m. | #2
On 13/01/21 06:00, Simon Marchi wrote:

> Maybe the only thing that wasn't clear from the start of the
> explanation is that GDBserver is doing a step-over for process A
> when a detach request for process B arrives.  And that generates
> a spurious SIGTRAP report for process A.  It wasn't clear which
> process was doing what.  But otherwise, it makes sense.

I see.  I've added that info to the commit log then, and merged it.

> The fix LGTM, although I'm not much used to the GDBserver internals.

Patch

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index 4b43d171d2d..5c696c275dd 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -4695,7 +4695,34 @@  linux_process_target::complete_ongoing_step_over ()
 
       lwp = find_lwp_pid (step_over_bkpt);
       if (lwp != NULL)
-	finish_step_over (lwp);
+	{
+	  finish_step_over (lwp);
+
+	  /* If we got our step SIGTRAP, don't leave it pending,
+	     otherwise we would report it to GDB as a spurious
+	     SIGTRAP.  */
+	  gdb_assert (lwp->status_pending_p);
+	  if (WIFSTOPPED (lwp->status_pending)
+	      && WSTOPSIG (lwp->status_pending) == SIGTRAP)
+	    {
+	      thread_info *thread = get_lwp_thread (lwp);
+	      if (thread->last_resume_kind != resume_step)
+		{
+		  if (debug_threads)
+		    debug_printf ("detach: discard step-over SIGTRAP\n");
+
+		  lwp->status_pending_p = 0;
+		  lwp->status_pending = 0;
+		  resume_one_lwp (lwp, lwp->stepping, 0, NULL);
+		}
+	      else
+		{
+		  if (debug_threads)
+		    debug_printf ("detach: resume_step, "
+				  "not discarding step-over SIGTRAP\n");
+		}
+	    }
+	}
       step_over_bkpt = null_ptid;
       unsuspend_all_lwps (lwp);
     }