[02/12] gdb: clear inferior displaced stepping state on exec

Message ID 20201110214614.2842615-3-simon.marchi@efficios.com
State New
Series
  • Concurrent displaced stepping

Commit Message

Simon Marchi via Gdb-patches Nov. 10, 2020, 9:46 p.m.
When a process does an exec, all its program space is replaced with the
newly loaded executable.  All non-main threads disappear and the main
thread starts executing at the entry point of the new executable.

Things can go wrong if a displaced step operation is in progress while
we process the exec event.

If the main thread is the one executing the displaced step: when that
thread (now executing in the new executable) stops somewhere (say, at a
breakpoint), displaced_step_fixup will run and clear up the state.  We
will execute the "fixup" phase for the instruction we single-stepped in
the old program space.  We are now in a completely different context,
so doing the fixup may corrupt the state.

If it is a non-main thread that is doing the displaced step: while
handling the exec event, GDB deletes the thread_info representing that
thread (since the thread doesn't exist in the inferior after the exec).
But inferior::displaced_step_state::step_thread will still point to it.
When handling events later, this condition, in displaced_step_fixup,
will likely never be true:

    /* Was this event for the thread we displaced?  */
    if (displaced->step_thread != event_thread)
      return 0;

... since displaced->step_thread points to a deleted thread (unless that
storage gets re-used for a new thread_info, but that wouldn't be good
either).  This effectively makes the displaced stepping buffer occupied
for ever.  When a thread in the new program space will want to do a
displaced step, it will wait for ever.
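
The stuck-buffer scenario above can be modelled with a small, self-contained
sketch.  This is not GDB code: displaced_step_state_sketch, fixup,
buffer_busy and exec_leaves_buffer_stuck are hypothetical names, and plain
thread ids stand in for thread_info pointers so the example does not touch a
dangling pointer.  It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, heavily simplified stand-in for the per-inferior
// displaced stepping state.  An integer id replaces the thread_info
// pointer used by the real code.
struct displaced_step_state_sketch
{
  // Id of the thread currently using the displaced step buffer, if any.
  std::optional<int> step_thread;

  // Mirrors the check quoted above: only an event for the thread that
  // owns the buffer runs the fixup and releases the buffer.
  bool fixup (int event_thread)
  {
    if (!step_thread.has_value () || *step_thread != event_thread)
      return false;

    step_thread.reset ();
    return true;
  }

  bool buffer_busy () const { return step_thread.has_value (); }
};

// Returns true if the buffer is still marked busy after an exec that
// removed the stepping thread without clearing the state.
bool exec_leaves_buffer_stuck ()
{
  displaced_step_state_sketch displaced;

  // Non-main thread 2 starts a displaced step.
  displaced.step_thread = 2;
  assert (displaced.buffer_busy ());

  // The exec happens: thread 2 is gone and only main thread 1 can
  // report events from now on.  Nothing cleared the state.
  bool released = displaced.fixup (1);

  return !released && displaced.buffer_busy ();
}
```

In this model, no later event can carry thread id 2, so buffer_busy stays
true and any thread waiting for the buffer would wait indefinitely.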

I think we simply need to reset the displaced stepping state of the
inferior on exec.  Everything execution-related that existed before the
exec is now gone.
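
As a rough illustration of the proposed fix, the sketch below resets the
per-inferior state from an "execd" notification.  It is a simplified model
under stated assumptions, not the GDB observer implementation:
inferior_sketch, execd_observable_sketch and buffer_free_after_exec are
hypothetical names, and a thread id again stands in for the thread_info
pointer.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an inferior carrying displaced step state.
struct inferior_sketch
{
  // Id of the thread using the displaced step buffer, if any.
  std::optional<int> step_thread;

  void reset_displaced_step_state () { step_thread.reset (); }
};

// Hypothetical minimal observable: callbacks attached here are invoked
// when an inferior execs.
struct execd_observable_sketch
{
  using callback = std::function<void (inferior_sketch &)>;

  void attach (callback cb) { m_callbacks.push_back (std::move (cb)); }

  void notify (inferior_sketch &inf) const
  {
    for (const callback &cb : m_callbacks)
      cb (inf);
  }

private:
  std::vector<callback> m_callbacks;
};

// Returns true if the buffer is free again once the exec notification
// has been delivered.
bool buffer_free_after_exec ()
{
  execd_observable_sketch inferior_execd;

  // Equivalent in spirit to attaching a reset function at startup.
  inferior_execd.attach ([] (inferior_sketch &inf)
    {
      inf.reset_displaced_step_state ();
    });

  inferior_sketch inf;
  inf.step_thread = 2;          // Thread 2 is in the middle of a displaced step.

  inferior_execd.notify (inf);  // The inferior execs.

  return !inf.step_thread.has_value ();
}
```

Because the reset runs on every exec regardless of which thread was
stepping, it covers both the main-thread and non-main-thread cases described
above.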

I tried to write a test where a non-main thread displaced-steps an exec
syscall, where things would hang due to the displaced step buffer not
getting released.  However, due to PR 26754 [1], it is hard to make it
stable.  So I'm not including a test for this patch.  If you have an
idea for another way to test this without triggering this bug, I'd like
to hear it.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=26754

gdb/ChangeLog:

	* infrun.c (infrun_inferior_execd): New function.
	(_initialize_infrun): Attach inferior_execd observer.

Change-Id: I1bbc8538e683f53af5b980091849086f4fec5ff9
---
 gdb/infrun.c | 7 +++++++
 1 file changed, 7 insertions(+)

-- 
2.28.0

Comments

Pedro Alves Nov. 25, 2020, 1:28 a.m. | #1
On 11/10/20 9:46 PM, Simon Marchi via Gdb-patches wrote:
> When a process does an exec, all its program space is replaced with the
> newly loaded executable.  All non-main threads disappear and the main
> thread starts executing at the entry point of the new executable.
> 
> Things can go wrong if a displaced step operation is in progress while
> we process the exec event.
> 
> If the main thread is the one executing the displaced step: when that
> thread (now executing in the new executable) stops somewhere (say, at a
> breakpoint), displaced_step_fixup will run and clear up the state.  We
> will execute the "fixup" phase for the instruction we single-stepped in
> the old program space.  We are now in a completely different context,
> so doing the fixup may corrupt the state.
> 
> If it is a non-main thread that is doing the displaced step: while
> handling the exec event, GDB deletes the thread_info representing that
> thread (since the thread doesn't exist in the inferior after the exec).
> But inferior::displaced_step_state::step_thread will still point to it.
> When handling events later, this condition, in displaced_step_fixup,
> will likely never be true:
> 
>     /* Was this event for the thread we displaced?  */
>     if (displaced->step_thread != event_thread)
>       return 0;
> 
> ... since displaced->step_thread points to a deleted thread (unless that
> storage gets re-used for a new thread_info, but that wouldn't be good
> either).  This effectively makes the displaced stepping buffer occupied
> for ever.  When a thread in the new program space will want to do a
> displaced step, it will wait for ever.
> 
> I think we simply need to reset the displaced stepping state of the
> inferior on exec.  Everything execution-related that existed before the
> exec is now gone.
> 
> I tried to write a test where a non-main thread displaced-steps an exec
> syscall, where things would hang due to the displaced step buffer not
> getting released.  However, due to PR 26754 [1], it is hard to make it
> stable.  So I'm not including a test for this patch.  If you have an
> idea for another way to test this without triggering this bug, I'd like
> to hear it.
> 
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=26754

I can't think of another way to test this.

> 
> gdb/ChangeLog:
> 
> 	* infrun.c (infrun_inferior_execd): New function.
> 	(_initialize_infrun): Attach inferior_execd observer.

OK.
Simon Marchi Dec. 1, 2020, 4:27 a.m. | #2
On 2020-11-24 8:28 p.m., Pedro Alves wrote:
>> I tried to write a test where a non-main thread displaced-steps an exec
>> syscall, where things would hang due to the displaced step buffer not
>> getting released.  However, due to PR 26754 [1], it is hard to make it
>> stable.  So I'm not including a test for this patch.  If you have an
>> idea for another way to test this without triggering this bug, I'd like
>> to hear it.
>>
>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=26754
> I can't think of another way to test this.
>

We can simply put a breakpoint with an always-false condition on the
syscall instruction and run from the start.  The exec will be
displaced-stepped while the main thread runs.  I can reliably reproduce
the hang this way.

Simon

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index d59f6945285..bb881f3510d 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1528,6 +1528,12 @@  infrun_inferior_exit (struct inferior *inf)
   inf->displaced_step_state.reset ();
 }
 
+static void
+infrun_inferior_execd (inferior *inf)
+{
+  inf->displaced_step_state.reset ();
+}
+
 /* If ON, and the architecture supports it, GDB will use displaced
    stepping to step over breakpoints.  If OFF, or if the architecture
    doesn't support it, GDB will instead use the traditional
@@ -9509,6 +9515,7 @@  enabled by default on some platforms."),
   gdb::observers::thread_stop_requested.attach (infrun_thread_stop_requested);
   gdb::observers::thread_exit.attach (infrun_thread_thread_exit);
   gdb::observers::inferior_exit.attach (infrun_inferior_exit);
+  gdb::observers::inferior_execd.attach (infrun_inferior_execd);
 
   /* Explicitly create without lookup, since that tries to create a
      value with a void typed value, and when we get here, gdbarch