[1/7] Fix spurious unhandled remote %Stop notifications

Message ID 20200706190252.22552-2-pedro@palves.net
State New
Series
  • GDB busy loop when interrupting non-stop program (PR 26199)

Commit Message

Pedro Alves July 6, 2020, 7:02 p.m.
In non-stop mode, remote targets mark an async event source whose
callback is supposed to result in calling remote_target::wait_ns to
either process the event queue, or acknowledge an incoming %Stop
notification.

The callback in question is remote_async_inferior_event_handler, where
we call inferior_event_handler, to end up in fetch_inferior_event ->
target_wait -> remote_target::wait -> remote_target::wait_ns.

A problem here, however, is that when debugging multiple targets,
fetch_inferior_event can pull events out of any target picked at
random, for event fairness.  This means that when
remote_async_inferior_event_handler returns, remote_target::wait may
not have been called at all, and thus pending notifications may not
have been acked.  Because async event sources auto-clear, when
remote_async_inferior_event_handler returns the async event handler is
no longer marked, so the event loop won't automatically call
remote_async_inferior_event_handler again to try to process the
pending remote notifications/queue.  The result is that stop events
may end up not being processed, e.g., "interrupt -a" seemingly not
managing to stop all threads.

Fix this by making remote_async_inferior_event_handler mark the event
handler again before returning, if necessary.
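
As an illustration only, the lost-event scenario and the re-mark fix
can be modelled with a toy event loop.  This is a minimal sketch, not
GDB code: async_handler, toy_target, run_event_loop and simulate are
hypothetical names, and the "pick any target" policy is made
deterministic (native first) so the outcome is reproducible.

```cpp
#include <cassert>
#include <functional>
#include <vector>

/* Hypothetical stand-in for an async event source.  The loop clears
   READY before invoking the callback, modelling auto-clear.  */
struct async_handler
{
  bool ready = false;
  std::function<void ()> callback;
};

/* Hypothetical target holding a count of unprocessed events.  */
struct toy_target
{
  int pending = 0;
};

/* Invoke marked handlers until none is marked.  Return the number of
   callback invocations.  */
int
run_event_loop (std::vector<async_handler *> &handlers)
{
  int calls = 0;
  bool progressed = true;

  while (progressed)
    {
      progressed = false;
      for (async_handler *h : handlers)
	if (h->ready)
	  {
	    assert (h->callback);
	    h->ready = false;	/* Auto-clear before the callback runs.  */
	    h->callback ();
	    ++calls;
	    progressed = true;
	  }
    }
  return calls;
}

/* Run the model with one event pending on each of two targets and
   only the remote's handler marked.  Return how many remote events
   are still unprocessed once the loop goes idle.  */
int
simulate (bool remark)
{
  toy_target native, remote;
  native.pending = 1;
  remote.pending = 1;

  async_handler remote_handler;

  /* Models an event fetch that may serve any target; here it always
     prefers the native target.  */
  auto fetch_any = [&] ()
    {
      if (native.pending > 0)
	--native.pending;
      else if (remote.pending > 0)
	--remote.pending;
    };

  remote_handler.callback = [&] ()
    {
      fetch_any ();
      /* The fix: mark the handler again if the remote still has work.  */
      if (remark && remote.pending > 0)
	remote_handler.ready = true;
    };

  remote_handler.ready = true;
  std::vector<async_handler *> handlers { &remote_handler };
  run_event_loop (handlers);
  return remote.pending;
}
```

In this model, simulate (false) leaves the remote event unprocessed
because the single callback invocation was spent on the native
target, while simulate (true) drains it on the second invocation.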

Maybe a better fix would be to make async event handlers not
auto-clear themselves, and instead make clearing the responsibility of
the callback, so that the event loop would keep calling the callback
automatically.  Or, we could try making it so that
fetch_inferior_event would optionally handle events only for the
target that it got passed down via parameter.  However, I don't think
now, just before branching, is the time to try any such change.

gdb/ChangeLog:

	PR gdb/26199
	* remote.c (remote_target::open_1): Pass remote target pointer as
	data to create_async_event_handler.
	(remote_async_inferior_event_handler): Mark async event handler
	before returning if the remote target still has either pending
	events or unacknowledged notifications.
---
 gdb/remote.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

-- 
2.14.5

Comments

Andrew Burgess Dec. 12, 2020, 10:13 p.m. | #1
* Pedro Alves <pedro@palves.net> [2020-07-06 20:02:46 +0100]:

> [...]

> diff --git a/gdb/remote.c b/gdb/remote.c
> index f7f99dc24f..59075cb09f 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -5605,7 +5605,7 @@ remote_target::open_1 (const char *name, int from_tty, int extended_p)
>  
>    /* Register extra event sources in the event loop.  */
>    rs->remote_async_inferior_event_token
> -    = create_async_event_handler (remote_async_inferior_event_handler, NULL);
> +    = create_async_event_handler (remote_async_inferior_event_handler, remote);
>    rs->notif_state = remote_notif_state_allocate (remote);
>  
>    /* Reset the target state; these things will be queried either by
> @@ -14164,6 +14164,19 @@ static void
>  remote_async_inferior_event_handler (gdb_client_data data)
>  {
>    inferior_event_handler (INF_REG_EVENT);
> +
> +  remote_target *remote = (remote_target *) data;
> +  remote_state *rs = remote->get_remote_state ();
> +
> +  /* inferior_event_handler may have consumed an event pending on the
> +     infrun side without calling target_wait on the REMOTE target, or
> +     may have pulled an event out of a different target.  Keep trying
> +     for this remote target as long it still has either pending events
> +     or unacknowledged notifications.  */
> +
> +  if (rs->notif_state->pending_event[notif_client_stop.id] != NULL
> +      || !rs->stop_reply_queue.empty ())
> +    mark_async_event_handler (rs->remote_async_inferior_event_token);
>  }


Pedro,

This patch introduced a use after free issue here.  This can be seen
by running the test:

  make check-gdb RUNTESTFLAGS="--target_board=native-gdbserver gdb.base/inferior-died.exp"

For me this fails maybe 1 in 5 times.  I've done some initial
investigation, and the problem is obvious once you see the following
stack trace:

  #0  remote_state::~remote_state (this=0x338d548, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:1097
  #1  0x0000000000acf3b3 in remote_target::~remote_target (this=0x338d530, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:4078
  #2  0x0000000000acf3f6 in remote_target::~remote_target (this=0x338d530, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:4097
  #3  0x0000000000acf2fa in remote_target::close (this=0x338d530) at ../../src.dev-3/gdb/remote.c:4075
  #4  0x0000000000c75bfd in target_close (targ=0x338d530) at ../../src.dev-3/gdb/target.c:3126
  #5  0x0000000000c62ca4 in decref_target (t=0x338d530) at ../../src.dev-3/gdb/target.c:545
  #6  0x0000000000c62ec7 in target_stack::unpush (this=0x3666d50, t=0x338d530) at ../../src.dev-3/gdb/target.c:633
  #7  0x0000000000c7796c in inferior::unpush_target (this=0x3666ba0, t=0x338d530) at ../../src.dev-3/gdb/inferior.h:357
  #8  0x0000000000c62de1 in unpush_target (t=0x338d530) at ../../src.dev-3/gdb/target.c:595
  #9  0x0000000000c62ee7 in unpush_target_and_assert (target=0x338d530) at ../../src.dev-3/gdb/target.c:643
  #10 0x0000000000c62fb6 in pop_all_targets_at_and_above (stratum=process_stratum) at ../../src.dev-3/gdb/target.c:666
  #11 0x0000000000ad21a4 in remote_unpush_target (target=0x338d530) at ../../src.dev-3/gdb/remote.c:5524
  #12 0x0000000000adc619 in remote_target::mourn_inferior (this=0x338d530) at ../../src.dev-3/gdb/remote.c:9962
  #13 0x0000000000c65e79 in target_mourn_inferior (ptid=...) at ../../src.dev-3/gdb/target.c:2136
  #14 0x000000000086d0a5 in handle_inferior_event (ecs=0x7fffffffb3b0) at ../../src.dev-3/gdb/infrun.c:5234
  #15 0x0000000000869beb in fetch_inferior_event () at ../../src.dev-3/gdb/infrun.c:3863
  #16 0x000000000084a922 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src.dev-3/gdb/inf-loop.c:42
  #17 0x0000000000ae73a9 in remote_async_inferior_event_handler (data=0x338d530) at ../../src.dev-3/gdb/remote.c:14177
  #18 0x00000000004ea759 in check_async_event_handlers () at ../../src.dev-3/gdb/async-event.c:328
  #19 0x0000000001449e7a in gdb_do_one_event () at ../../src.dev-3/gdbsupport/event-loop.cc:216
  #20 0x00000000009102d0 in start_event_loop () at ../../src.dev-3/gdb/main.c:347
  #21 0x00000000009103f0 in captured_command_loop () at ../../src.dev-3/gdb/main.c:407
  #22 0x0000000000911be6 in captured_main (data=0x7fffffffb640) at ../../src.dev-3/gdb/main.c:1239
  #23 0x0000000000911c4c in gdb_main (args=0x7fffffffb640) at ../../src.dev-3/gdb/main.c:1254
  #24 0x000000000041755d in main (argc=5, argv=0x7fffffffb748) at ../../src.dev-3/gdb/gdb.c:32

The inferior event being processed is the inferior exited event.  This
is the last remote inferior, so the remote target is unpushed and, as
the trace shows, destroyed.  GDB then returns to
remote_async_inferior_event_handler, where we hit the code you added
above, which proceeds to make use of the now-freed remote target :-/
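
To make the hazard concrete, here is a minimal sketch of the same
shape of bug outside GDB.  Everything in it is an assumption made for
illustration: toy_remote, toy_stack, handle_event, remote_callback and
demo are hypothetical names, and std::shared_ptr/std::weak_ptr stand
in for GDB's own target reference counting.  It shows one way a
callback could detect that its target died while the event was being
handled, instead of dereferencing a stale pointer.

```cpp
#include <cassert>
#include <memory>

/* Hypothetical remote target state.  */
struct toy_remote
{
  bool has_pending = false;
};

/* Hypothetical owner holding the last reference to the target,
   standing in for the target stack.  */
struct toy_stack
{
  std::shared_ptr<toy_remote> remote;
};

/* Stand-in for the generic event handler: handling an "inferior
   exited" event drops the last reference, destroying the target.  */
void
handle_event (toy_stack &stack, bool inferior_exited)
{
  if (inferior_exited)
    stack.remote.reset ();
}

/* Stand-in for the async callback.  Return true if the handler
   should be marked again.  */
bool
remote_callback (toy_stack &stack, std::weak_ptr<toy_remote> data,
		 bool inferior_exited)
{
  handle_event (stack, inferior_exited);

  /* A raw pointer captured at registration time would dangle here if
     the target was destroyed above.  The weak_ptr lets the callback
     notice that and bail out.  */
  std::shared_ptr<toy_remote> remote = data.lock ();
  if (remote == nullptr)
    return false;

  return remote->has_pending;
}

/* Set up a target with pending work and run the callback once.  */
bool
demo (bool inferior_exited)
{
  toy_stack stack;
  stack.remote = std::make_shared<toy_remote> ();
  assert (stack.remote != nullptr);
  stack.remote->has_pending = true;
  return remote_callback (stack, stack.remote, inferior_exited);
}
```

Here demo (false) reports that the handler should be re-marked, and
demo (true) returns false without touching the destroyed target.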

Like I say, the problem is now obvious, but the solution less so!

Reading what you originally wrote in the patch, I wondered about the
idea of making the callback responsible for clearing the async event
handler.
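
A rough sketch of that idea, under stated assumptions: sticky_handler,
toy_target, drain_result and drain are hypothetical names, not GDB
APIs, and the loop body stands in for the callback.  The handler
stays marked until the target's own wait routine finds its queue
empty and clears it, so the callback never needs to touch the target
after dispatching an event.

```cpp
#include <cassert>

/* Hypothetical handler that stays marked until explicitly cleared.  */
struct sticky_handler
{
  bool ready = false;
};

/* Hypothetical target.  It clears its own handler once drained.  */
struct toy_target
{
  int pending = 0;
  sticky_handler *handler = nullptr;	/* Null if it has no handler.  */

  /* Consume one event, then clear the handler if nothing is left.  */
  void
  wait ()
  {
    if (pending > 0)
      --pending;
    if (pending == 0 && handler != nullptr)
      handler->ready = false;
  }
};

struct drain_result
{
  int remote_left;	/* Remote events still unprocessed.  */
  int callback_calls;	/* How many times the callback ran.  */
};

/* Run the model until the remote's handler is no longer marked.  */
drain_result
drain (int native_events, int remote_events)
{
  sticky_handler handler;
  toy_target native, remote;

  native.pending = native_events;
  remote.pending = remote_events;
  remote.handler = &handler;
  handler.ready = remote_events > 0;

  int calls = 0;
  /* The cap only guards the sketch against looping forever.  */
  while (handler.ready && calls < 100)
    {
      /* Callback body: serve whichever target has an event, native
	 first, and touch nothing afterwards.  */
      if (native.pending > 0)
	native.wait ();
      else
	remote.wait ();
      ++calls;
    }

  return { remote.pending, calls };
}
```

With two native events and one remote event, the callback runs three
times and the remote event is not lost, even though the first two
invocations were spent on the native target.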

I haven't tried to fix this yet, but thought I'd share my findings so
far with you.

Thanks,
Andrew

Simon Marchi via Gdb-patches Dec. 13, 2020, 12:46 a.m. | #2
On 2020-12-12 5:13 p.m., Andrew Burgess wrote:
> This patch introduced a use after free issue here.  This can be seen
> by running the test:
>
>   make check-gdb RUNTESTFLAGS="--target_board=native-gdbserver gdb.base/inferior-died.exp"
>
> [...]


This patch series I proposed earlier should fix it:

https://sourceware.org/pipermail/gdb-patches/2020-November/173633.html

Simon

Patch

diff --git a/gdb/remote.c b/gdb/remote.c
index f7f99dc24f..59075cb09f 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -5605,7 +5605,7 @@  remote_target::open_1 (const char *name, int from_tty, int extended_p)
 
   /* Register extra event sources in the event loop.  */
   rs->remote_async_inferior_event_token
-    = create_async_event_handler (remote_async_inferior_event_handler, NULL);
+    = create_async_event_handler (remote_async_inferior_event_handler, remote);
   rs->notif_state = remote_notif_state_allocate (remote);
 
   /* Reset the target state; these things will be queried either by
@@ -14164,6 +14164,19 @@  static void
 remote_async_inferior_event_handler (gdb_client_data data)
 {
   inferior_event_handler (INF_REG_EVENT);
+
+  remote_target *remote = (remote_target *) data;
+  remote_state *rs = remote->get_remote_state ();
+
+  /* inferior_event_handler may have consumed an event pending on the
+     infrun side without calling target_wait on the REMOTE target, or
+     may have pulled an event out of a different target.  Keep trying
+     for this remote target as long it still has either pending events
+     or unacknowledged notifications.  */
+
+  if (rs->notif_state->pending_event[notif_client_stop.id] != NULL
+      || !rs->stop_reply_queue.empty ())
+    mark_async_event_handler (rs->remote_async_inferior_event_token);
 }
 
 int