[pushed] Fix follow-exec regression / crash (Re: Possible regression on gdb.multi/multi-arch-exec.exp)

Message ID f69f4198-3092-1aa4-e529-1dad1ccef539@redhat.com
State New
Headers show
Series
  • [pushed] Fix follow-exec regression / crash (Re: Possible regression on gdb.multi/multi-arch-exec.exp)
Related show

Commit Message

Pedro Alves June 28, 2018, 4:02 p.m.
On 06/28/2018 01:09 PM, Pedro Alves wrote:

> I think the "gdb: Eliminate the 'stop_pc' global" patch

> (<https://sourceware.org/ml/gdb-patches/2018-06/msg00524.html>)

> will fix this, because it moves the stop_pc assignment until

> after ecs->event_thread is refreshed:

I've split that bit out of that patch and pushed it in now,
as below.

Thanks for reporting.  I recall moving the stop_pc bit in the
multi-target branch a while ago after running into this crash, but when
splitting up the "Use thread_info and inferior pointers more throughout"
patch, I didn't realize that bit was needed as well.

From ecdc3a72c89e43e0e13c5478723b4f70b3964e9f Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>

Date: Thu, 28 Jun 2018 16:57:18 +0100
Subject: [PATCH] Fix follow-exec regression / crash

After commit 00431a78b28f ("Use thread_info and inferior pointers more
throughout"), following an exec can result in gdb crashing.  On some
systems, this is visible with gdb.multi/multi-arch-exec.exp and
gdb.base/foll-exec-mode.exp.  E.g.:

  $ make check TESTS="gdb.multi/multi-arch-exec.exp gdb.base/foll-exec-mode.exp"
  [snip]
  FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: continue across exec that changes architecture (GDB internal error)
  ERROR: : spawn id exp10 not open
      while executing

Running multi-arch-exec under Valgrind we easily spot the problem:

  process 16305 is executing new program: ..../gdb.multi/multi-arch-exec/1-multi-arch-exec-hello
  [New inferior 2 (process 0)]
  [New process 16305]
  ==16129== Invalid read of size 8
  ==16129==    at 0x7FA14D: get_thread_regcache(thread_info*) (regcache.c:399)
  ==16129==    by 0x75E54B: handle_inferior_event_1(execution_control_state*) (infrun.c:5292)
  ==16129==    by 0x75E82D: handle_inferior_event(execution_control_state*) (infrun.c:5382)
  ==16129==    by 0x75BC6A: fetch_inferior_event(void*) (infrun.c:3918)
  ==16129==    by 0x748DA3: inferior_event_handler(inferior_event_type, void*) (inf-loop.c:43)
  ==16129==    by 0x464B5D: handle_target_event(int, void*) (linux-nat.c:4359)
  ==16129==    by 0x7047E0: handle_file_event(file_handler*, int) (event-loop.c:733)
  ==16129==    by 0x704D83: gdb_wait_for_event(int) (event-loop.c:859)
  ==16129==    by 0x703BF6: gdb_do_one_event() (event-loop.c:322)
  ==16129==    by 0x703CA2: start_event_loop() (event-loop.c:371)
  ==16129==    by 0x791D95: captured_command_loop() (main.c:330)
  ==16129==    by 0x79311C: captured_main(void*) (main.c:1157)
  ==16129==  Address 0x15a5bad0 is 32 bytes inside a block of size 600 free'd
  ==16129==    at 0x4C2E1E8: operator delete(void*) (vg_replace_malloc.c:576)
  ==16129==    by 0x8A15D0: delete_thread_1(thread_info*, bool) (thread.c:465)
  ==16129==    by 0x8A15FA: delete_thread(thread_info*) (thread.c:476)
  ==16129==    by 0x8A0D43: add_thread_silent(ptid_t) (thread.c:291)
  ==16129==    by 0x8A0DF0: add_thread_with_info(ptid_t, private_thread_info*) (thread.c:317)
  ==16129==    by 0x8A0E79: add_thread(ptid_t) (thread.c:331)
  ==16129==    by 0x75764C: follow_exec(ptid_t, char*) (infrun.c:1233)
  ==16129==    by 0x75E534: handle_inferior_event_1(execution_control_state*) (infrun.c:5290)
  ==16129==    by 0x75E82D: handle_inferior_event(execution_control_state*) (infrun.c:5382)
  ==16129==    by 0x75BC6A: fetch_inferior_event(void*) (infrun.c:3918)
  ==16129==    by 0x748DA3: inferior_event_handler(inferior_event_type, void*) (inf-loop.c:43)
  ==16129==    by 0x464B5D: handle_target_event(int, void*) (linux-nat.c:4359)

The problem is that handle_inferior_event_1 is reading the stop_pc off
of a thread that was deleted by follow_exec.  Before commit
00431a78b28f, we didn't crash because we were passing down a ptid to
get_thread_regcache instead of ecs->event_thread.

Fix this by simply moving the stop_pc reading until after
ecs->event_thread is refreshed.

gdb/ChangeLog:
2018-06-28  Pedro Alves  <palves@redhat.com>

	* infrun.c (handle_inferior_event_1) <TARGET_WAITKIND_EXECD>:
	Moving fetching stop_pc until after ecs->event_thread is refreshed.
---
 gdb/ChangeLog | 5 +++++
 gdb/infrun.c  | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

-- 
2.14.4

Comments

Sergio Durigan Junior June 28, 2018, 4:36 p.m. | #1
On Thursday, June 28 2018, Pedro Alves wrote:

> On 06/28/2018 01:09 PM, Pedro Alves wrote:

>

>> I think the "gdb: Eliminate the 'stop_pc' global" patch

>> (<https://sourceware.org/ml/gdb-patches/2018-06/msg00524.html>)

>> will fix this, because it moves the stop_pc assignment until

>> after ecs->event_thread is refreshed:

> I've split that bit out of that patch and pushed it in now,

> as below.

>

> Thanks for reporting.  I recall moving the stop_pc bit in the

> multi-target branch a while ago after running into this crash, but when

> splitting up the "Use thread_info and inferior pointers more throughout"

> patch, I didn't realize that bit was needed as well.


Thanks, Pedro.

> From ecdc3a72c89e43e0e13c5478723b4f70b3964e9f Mon Sep 17 00:00:00 2001

> From: Pedro Alves <palves@redhat.com>

> Date: Thu, 28 Jun 2018 16:57:18 +0100

> Subject: [PATCH] Fix follow-exec regression / crash

>

> After commit 00431a78b28f ("Use thread_info and inferior pointers more

> throughout"), following an exec can result in gdb crashing.  On some

> systems, this is visible with gdb.multi/multi-arch-exec.exp and

> gdb.base/foll-exec-mode.exp.  E.g.:

>

>   $ make check TESTS="gdb.multi/multi-arch-exec.exp gdb.base/foll-exec-mode.exp"

>   [snip]

>   FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: continue across exec that changes architecture (GDB internal error)

>   ERROR: : spawn id exp10 not open

>       while executing

>

> Running multi-arch-exec under Valgrind we easily spot the problem:

>

>   process 16305 is executing new program: ..../gdb.multi/multi-arch-exec/1-multi-arch-exec-hello

>   [New inferior 2 (process 0)]

>   [New process 16305]

>   ==16129== Invalid read of size 8

>   ==16129==    at 0x7FA14D: get_thread_regcache(thread_info*) (regcache.c:399)

>   ==16129==    by 0x75E54B: handle_inferior_event_1(execution_control_state*) (infrun.c:5292)

>   ==16129==    by 0x75E82D: handle_inferior_event(execution_control_state*) (infrun.c:5382)

>   ==16129==    by 0x75BC6A: fetch_inferior_event(void*) (infrun.c:3918)

>   ==16129==    by 0x748DA3: inferior_event_handler(inferior_event_type, void*) (inf-loop.c:43)

>   ==16129==    by 0x464B5D: handle_target_event(int, void*) (linux-nat.c:4359)

>   ==16129==    by 0x7047E0: handle_file_event(file_handler*, int) (event-loop.c:733)

>   ==16129==    by 0x704D83: gdb_wait_for_event(int) (event-loop.c:859)

>   ==16129==    by 0x703BF6: gdb_do_one_event() (event-loop.c:322)

>   ==16129==    by 0x703CA2: start_event_loop() (event-loop.c:371)

>   ==16129==    by 0x791D95: captured_command_loop() (main.c:330)

>   ==16129==    by 0x79311C: captured_main(void*) (main.c:1157)

>   ==16129==  Address 0x15a5bad0 is 32 bytes inside a block of size 600 free'd

>   ==16129==    at 0x4C2E1E8: operator delete(void*) (vg_replace_malloc.c:576)

>   ==16129==    by 0x8A15D0: delete_thread_1(thread_info*, bool) (thread.c:465)

>   ==16129==    by 0x8A15FA: delete_thread(thread_info*) (thread.c:476)

>   ==16129==    by 0x8A0D43: add_thread_silent(ptid_t) (thread.c:291)

>   ==16129==    by 0x8A0DF0: add_thread_with_info(ptid_t, private_thread_info*) (thread.c:317)

>   ==16129==    by 0x8A0E79: add_thread(ptid_t) (thread.c:331)

>   ==16129==    by 0x75764C: follow_exec(ptid_t, char*) (infrun.c:1233)

>   ==16129==    by 0x75E534: handle_inferior_event_1(execution_control_state*) (infrun.c:5290)

>   ==16129==    by 0x75E82D: handle_inferior_event(execution_control_state*) (infrun.c:5382)

>   ==16129==    by 0x75BC6A: fetch_inferior_event(void*) (infrun.c:3918)

>   ==16129==    by 0x748DA3: inferior_event_handler(inferior_event_type, void*) (inf-loop.c:43)

>   ==16129==    by 0x464B5D: handle_target_event(int, void*) (linux-nat.c:4359)

>

> The problem is that handle_inferior_event_1 is reading the stop_pc off

> of a thread that was deleted by follow_exec.  Before commit

> 00431a78b28f, we didn't crash because we were passing down a ptid to

> get_thread_regcache instead of ecs->event_thread.

>

> Fix this by simply moving the stop_pc reading until after

> ecs->event_thread is refreshed.

>

> gdb/ChangeLog:

> 2018-06-28  Pedro Alves  <palves@redhat.com>

>

> 	* infrun.c (handle_inferior_event_1) <TARGET_WAITKIND_EXECD>:

> 	Moving fetching stop_pc until after ecs->event_thread is refreshed.

> ---

>  gdb/ChangeLog | 5 +++++

>  gdb/infrun.c  | 4 ++--

>  2 files changed, 7 insertions(+), 2 deletions(-)

>

> diff --git a/gdb/ChangeLog b/gdb/ChangeLog

> index 81fae4924a..417c563849 100644

> --- a/gdb/ChangeLog

> +++ b/gdb/ChangeLog

> @@ -1,3 +1,8 @@

> +2018-06-28  Pedro Alves  <palves@redhat.com>

> +

> +	* infrun.c (handle_inferior_event_1) <TARGET_WAITKIND_EXECD>:

> +	Moving fetching stop_pc until after ecs->event_thread is refreshed.

> +

>  2018-06-28  Tom Tromey  <tom@tromey.com>

>  

>  	* coffread.c (coff_symfile_finish): Update.

> diff --git a/gdb/infrun.c b/gdb/infrun.c

> index 9548f9cafd..4c732340d1 100644

> --- a/gdb/infrun.c

> +++ b/gdb/infrun.c

> @@ -5289,13 +5289,13 @@ Cannot fill $_exitsignal with the correct signal number.\n"));

>           stop.  */

>        follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);

>  

> -      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread));

> -

>        /* In follow_exec we may have deleted the original thread and

>  	 created a new one.  Make sure that the event thread is the

>  	 execd thread for that case (this is a nop otherwise).  */

>        ecs->event_thread = inferior_thread ();

>  

> +      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread));

> +

>        ecs->event_thread->control.stop_bpstat

>  	= bpstat_stop_status (get_current_regcache ()->aspace (),

>  			      stop_pc, ecs->event_thread, &ecs->ws);

> -- 

> 2.14.4


-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/

Patch

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 81fae4924a..417c563849 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,8 @@ 
+2018-06-28  Pedro Alves  <palves@redhat.com>
+
+	* infrun.c (handle_inferior_event_1) <TARGET_WAITKIND_EXECD>:
+	Moving fetching stop_pc until after ecs->event_thread is refreshed.
+
 2018-06-28  Tom Tromey  <tom@tromey.com>
 
 	* coffread.c (coff_symfile_finish): Update.
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 9548f9cafd..4c732340d1 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5289,13 +5289,13 @@  Cannot fill $_exitsignal with the correct signal number.\n"));
          stop.  */
       follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);
 
-      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread));
-
       /* In follow_exec we may have deleted the original thread and
 	 created a new one.  Make sure that the event thread is the
 	 execd thread for that case (this is a nop otherwise).  */
       ecs->event_thread = inferior_thread ();
 
+      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread));
+
       ecs->event_thread->control.stop_bpstat
 	= bpstat_stop_status (get_current_regcache ()->aspace (),
 			      stop_pc, ecs->event_thread, &ecs->ws);