Fix re-runs of a second inferior (PR gdb/25410)

Message ID 20200124030222.13854-1-palves@redhat.com
State New
Headers show
Series
  • Fix re-runs of a second inferior (PR gdb/25410)
Related show

Commit Message

Pedro Alves Jan. 24, 2020, 3:02 a.m.
This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa
"Multi-target support).

The symptom described in the bug is that starting a first inferior,
then trying to run a second (multi-threaded) inferior twice, causes
libthread_db to fail to load, along with other erratic behavior:

 (gdb) run
 Starting program: /tmp/foo
 warning: td_ta_new failed: generic error

Going a bit deeply, I found that if the two inferiors have different
symbols, we can see that just after inferior 2 exits, we are left with
inferior 2 selected, which is correct, but the symbols in scope belong
to inferior 1, which is obviously incorrect...

This problem is that there's a path in
scoped_restore_current_thread::restore() that switches to no thread
selected, and switches the current inferior, but leaves the current
program space as is, resulting in leaving the program space pointing
to the wrong program space (the one of the other inferior).  This was
happening after handling TARGET_WAITKIND_NO_RESUMED, which is an event
that triggers after TARGET_WAITKIND_EXITED for the previous inferior
exit.  Subsequent symbol lookups find the symbols of the wrong
inferior.

The fix is to use switch_to_inferior_no_thread in that problem spot.
This function was recently added along with the multi-target work
exactly for these situations.

As for testing, this patch adds a new testcase that tests symbol
printing just after inferior exit, which exercises the root cause of
the problem more directly.  And then, to cover the use case described
in the bug too, it also exercises the lithread_db.so mis-loading, by
using TLS printing as a proxy for being sure that threaded debugging
was activated sucessfully.  The testcase fails without the fix like
this, for the "print symbol just after exit" bits:

 ...
 [Inferior 1 (process 8719) exited normally]
 (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: continue until exit
 print re_run_var_1
 No symbol "re_run_var_1" in current context.
 (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: print re_run_var_1
 ...

And like this for the "libthread_db.so loading" bits:

 (gdb) run
 Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run
 warning: td_ta_new failed: generic error
 [New LWP 27001]

 Thread 1.1 "multi-re-run" hit Breakpoint 3, all_started () at /home/pedro/gdb/binutils-gdb/build/../src/gdb/testsuite/gdb.multi/multi-re-run.c:44
 44      }
 (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: running to all_started in runto
 print tls_var
 Cannot find thread-local storage for LWP 27000, executable file /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run:
 Cannot find thread-local variables on this target
 (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: print tls_var

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/25410
	* thread.c (scoped_restore_current_thread::restore): Use
	switch_to_inferior_no_thread.

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/25410
	* gdb.multi/multi-re-run-1.c: New.
	* gdb.multi/multi-re-run-2.c: New.
	* gdb.multi/multi-re-run.exp: New.
---
 gdb/testsuite/gdb.multi/multi-re-run-1.c |  61 ++++++++++++++++
 gdb/testsuite/gdb.multi/multi-re-run-2.c |  61 ++++++++++++++++
 gdb/testsuite/gdb.multi/multi-re-run.exp | 115 +++++++++++++++++++++++++++++++
 gdb/thread.c                             |   5 +-
 4 files changed, 238 insertions(+), 4 deletions(-)
 create mode 100644 gdb/testsuite/gdb.multi/multi-re-run-1.c
 create mode 100644 gdb/testsuite/gdb.multi/multi-re-run-2.c
 create mode 100644 gdb/testsuite/gdb.multi/multi-re-run.exp


base-commit: 1ba1ac88011703abcd0271e4f5d00927dc69a09a
-- 
2.14.5

Comments

Simon Marchi Jan. 24, 2020, 3:32 a.m. | #1
On 2020-01-23 10:02 p.m., Pedro Alves wrote:
> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa

> "Multi-target support).

> 

> The symptom described in the bug is that starting a first inferior,

> then trying to run a second (multi-threaded) inferior twice, causes

> libthread_db to fail to load, along with other erratic behavior:

> 

>  (gdb) run

>  Starting program: /tmp/foo

>  warning: td_ta_new failed: generic error

> 

> Going a bit deeply, I found that if the two inferiors have different

> symbols, we can see that just after inferior 2 exits, we are left with

> inferior 2 selected, which is correct, but the symbols in scope belong

> to inferior 1, which is obviously incorrect...

> 

> This problem is that there's a path in

> scoped_restore_current_thread::restore() that switches to no thread

> selected, and switches the current inferior, but leaves the current

> program space as is, resulting in leaving the program space pointing

> to the wrong program space (the one of the other inferior).  This was

> happening after handling TARGET_WAITKIND_NO_RESUMED, which is an event

> that triggers after TARGET_WAITKIND_EXITED for the previous inferior

> exit.  Subsequent symbol lookups find the symbols of the wrong

> inferior.

> 

> The fix is to use switch_to_inferior_no_thread in that problem spot.

> This function was recently added along with the multi-target work

> exactly for these situations.

> 

> As for testing, this patch adds a new testcase that tests symbol

> printing just after inferior exit, which exercises the root cause of

> the problem more directly.  And then, to cover the use case described

> in the bug too, it also exercises the lithread_db.so mis-loading, by

> using TLS printing as a proxy for being sure that threaded debugging

> was activated sucessfully.  The testcase fails without the fix like

> this, for the "print symbol just after exit" bits:

> 

>  ...

>  [Inferior 1 (process 8719) exited normally]

>  (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: continue until exit

>  print re_run_var_1

>  No symbol "re_run_var_1" in current context.

>  (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: print re_run_var_1

>  ...

> 

> And like this for the "libthread_db.so loading" bits:

> 

>  (gdb) run

>  Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run

>  warning: td_ta_new failed: generic error

>  [New LWP 27001]

> 

>  Thread 1.1 "multi-re-run" hit Breakpoint 3, all_started () at /home/pedro/gdb/binutils-gdb/build/../src/gdb/testsuite/gdb.multi/multi-re-run.c:44

>  44      }

>  (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: running to all_started in runto

>  print tls_var

>  Cannot find thread-local storage for LWP 27000, executable file /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run:

>  Cannot find thread-local variables on this target

>  (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: print tls_var


Thanks, that fixes it for me (and LGTM).

Simon
Tankut Baris Aktemur Jan. 24, 2020, 11:22 a.m. | #2
On Friday, January 24, 2020 4:02 AM, Pedro Alves wrote:
> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa

> "Multi-target support).


The patch led to the assertion violation below when running
gdb.threads/vfork-follow-child-exit.exp.

gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' failed.

Thanks.
-Baris


> The symptom described in the bug is that starting a first inferior,

> then trying to run a second (multi-threaded) inferior twice, causes

> libthread_db to fail to load, along with other erratic behavior:

> 

>  (gdb) run

>  Starting program: /tmp/foo

>  warning: td_ta_new failed: generic error

> 

> Going a bit deeply, I found that if the two inferiors have different

> symbols, we can see that just after inferior 2 exits, we are left with

> inferior 2 selected, which is correct, but the symbols in scope belong

> to inferior 1, which is obviously incorrect...

> 

> This problem is that there's a path in

> scoped_restore_current_thread::restore() that switches to no thread

> selected, and switches the current inferior, but leaves the current

> program space as is, resulting in leaving the program space pointing

> to the wrong program space (the one of the other inferior).  This was

> happening after handling TARGET_WAITKIND_NO_RESUMED, which is an event

> that triggers after TARGET_WAITKIND_EXITED for the previous inferior

> exit.  Subsequent symbol lookups find the symbols of the wrong

> inferior.

> 

> The fix is to use switch_to_inferior_no_thread in that problem spot.

> This function was recently added along with the multi-target work

> exactly for these situations.

> 

> As for testing, this patch adds a new testcase that tests symbol

> printing just after inferior exit, which exercises the root cause of

> the problem more directly.  And then, to cover the use case described

> in the bug too, it also exercises the lithread_db.so mis-loading, by

> using TLS printing as a proxy for being sure that threaded debugging

> was activated sucessfully.  The testcase fails without the fix like

> this, for the "print symbol just after exit" bits:

> 

>  ...

>  [Inferior 1 (process 8719) exited normally]

>  (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: continue until exit

>  print re_run_var_1

>  No symbol "re_run_var_1" in current context.

>  (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: print re_run_var_1

>  ...

> 

> And like this for the "libthread_db.so loading" bits:

> 

>  (gdb) run

>  Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run

>  warning: td_ta_new failed: generic error

>  [New LWP 27001]

> 

>  Thread 1.1 "multi-re-run" hit Breakpoint 3, all_started () at /home/pedro/gdb/binutils-

> gdb/build/../src/gdb/testsuite/gdb.multi/multi-re-run.c:44

>  44      }

>  (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: running to all_started in runto

>  print tls_var

>  Cannot find thread-local storage for LWP 27000, executable file /home/pedro/gdb/binutils-

> gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run:

>  Cannot find thread-local variables on this target

>  (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: print tls_var

> 

> gdb/ChangeLog:

> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

> 

> 	PR gdb/25410

> 	* thread.c (scoped_restore_current_thread::restore): Use

> 	switch_to_inferior_no_thread.

> 

> gdb/testsuite/ChangeLog:

> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

> 

> 	PR gdb/25410

> 	* gdb.multi/multi-re-run-1.c: New.

> 	* gdb.multi/multi-re-run-2.c: New.

> 	* gdb.multi/multi-re-run.exp: New.

> ---

>  gdb/testsuite/gdb.multi/multi-re-run-1.c |  61 ++++++++++++++++

>  gdb/testsuite/gdb.multi/multi-re-run-2.c |  61 ++++++++++++++++

>  gdb/testsuite/gdb.multi/multi-re-run.exp | 115 +++++++++++++++++++++++++++++++

>  gdb/thread.c                             |   5 +-

>  4 files changed, 238 insertions(+), 4 deletions(-)

>  create mode 100644 gdb/testsuite/gdb.multi/multi-re-run-1.c

>  create mode 100644 gdb/testsuite/gdb.multi/multi-re-run-2.c

>  create mode 100644 gdb/testsuite/gdb.multi/multi-re-run.exp

> 

> diff --git a/gdb/testsuite/gdb.multi/multi-re-run-1.c b/gdb/testsuite/gdb.multi/multi-re-run-1.c

> new file mode 100644

> index 0000000000..91b0dbce30

> --- /dev/null

> +++ b/gdb/testsuite/gdb.multi/multi-re-run-1.c

> @@ -0,0 +1,61 @@

> +/* This testcase is part of GDB, the GNU debugger.

> +

> +   Copyright 2020 Free Software Foundation, Inc.

> +

> +   This program is free software; you can redistribute it and/or modify

> +   it under the terms of the GNU General Public License as published by

> +   the Free Software Foundation; either version 3 of the License, or

> +   (at your option) any later version.

> +

> +   This program is distributed in the hope that it will be useful,

> +   but WITHOUT ANY WARRANTY; without even the implied warranty of

> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> +   GNU General Public License for more details.

> +

> +   You should have received a copy of the GNU General Public License

> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

> +

> +#include <stdlib.h>

> +#include <stdio.h>

> +#include <unistd.h>

> +#include <limits.h>

> +#include <string.h>

> +#include <pthread.h>

> +

> +int re_run_var_1 = 1;

> +

> +#define NUM_THREADS 1

> +

> +__thread int tls_var = 1;

> +

> +static pthread_barrier_t barrier;

> +

> +static void *

> +thread_start (void *arg)

> +{

> +  pthread_barrier_wait (&barrier);

> +

> +  while (1)

> +    sleep (1);

> +  return NULL;

> +}

> +

> +static void

> +all_started (void)

> +{

> +}

> +

> +int

> +main (int argc, char ** argv)

> +{

> +  pthread_t thread;

> +  int len;

> +

> +  pthread_barrier_init (&barrier, NULL, NUM_THREADS + 1);

> +  pthread_create (&thread, NULL, thread_start, NULL);

> +

> +  pthread_barrier_wait (&barrier);

> +  all_started ();

> +

> +  return 0;

> +}

> diff --git a/gdb/testsuite/gdb.multi/multi-re-run-2.c b/gdb/testsuite/gdb.multi/multi-re-run-2.c

> new file mode 100644

> index 0000000000..6925e0cb2d

> --- /dev/null

> +++ b/gdb/testsuite/gdb.multi/multi-re-run-2.c

> @@ -0,0 +1,61 @@

> +/* This testcase is part of GDB, the GNU debugger.

> +

> +   Copyright 2020 Free Software Foundation, Inc.

> +

> +   This program is free software; you can redistribute it and/or modify

> +   it under the terms of the GNU General Public License as published by

> +   the Free Software Foundation; either version 3 of the License, or

> +   (at your option) any later version.

> +

> +   This program is distributed in the hope that it will be useful,

> +   but WITHOUT ANY WARRANTY; without even the implied warranty of

> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> +   GNU General Public License for more details.

> +

> +   You should have received a copy of the GNU General Public License

> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

> +

> +#include <stdlib.h>

> +#include <stdio.h>

> +#include <unistd.h>

> +#include <limits.h>

> +#include <string.h>

> +#include <pthread.h>

> +

> +int re_run_var_2 = 2;

> +

> +#define NUM_THREADS 1

> +

> +__thread int tls_var = 1;

> +

> +static pthread_barrier_t barrier;

> +

> +static void *

> +thread_start (void *arg)

> +{

> +  pthread_barrier_wait (&barrier);

> +

> +  while (1)

> +    sleep (1);

> +  return NULL;

> +}

> +

> +static void

> +all_started (void)

> +{

> +}

> +

> +int

> +main (int argc, char ** argv)

> +{

> +  pthread_t thread;

> +  int len;

> +

> +  pthread_barrier_init (&barrier, NULL, NUM_THREADS + 1);

> +  pthread_create (&thread, NULL, thread_start, NULL);

> +

> +  pthread_barrier_wait (&barrier);

> +  all_started ();

> +

> +  return 0;

> +}

> diff --git a/gdb/testsuite/gdb.multi/multi-re-run.exp b/gdb/testsuite/gdb.multi/multi-re-run.exp

> new file mode 100644

> index 0000000000..daa4206e0e

> --- /dev/null

> +++ b/gdb/testsuite/gdb.multi/multi-re-run.exp

> @@ -0,0 +1,115 @@

> +# Copyright 2020 Free Software Foundation, Inc.

> +

> +# This program is free software; you can redistribute it and/or modify

> +# it under the terms of the GNU General Public License as published by

> +# the Free Software Foundation; either version 3 of the License, or

> +# (at your option) any later version.

> +#

> +# This program is distributed in the hope that it will be useful,

> +# but WITHOUT ANY WARRANTY; without even the implied warranty of

> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> +# GNU General Public License for more details.

> +#

> +# You should have received a copy of the GNU General Public License

> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.

> +

> +# Test loading two inferiors into GDB, and running one of them twice

> +# in a row.  GDB used to have a bug that made it so that after an

> +# inferior exit, the current program space was left pointing to the

> +# wrong inferior's pspace, causing subsequent symbol lookups to

> +# misbehave, including failing to load libthread_db.so.  See PR

> +# gdb/25410.

> +

> +# Build two executables, with different symbols.

> +

> +set exec1 "multi-re-run-1"

> +set srcfile1 multi-re-run-1.c

> +set binfile1 [standard_output_file ${exec1}]

> +

> +set exec2 "multi-re-run-2"

> +set srcfile2 multi-re-run-2.c

> +set binfile2 [standard_output_file ${exec2}]

> +

> +with_test_prefix "exec1" {

> +    if { [prepare_for_testing "failed to prepare" ${exec1} "${srcfile1}" \

> +	      [list pthreads debug]] } {

> +	return -1

> +    }

> +}

> +

> +with_test_prefix "exec2" {

> +    if { [prepare_for_testing "failed to prepare" ${exec2} "${srcfile2}" \

> +	      [list pthreads debug]] } {

> +	return -1

> +    }

> +}

> +

> +# Start two inferiors, leave one stopped, and run the other a couple

> +# times.  RE_RUN_INF is the inferior that is re-run.

> +

> +proc test_re_run {re_run_inf} {

> +    global binfile1 binfile2

> +    global inferior_exited_re

> +    global gdb_prompt

> +

> +    clean_restart ${binfile1}

> +

> +    delete_breakpoints

> +

> +    # Start another inferior.

> +    gdb_test "add-inferior" "Added inferior 2.*" \

> +	"add empty inferior 2"

> +    gdb_test "inferior 2" "Switching to inferior 2.*" \

> +	"switch to inferior 2"

> +    gdb_load ${binfile2}

> +

> +    if {$re_run_inf == 1} {

> +	set steady_inf 2

> +    } else {

> +	set steady_inf 1

> +    }

> +

> +    gdb_test "inferior $steady_inf" "Switching to inferior $steady_inf.*" \

> +	"switch to steady inferior"

> +

> +    # Run the steady inferior to a breakpoint, and let it stay stopped

> +    # there.

> +    if ![runto all_started message] then {

> +	untested "setup failed"

> +	return 0

> +    }

> +

> +    gdb_test "inferior $re_run_inf" "Switching to inferior $re_run_inf.*" \

> +	"switch to re-run inferior"

> +

> +    # Now run the RE_RUN_INF inferior a couple times.  GDB used to

> +    # have a bug that caused the second run to fail to load

> +    # libthread_db.so.

> +    foreach_with_prefix iter {1 2} {

> +	delete_breakpoints

> +

> +	if ![runto all_started message] {

> +	    return 0

> +	}

> +

> +	# If a thread_stratum fails to load, then TLS debugging fails

> +	# too.

> +	gdb_test "print tls_var" " = 1"

> +

> +	gdb_continue_to_end "" continue 1

> +

> +	# In the original bug, after an inferior exit, GDB would leave

> +	# the current program space pointing to the wrong inferior's

> +	# pspace, and thus the wrong symbols were visible.

> +	if {$re_run_inf == 1} {

> +	    gdb_test "print re_run_var_1" " = 1"

> +	} else {

> +	    gdb_test "print re_run_var_2" " = 2"

> +	}

> +    }

> +}

> +

> +# For completeness, test re-running either inferior 1 or inferior 2.

> +foreach_with_prefix re_run_inf {1 2} {

> +    test_re_run $re_run_inf

> +}

> diff --git a/gdb/thread.c b/gdb/thread.c

> index 001fdd42c5..302a49e984 100644

> --- a/gdb/thread.c

> +++ b/gdb/thread.c

> @@ -1429,10 +1429,7 @@ scoped_restore_current_thread::restore ()

>        && m_inf->pid != 0)

>      switch_to_thread (m_thread);

>    else

> -    {

> -      switch_to_no_thread ();

> -      set_current_inferior (m_inf);

> -    }

> +    switch_to_inferior_no_thread (m_inf);

> 

>    /* The running state of the originally selected thread may have

>       changed, so we have to recheck it here.  */

> 

> base-commit: 1ba1ac88011703abcd0271e4f5d00927dc69a09a

> --

> 2.14.5


Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
Simon Marchi Jan. 24, 2020, 2:59 p.m. | #3
On 2020-01-24 6:22 a.m., Aktemur, Tankut Baris wrote:
> On Friday, January 24, 2020 4:02 AM, Pedro Alves wrote:

>> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa

>> "Multi-target support).

> 

> The patch led to the assertion violation below when running

> gdb.threads/vfork-follow-child-exit.exp.

> 

> gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' failed.



Oh, I see that too.

Simon
Pedro Alves Jan. 24, 2020, 3:11 p.m. | #4
On 1/24/20 2:59 PM, Simon Marchi wrote:
> On 2020-01-24 6:22 a.m., Aktemur, Tankut Baris wrote:

>> On Friday, January 24, 2020 4:02 AM, Pedro Alves wrote:

>>> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa

>>> "Multi-target support).

>>

>> The patch led to the assertion violation below when running

>> gdb.threads/vfork-follow-child-exit.exp.

>>

>> gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' failed.

> 

> 

> Oh, I see that too.


Me too.  No idea how I missed this.  Sorry about that...
There's actually yet another internal error in addition
to that one.

Here's the updated patch, which fixes all.

From 2880f692d69cf54ef9fb41f84ac00a1d04f28447 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>

Date: Fri, 24 Jan 2020 14:55:43 +0000
Subject: [PATCH] Fix re-runs of a second inferior (PR gdb/25410)

This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa
"Multi-target support), and then fixes two other latent bugs exposed
by fixing that first latent bug.

The symptom described in the bug report is that starting a first
inferior, then trying to run a second (multi-threaded) inferior twice,
causes libthread_db to fail to load, along with other erratic
behavior:

 (gdb) run
 Starting program: /tmp/foo
 warning: td_ta_new failed: generic error

Going a bit deeply, I found that if the two inferiors have different
symbols, we can see that just after inferior 2 exits, we are left with
inferior 2 selected, which is correct, but the symbols in scope belong
to inferior 1, which is obviously incorrect...

This problem is that there's a path in
scoped_restore_current_thread::restore() that switches to no thread
selected, and switches the current inferior, but leaves the current
program space as is, resulting in leaving the program space pointing
to the wrong program space (the one of the other inferior).  This was
happening after handling TARGET_WAITKIND_NO_RESUMED, which is an event
that triggers after TARGET_WAITKIND_EXITED for the previous inferior
exit.  Subsequent symbol lookups find the symbols of the wrong
inferior.

The fix is to use switch_to_inferior_no_thread in that problem spot.
This function was recently added along with the multi-target work
exactly for these situations.

As for testing, this patch adds a new testcase that tests symbol
printing just after inferior exit, which exercises the root cause of
the problem more directly.  And then, to cover the use case described
in the bug too, it also exercises the lithread_db.so mis-loading, by
using TLS printing as a proxy for being sure that threaded debugging
was activated sucessfully.  The testcase fails without the fix like
this, for the "print symbol just after exit" bits:

 ...
 [Inferior 1 (process 8719) exited normally]
 (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: continue until exit
 print re_run_var_1
 No symbol "re_run_var_1" in current context.
 (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: print re_run_var_1
 ...

And like this for the "libthread_db.so loading" bits:

 (gdb) run
 Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run
 warning: td_ta_new failed: generic error
 [New LWP 27001]

 Thread 1.1 "multi-re-run" hit Breakpoint 3, all_started () at /home/pedro/gdb/binutils-gdb/build/../src/gdb/testsuite/gdb.multi/multi-re-run.c:44
 44      }
 (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: running to all_started in runto
 print tls_var
 Cannot find thread-local storage for LWP 27000, executable file /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run:
 Cannot find thread-local variables on this target
 (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: print tls_var


As mentioned, that fix above goes on to expose a couple other latent
bugs.  This commit fixes those as well.

The first latent bug exposed is in
infrun.c:handle_vfork_child_exec_or_exit.  The current code is leaving
inf->pspace == NULL while calling clone_program_space.  The idea was
to make it so that the breakpoints module doesn't use this inferior's
pspace to set breakpoints.  With that, any
scoped_restore_current_thread use from within clone_program_space
tries to restore a NULL program space, which hits an assertion:

 Attaching after Thread 0x7ffff74b8700 (LWP 27276) vfork to child process 27277]
 [New inferior 2 (process 27277)]
 [Thread debugging using libthread_db enabled]
 Using host libthread_db library "/lib64/libthread_db.so.1".
 /home/pedro/gdb/binutils-gdb/build/../src/gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' faile
 d.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) FAIL: gdb.threads/vfork-follow-child-exit.exp: detach-on-fork=off: continue (GDB internal error)

That NULL pspace idea was legitimate, but it's no longer necessary,
since commit b2e586e850db ("Defer breakpoint reset when cloning
progspace for fork child").  So the fix is to just set the inferior's
program space earlier.


The other latent bug exposed is in exec.c.  When exec_close is called
from the program_space destructor, it is purposedly called with a
current program space that is not current inferior's program space.
The problem is that the multi-target work added some code to
remove_target_sections that loops over all inferiors, and uses
scoped_restore_current_thread to save/restore the previous
thread/inferior/frame state.  This makes it so that exec_close returns
with the current program space set to the current inferior's program
space, which is exactly what we did not want.  Then the program_space
destructor continues into free_all_objfiles, but it is not running
that method on the wrong program space, resulting in:

 Reading symbols from /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/fork-plus-threads/fork-plus-threads...
 Reading symbols from /usr/lib/debug/usr/lib64/libpthread-2.26.so.debug...
 Reading symbols from /usr/lib/debug/usr/lib64/libm-2.26.so.debug...
 Reading symbols from /usr/lib/debug/usr/lib64/libc-2.26.so.debug...
 Reading symbols from /usr/lib/debug/usr/lib64/ld-2.26.so.debug...
 [Inferior 3 (process 9583) exited normally]
 /home/pedro/gdb/binutils-gdb/build/../src/gdb/progspace.c:170: internal-error: void program_space::free_all_objfiles(): Assertion `so->objfile == NULL' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: inferior 1 exited (GDB internal error)

The fix is to use scoped_restore_current_pspace_and_thread instead of
scoped_restore_current_thread.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/25410
	* thread.c (scoped_restore_current_thread::restore): Use
	switch_to_inferior_no_thread.
	* exec.c: Include "progspace-and-thread.h".
	(add_target_sections, remove_target_sections):
	scoped_restore_current_pspace_and_thread instead of
	scoped_restore_current_thread.
	* infrun.c (handle_vfork_child_exec_or_exit): Assign the pspace
	and aspace to the inferior before calling clone_program_space.
	Remove stale comment.

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/25410
	* gdb.multi/multi-re-run-1.c: New.
	* gdb.multi/multi-re-run-2.c: New.
	* gdb.multi/multi-re-run.exp: New.
---
 gdb/exec.c                               |   5 +-
 gdb/infrun.c                             |  16 ++---
 gdb/testsuite/gdb.multi/multi-re-run-1.c |  61 ++++++++++++++++
 gdb/testsuite/gdb.multi/multi-re-run-2.c |  61 ++++++++++++++++
 gdb/testsuite/gdb.multi/multi-re-run.exp | 115 +++++++++++++++++++++++++++++++
 gdb/thread.c                             |   5 +-
 6 files changed, 245 insertions(+), 18 deletions(-)
 create mode 100644 gdb/testsuite/gdb.multi/multi-re-run-1.c
 create mode 100644 gdb/testsuite/gdb.multi/multi-re-run-2.c
 create mode 100644 gdb/testsuite/gdb.multi/multi-re-run.exp

diff --git a/gdb/exec.c b/gdb/exec.c
index eb07b51dda..2506e84157 100644
--- a/gdb/exec.c
+++ b/gdb/exec.c
@@ -33,6 +33,7 @@
 #include "arch-utils.h"
 #include "gdbthread.h"
 #include "progspace.h"
+#include "progspace-and-thread.h"
 #include "gdb_bfd.h"
 #include "gcore.h"
 #include "source.h"
@@ -547,7 +548,7 @@ add_target_sections (void *owner,
 	  table->sections[space + i].owner = owner;
 	}
 
-      scoped_restore_current_thread restore_thread;
+      scoped_restore_current_pspace_and_thread restore_pspace_thread;
       program_space *curr_pspace = current_program_space;
 
       /* If these are the first file sections we can provide memory
@@ -645,7 +646,7 @@ remove_target_sections (void *owner)
 	 inferior sharing the program space.  */
       if (old_count + (dest - src) == 0)
 	{
-	  scoped_restore_current_thread restore_thread;
+	  scoped_restore_current_pspace_and_thread restore_pspace_thread;
 	  program_space *curr_pspace = current_program_space;
 
 	  for (inferior *inf : all_inferiors ())
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 9c4a07daba..22de42c2ae 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1020,8 +1020,6 @@ handle_vfork_child_exec_or_exit (int exec)
 	}
       else
 	{
-	  struct program_space *pspace;
-
 	  /* If this is a vfork child exiting, then the pspace and
 	     aspaces were shared with the parent.  Since we're
 	     reporting the process exit, we'll be mourning all that is
@@ -1037,18 +1035,12 @@ handle_vfork_child_exec_or_exit (int exec)
 	  scoped_restore restore_ptid
 	    = make_scoped_restore (&inferior_ptid, null_ptid);
 
-	  /* This inferior is dead, so avoid giving the breakpoints
-	     module the option to write through to it (cloning a
-	     program space resets breakpoints).  */
-	  inf->aspace = NULL;
-	  inf->pspace = NULL;
-	  pspace = new program_space (maybe_new_address_space ());
-	  set_current_program_space (pspace);
+	  inf->pspace = new program_space (maybe_new_address_space ());
+	  inf->aspace = inf->pspace->aspace;
+	  set_current_program_space (inf->pspace);
 	  inf->removable = 1;
 	  inf->symfile_flags = SYMFILE_NO_READ;
-	  clone_program_space (pspace, vfork_parent->pspace);
-	  inf->pspace = pspace;
-	  inf->aspace = pspace->aspace;
+	  clone_program_space (inf->pspace, vfork_parent->pspace);
 
 	  resume_parent = vfork_parent->pid;
 	}
diff --git a/gdb/testsuite/gdb.multi/multi-re-run-1.c b/gdb/testsuite/gdb.multi/multi-re-run-1.c
new file mode 100644
index 0000000000..91b0dbce30
--- /dev/null
+++ b/gdb/testsuite/gdb.multi/multi-re-run-1.c
@@ -0,0 +1,61 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <limits.h>
+#include <string.h>
+#include <pthread.h>
+
+int re_run_var_1 = 1;
+
+#define NUM_THREADS 1
+
+__thread int tls_var = 1;
+
+static pthread_barrier_t barrier;
+
+static void *
+thread_start (void *arg)
+{
+  pthread_barrier_wait (&barrier);
+
+  while (1)
+    sleep (1);
+  return NULL;
+}
+
+static void
+all_started (void)
+{
+}
+
+int
+main (int argc, char ** argv)
+{
+  pthread_t thread;
+  int len;
+
+  pthread_barrier_init (&barrier, NULL, NUM_THREADS + 1);
+  pthread_create (&thread, NULL, thread_start, NULL);
+
+  pthread_barrier_wait (&barrier);
+  all_started ();
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.multi/multi-re-run-2.c b/gdb/testsuite/gdb.multi/multi-re-run-2.c
new file mode 100644
index 0000000000..6925e0cb2d
--- /dev/null
+++ b/gdb/testsuite/gdb.multi/multi-re-run-2.c
@@ -0,0 +1,61 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <limits.h>
+#include <string.h>
+#include <pthread.h>
+
+int re_run_var_2 = 2;
+
+#define NUM_THREADS 1
+
+__thread int tls_var = 1;
+
+static pthread_barrier_t barrier;
+
+static void *
+thread_start (void *arg)
+{
+  pthread_barrier_wait (&barrier);
+
+  while (1)
+    sleep (1);
+  return NULL;
+}
+
+static void
+all_started (void)
+{
+}
+
+int
+main (int argc, char ** argv)
+{
+  pthread_t thread;
+  int len;
+
+  pthread_barrier_init (&barrier, NULL, NUM_THREADS + 1);
+  pthread_create (&thread, NULL, thread_start, NULL);
+
+  pthread_barrier_wait (&barrier);
+  all_started ();
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.multi/multi-re-run.exp b/gdb/testsuite/gdb.multi/multi-re-run.exp
new file mode 100644
index 0000000000..93cd709b5c
--- /dev/null
+++ b/gdb/testsuite/gdb.multi/multi-re-run.exp
@@ -0,0 +1,115 @@
+# Copyright 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test loading two inferiors into GDB, and running one of them twice
+# in a row.  GDB used to have a bug that made it so that after an
+# inferior exit, the current program space was left pointing to the
+# wrong inferior's pspace, causing subsequent symbol lookups to
+# misbehave, including failing to load libthread_db.so.  See PR
+# gdb/25410.
+
+# Build two executables, with different symbols.
+
+set exec1 "multi-re-run-1"
+set srcfile1 multi-re-run-1.c
+set binfile1 [standard_output_file ${exec1}]
+
+set exec2 "multi-re-run-2"
+set srcfile2 multi-re-run-2.c
+set binfile2 [standard_output_file ${exec2}]
+
+with_test_prefix "exec1" {
+    if { [prepare_for_testing "failed to prepare" ${exec1} "${srcfile1}" \
+	      [list pthreads debug]] } {
+	return -1
+    }
+}
+
+with_test_prefix "exec2" {
+    if { [prepare_for_testing "failed to prepare" ${exec2} "${srcfile2}" \
+	      [list pthreads debug]] } {
+	return -1
+    }
+}
+
+# Start two inferiors, leave one stopped, and run the other a couple
+# times.  RE_RUN_INF is the inferior that is re-run.
+
+proc test_re_run {re_run_inf} {
+    global binfile1 binfile2
+    global inferior_exited_re
+    global gdb_prompt
+
+    clean_restart ${binfile1}
+
+    delete_breakpoints
+
+    # Start another inferior.
+    gdb_test "add-inferior" "Added inferior 2.*" \
+	"add empty inferior 2"
+    gdb_test "inferior 2" "Switching to inferior 2.*" \
+	"switch to inferior 2"
+    gdb_load ${binfile2}
+
+    if {$re_run_inf == 1} {
+	set steady_inf 2
+    } else {
+	set steady_inf 1
+    }
+
+    gdb_test "inferior $steady_inf" "Switching to inferior $steady_inf.*" \
+	"switch to steady inferior"
+
+    # Run the steady inferior to a breakpoint, and let it stay stopped
+    # there.
+    if ![runto all_started message] then {
+	untested "setup failed"
+	return 0
+    }
+
+    gdb_test "inferior $re_run_inf" "Switching to inferior $re_run_inf.*" \
+	"switch to re-run inferior"
+
+    # Now run the RE_RUN_INF inferior a couple times.  GDB used to
+    # have a bug that caused the second run to fail to load
+    # libthread_db.so.
+    foreach_with_prefix iter {1 2} {
+	delete_breakpoints
+
+	if ![runto all_started message] {
+	    return 0
+	}
+
+	# If a thread_stratum target fails to load, then TLS debugging
+	# fails too.
+	gdb_test "print tls_var" " = 1"
+
+	gdb_continue_to_end "" continue 1
+
+	# In the original bug, after an inferior exit, GDB would leave
+	# the current program space pointing to the wrong inferior's
+	# pspace, and thus the wrong symbols were visible.
+	if {$re_run_inf == 1} {
+	    gdb_test "print re_run_var_1" " = 1"
+	} else {
+	    gdb_test "print re_run_var_2" " = 2"
+	}
+    }
+}
+
+# For completeness, test re-running either inferior 1 or inferior 2.
+foreach_with_prefix re_run_inf {1 2} {
+    test_re_run $re_run_inf
+}
diff --git a/gdb/thread.c b/gdb/thread.c
index 001fdd42c5..302a49e984 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1429,10 +1429,7 @@ scoped_restore_current_thread::restore ()
       && m_inf->pid != 0)
     switch_to_thread (m_thread);
   else
-    {
-      switch_to_no_thread ();
-      set_current_inferior (m_inf);
-    }
+    switch_to_inferior_no_thread (m_inf);
 
   /* The running state of the originally selected thread may have
      changed, so we have to recheck it here.  */

base-commit: 1ba1ac88011703abcd0271e4f5d00927dc69a09a
-- 
2.14.5
Simon Marchi Jan. 24, 2020, 5:05 p.m. | #5
On 2020-01-24 10:11 a.m., Pedro Alves wrote:
> On 1/24/20 2:59 PM, Simon Marchi wrote:

>> On 2020-01-24 6:22 a.m., Aktemur, Tankut Baris wrote:

>>> On Friday, January 24, 2020 4:02 AM, Pedro Alves wrote:

>>>> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa

>>>> "Multi-target support).

>>>

>>> The patch led to the assertion violation below when running

>>> gdb.threads/vfork-follow-child-exit.exp.

>>>

>>> gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' failed.

>>

>>

>> Oh, I see that too.

> 

> Me too.  No idea how I missed this.  Sorry about that...

> There's actually yet another internal error in addition

> to that one.

> 

> Here's the updated patch, which fixes all.

> 

> From 2880f692d69cf54ef9fb41f84ac00a1d04f28447 Mon Sep 17 00:00:00 2001

> From: Pedro Alves <palves@redhat.com>

> Date: Fri, 24 Jan 2020 14:55:43 +0000

> Subject: [PATCH] Fix re-runs of a second inferior (PR gdb/25410)

> 

> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa

> "Multi-target support), and then fixes two other latent bugs exposed

> by fixing that first latent bug.

> 

> The symptom described in the bug report is that starting a first

> inferior, then trying to run a second (multi-threaded) inferior twice,

> causes libthread_db to fail to load, along with other erratic

> behavior:

> 

>  (gdb) run

>  Starting program: /tmp/foo

>  warning: td_ta_new failed: generic error

> 

> Going a bit deeply, I found that if the two inferiors have different

> symbols, we can see that just after inferior 2 exits, we are left with

> inferior 2 selected, which is correct, but the symbols in scope belong

> to inferior 1, which is obviously incorrect...

> 

> This problem is that there's a path in

> scoped_restore_current_thread::restore() that switches to no thread

> selected, and switches the current inferior, but leaves the current

> program space as is, resulting in leaving the program space pointing

> to the wrong program space (the one of the other inferior).  This was

> happening after handling TARGET_WAITKIND_NO_RESUMED, which is an event

> that triggers after TARGET_WAITKIND_EXITED for the previous inferior

> exit.  Subsequent symbol lookups find the symbols of the wrong

> inferior.

> 

> The fix is to use switch_to_inferior_no_thread in that problem spot.

> This function was recently added along with the multi-target work

> exactly for these situations.

> 

> As for testing, this patch adds a new testcase that tests symbol

> printing just after inferior exit, which exercises the root cause of

> the problem more directly.  And then, to cover the use case described

> in the bug too, it also exercises the lithread_db.so mis-loading, by

> using TLS printing as a proxy for being sure that threaded debugging

> was activated sucessfully.  The testcase fails without the fix like

> this, for the "print symbol just after exit" bits:

> 

>  ...

>  [Inferior 1 (process 8719) exited normally]

>  (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: continue until exit

>  print re_run_var_1

>  No symbol "re_run_var_1" in current context.

>  (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: print re_run_var_1

>  ...

> 

> And like this for the "libthread_db.so loading" bits:

> 

>  (gdb) run

>  Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run

>  warning: td_ta_new failed: generic error

>  [New LWP 27001]

> 

>  Thread 1.1 "multi-re-run" hit Breakpoint 3, all_started () at /home/pedro/gdb/binutils-gdb/build/../src/gdb/testsuite/gdb.multi/multi-re-run.c:44

>  44      }

>  (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: running to all_started in runto

>  print tls_var

>  Cannot find thread-local storage for LWP 27000, executable file /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run:

>  Cannot find thread-local variables on this target

>  (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: print tls_var

> 

> 

> As mentioned, that fix above goes on to expose a couple other latent

> bugs.  This commit fixes those as well.

> 

> The first latent bug exposed is in

> infrun.c:handle_vfork_child_exec_or_exit.  The current code is leaving

> inf->pspace == NULL while calling clone_program_space.  The idea was

> to make it so that the breakpoints module doesn't use this inferior's

> pspace to set breakpoints.  With that, any

> scoped_restore_current_thread use from within clone_program_space

> tries to restore a NULL program space, which hits an assertion:

> 

>  Attaching after Thread 0x7ffff74b8700 (LWP 27276) vfork to child process 27277]

>  [New inferior 2 (process 27277)]

>  [Thread debugging using libthread_db enabled]

>  Using host libthread_db library "/lib64/libthread_db.so.1".

>  /home/pedro/gdb/binutils-gdb/build/../src/gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' faile

>  d.

>  A problem internal to GDB has been detected,

>  further debugging may prove unreliable.

>  Quit this debugging session? (y or n) FAIL: gdb.threads/vfork-follow-child-exit.exp: detach-on-fork=off: continue (GDB internal error)

> 

> That NULL pspace idea was legitimate, but it's no longer necessary,

> since commit b2e586e850db ("Defer breakpoint reset when cloning

> progspace for fork child").  So the fix is to just set the inferior's

> program space earlier.

> 

> 

> The other latent bug exposed is in exec.c.  When exec_close is called

> from the program_space destructor, it is purposedly called with a

> current program space that is not current inferior's program space.

> The problem is that the multi-target work added some code to

> remove_target_sections that loops over all inferiors, and uses

> scoped_restore_current_thread to save/restore the previous

> thread/inferior/frame state.  This makes it so that exec_close returns

> with the current program space set to the current inferior's program

> space, which is exactly what we did not want.  Then the program_space

> destructor continues into free_all_objfiles, but it is not running

> that method on the wrong program space, resulting in:

> 

>  Reading symbols from /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/fork-plus-threads/fork-plus-threads...

>  Reading symbols from /usr/lib/debug/usr/lib64/libpthread-2.26.so.debug...

>  Reading symbols from /usr/lib/debug/usr/lib64/libm-2.26.so.debug...

>  Reading symbols from /usr/lib/debug/usr/lib64/libc-2.26.so.debug...

>  Reading symbols from /usr/lib/debug/usr/lib64/ld-2.26.so.debug...

>  [Inferior 3 (process 9583) exited normally]

>  /home/pedro/gdb/binutils-gdb/build/../src/gdb/progspace.c:170: internal-error: void program_space::free_all_objfiles(): Assertion `so->objfile == NULL' failed.

>  A problem internal to GDB has been detected,

>  further debugging may prove unreliable.

>  Quit this debugging session? (y or n) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: inferior 1 exited (GDB internal error)

> 

> The fix is to use scoped_restore_current_pspace_and_thread instead of

> scoped_restore_current_thread.


I did not look at the code in depth, but your explanations make sense to me,
so if the testsuite shows no regression, this LGTM.

Simon
Tankut Baris Aktemur Jan. 28, 2020, 12:30 p.m. | #6
On Friday, January 24, 2020 6:05 PM, Simon Marchi wrote:
> On 2020-01-24 10:11 a.m., Pedro Alves wrote:

> > On 1/24/20 2:59 PM, Simon Marchi wrote:

> >> On 2020-01-24 6:22 a.m., Aktemur, Tankut Baris wrote:

> >>> On Friday, January 24, 2020 4:02 AM, Pedro Alves wrote:

> >>>> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa

> >>>> "Multi-target support).

> >>>

> >>> The patch led to the assertion violation below when running

> >>> gdb.threads/vfork-follow-child-exit.exp.

> >>>

> >>> gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL'

> failed.

> >>

> >>

> >> Oh, I see that too.

> >

> > Me too.  No idea how I missed this.  Sorry about that...

> > There's actually yet another internal error in addition

> > to that one.

> >

> > Here's the updated patch, which fixes all.

> >


<snipped>

> > The fix is to use scoped_restore_current_pspace_and_thread instead of

> > scoped_restore_current_thread.

> 

> I did not look at the code in depth, but your explanations make sense to me,

> so if the testsuite shows no regression, this LGTM.

> 

> Simon


I ran the patch locally and did not see any regression.
Thank you.

-Baris

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

Patch

diff --git a/gdb/testsuite/gdb.multi/multi-re-run-1.c b/gdb/testsuite/gdb.multi/multi-re-run-1.c
new file mode 100644
index 0000000000..91b0dbce30
--- /dev/null
+++ b/gdb/testsuite/gdb.multi/multi-re-run-1.c
@@ -0,0 +1,61 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <limits.h>
+#include <string.h>
+#include <pthread.h>
+
+int re_run_var_1 = 1;
+
+#define NUM_THREADS 1
+
+__thread int tls_var = 1;
+
+static pthread_barrier_t barrier;
+
+static void *
+thread_start (void *arg)
+{
+  pthread_barrier_wait (&barrier);
+
+  while (1)
+    sleep (1);
+  return NULL;
+}
+
+static void
+all_started (void)
+{
+}
+
+int
+main (int argc, char ** argv)
+{
+  pthread_t thread;
+  int len;
+
+  pthread_barrier_init (&barrier, NULL, NUM_THREADS + 1);
+  pthread_create (&thread, NULL, thread_start, NULL);
+
+  pthread_barrier_wait (&barrier);
+  all_started ();
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.multi/multi-re-run-2.c b/gdb/testsuite/gdb.multi/multi-re-run-2.c
new file mode 100644
index 0000000000..6925e0cb2d
--- /dev/null
+++ b/gdb/testsuite/gdb.multi/multi-re-run-2.c
@@ -0,0 +1,61 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <limits.h>
+#include <string.h>
+#include <pthread.h>
+
+int re_run_var_2 = 2;
+
+#define NUM_THREADS 1
+
+__thread int tls_var = 1;
+
+static pthread_barrier_t barrier;
+
+static void *
+thread_start (void *arg)
+{
+  pthread_barrier_wait (&barrier);
+
+  while (1)
+    sleep (1);
+  return NULL;
+}
+
+static void
+all_started (void)
+{
+}
+
+int
+main (int argc, char ** argv)
+{
+  pthread_t thread;
+  int len;
+
+  pthread_barrier_init (&barrier, NULL, NUM_THREADS + 1);
+  pthread_create (&thread, NULL, thread_start, NULL);
+
+  pthread_barrier_wait (&barrier);
+  all_started ();
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.multi/multi-re-run.exp b/gdb/testsuite/gdb.multi/multi-re-run.exp
new file mode 100644
index 0000000000..daa4206e0e
--- /dev/null
+++ b/gdb/testsuite/gdb.multi/multi-re-run.exp
@@ -0,0 +1,115 @@ 
+# Copyright 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test loading two inferiors into GDB, and running one of them twice
+# in a row.  GDB used to have a bug that made it so that after an
+# inferior exit, the current program space was left pointing to the
+# wrong inferior's pspace, causing subsequent symbol lookups to
+# misbehave, including failing to load libthread_db.so.  See PR
+# gdb/25410.
+
+# Build two executables, with different symbols.
+
+set exec1 "multi-re-run-1"
+set srcfile1 multi-re-run-1.c
+set binfile1 [standard_output_file ${exec1}]
+
+set exec2 "multi-re-run-2"
+set srcfile2 multi-re-run-2.c
+set binfile2 [standard_output_file ${exec2}]
+
+with_test_prefix "exec1" {
+    if { [prepare_for_testing "failed to prepare" ${exec1} "${srcfile1}" \
+	      [list pthreads debug]] } {
+	return -1
+    }
+}
+
+with_test_prefix "exec2" {
+    if { [prepare_for_testing "failed to prepare" ${exec2} "${srcfile2}" \
+	      [list pthreads debug]] } {
+	return -1
+    }
+}
+
+# Start two inferiors, leave one stopped, and run the other a couple
+# times.  RE_RUN_INF is the inferior that is re-run.
+
+proc test_re_run {re_run_inf} {
+    global binfile1 binfile2
+    global inferior_exited_re
+    global gdb_prompt
+
+    clean_restart ${binfile1}
+
+    delete_breakpoints
+
+    # Start another inferior.
+    gdb_test "add-inferior" "Added inferior 2.*" \
+	"add empty inferior 2"
+    gdb_test "inferior 2" "Switching to inferior 2.*" \
+	"switch to inferior 2"
+    gdb_load ${binfile2}
+
+    if {$re_run_inf == 1} {
+	set steady_inf 2
+    } else {
+	set steady_inf 1
+    }
+
+    gdb_test "inferior $steady_inf" "Switching to inferior $steady_inf.*" \
+	"switch to steady inferior"
+
+    # Run the steady inferior to a breakpoint, and let it stay stopped
+    # there.
+    if ![runto all_started message] then {
+	untested "setup failed"
+	return 0
+    }
+
+    gdb_test "inferior $re_run_inf" "Switching to inferior $re_run_inf.*" \
+	"switch to re-run inferior"
+
+    # Now run the RE_RUN_INF inferior a couple times.  GDB used to
+    # have a bug that caused the second run to fail to load
+    # libthread_db.so.
+    foreach_with_prefix iter {1 2} {
+	delete_breakpoints
+
+	if ![runto all_started message] {
+	    return 0
+	}
+
+	# If a thread_stratum fails to load, then TLS debugging fails
+	# too.
+	gdb_test "print tls_var" " = 1"
+
+	gdb_continue_to_end "" continue 1
+
+	# In the original bug, after an inferior exit, GDB would leave
+	# the current program space pointing to the wrong inferior's
+	# pspace, and thus the wrong symbols were visible.
+	if {$re_run_inf == 1} {
+	    gdb_test "print re_run_var_1" " = 1"
+	} else {
+	    gdb_test "print re_run_var_2" " = 2"
+	}
+    }
+}
+
+# For completeness, test re-running either inferior 1 or inferior 2.
+foreach_with_prefix re_run_inf {1 2} {
+    test_re_run $re_run_inf
+}
diff --git a/gdb/thread.c b/gdb/thread.c
index 001fdd42c5..302a49e984 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1429,10 +1429,7 @@  scoped_restore_current_thread::restore ()
       && m_inf->pid != 0)
     switch_to_thread (m_thread);
   else
-    {
-      switch_to_no_thread ();
-      set_current_inferior (m_inf);
-    }
+    switch_to_inferior_no_thread (m_inf);
 
   /* The running state of the originally selected thread may have
      changed, so we have to recheck it here.  */