Fix inline frame unwinding breakage

Message ID 20200414213836.24370-1-luis.machado@linaro.org
State New
Series
  • Fix inline frame unwinding breakage

Commit Message

Rogerio Alves via Gdb-patches April 14, 2020, 9:38 p.m.
*** re-sending due to the poor choice of characters for the backtrace
annotations. GIT swallowed parts of it.

There has been some breakage for aarch64-linux, arm-linux and s390-linux in
terms of inline frame unwinding. There may be other targets, but these are
the ones I'm aware of.

The following testcases started to show numerous failures and trigger internal
errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5,
"Find tailcall frames before inline frames".

gdb.opt/inline-break.exp
gdb.opt/inline-cmds.exp
gdb.python/py-frame-inline.exp
gdb.reverse/insn-reverse.exp

The internal errors were of this kind:

binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

After a lengthy investigation into the cause of these assertions, it seems
we're dealing with fragile, poorly documented code for handling inline
frames, and we are attempting to unwind from within that fragile section of code.

Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer
was invoked from dwarf2_frame_prev_register. By the time we invoke the
dwarf2_frame_prev_register function, we've probably already calculated the
frame ID (via compute_frame_id).

After said commit, the call to dwarf2_tailcall_sniffer_first was moved to
dwarf2_frame_cache. This is very early in the frame-creation process, and
we're still calculating the frame ID (so compute_frame_id is in the call
stack).

This would be fine for regular frames, but the above testcases all deal
with some inline frames.

The peculiarity of inline frames is that their frame IDs depend on
the previous frame's ID, while the previous frame's ID relies on the inline
frame's registers. So it is a bit of a messy, circular situation.

We have comments in various parts of the code warning about some of these
particularities.

In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,
which goes through various functions until we eventually invoke
frame_unwind_got_register. This function will eventually attempt to create
a lazy value for a particular register, and this lazy value will require
a valid frame ID.  Since the inline frame doesn't have a valid frame ID
yet (remember we're still calculating the previous frame's ID so we can tell
what the inline frame ID is), we will call compute_frame_id for the inline
frame (level 0).

We'll eventually hit the assertion above, inside get_frame_id:

--
      /* If we haven't computed the frame id yet, then it must be that
         this is the current frame.  Compute it now, and stash the
         result.  The IDs of other frames are computed as soon as
         they're created, in order to detect cycles.  See
         get_prev_frame_if_no_cycle.  */
      gdb_assert (fi->level == 0);
--

It seems to me we shouldn't have reached this assertion without having the
inline frame ID already calculated. In fact, it seems we even start recursing
a bit when we invoke get_prev_frame_always within inline_frame_this_id. But
a check makes us quit the recursion and proceed to compute the ID.

Here's the call stack for context:

#0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109
RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
#2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)
    at ../../../repos/binutils-gdb/gdb/inline-frame.c:165
#3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550
#4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582
#5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296
#6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268
#7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)
    at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296
#8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229
#9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320
#10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)
    at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114
#11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)
    at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316
#12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229
#13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320
#14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223
#15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074
#16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)
    at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388
#17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190
#18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)
    at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218
#19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550
#20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927
#21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006
FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
#23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495
#24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596
#25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857
#26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381
#27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578
#28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020
#29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43
#30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377
#31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291
#32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194
#33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356
#34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416
#35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254
#36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269
#37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32

The following patch addresses this by using a function that unwinds the PC
from the next (inline) frame directly, as opposed to creating a lazy value
that is bound to the next frame's ID (which has not been computed yet).

I've validated this for aarch64-linux and x86_64-linux by running the
testsuite.

Tromey, would you mind checking if this suits your problematic core file
tailcall scenario?

gdb/ChangeLog:

2020-04-14  Luis Machado  <luis.machado@linaro.org>

	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use
	get_frame_register instead of gdbarch_unwind_pc.
---
 gdb/dwarf2/frame-tailcall.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.17.1

Comments

Tom Tromey April 16, 2020, 9:15 p.m. | #1
>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:


Luis> Tromey, would you mind checking if this suits your problematic core file
Luis> tailcall scenario?

I tried this patch locally and it continues to work.
Thank you.

Tom
Andrew Burgess April 22, 2020, 9:37 a.m. | #2
* Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

> There has been some breakage for aarch64-linux, arm-linux and s390-linux in
> terms of inline frame unwinding. There may be other targets, but these are
> the ones i'm aware of.
> [...]
> The internal errors were of this kind:
>
> binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

I have also started seeing this assert on RISC-V, and your patch
resolves this issue for me, so I'm keen to see this merged.

I took a look through and it all looks good to me - is there anything
holding this back from being merged?

Thanks,
Andrew

> [...]
> diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c
> index 2d219f13f9..01bb134a5c 100644
> --- a/gdb/dwarf2/frame-tailcall.c
> +++ b/gdb/dwarf2/frame-tailcall.c
> @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
>        prev_gdbarch = frame_unwind_arch (this_frame);
>  
>        /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
> -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);
> +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
> +			  (gdb_byte *) &prev_pc);
> +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
>  
>        /* call_site_find_chain can throw an exception.  */
>        chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);
> -- 
> 2.17.1
>
Rogerio Alves via Gdb-patches April 22, 2020, 11:22 a.m. | #3
Hi Andrew,

On 4/22/20 6:37 AM, Andrew Burgess wrote:
> * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:
>> [...]
>
> I have also started seeing this assert on RISC-V, and your patch
> resolves this issue for me, so I'm keen to see this merged.

Great.

> I took a look through and it all looks good to me - is there anything
> holding this back from being merged?

Not really. I was waiting for an OK before pushing it.

> Thanks,
> Andrew
Rogerio Alves via Gdb-patches April 23, 2020, 5:51 p.m. | #4
On 4/22/20 8:22 AM, Luis Machado wrote:
> [...]
> Not really. I was waiting for an OK before pushing it.

I've pushed this now. Tromey and Andrew OK-ed it on IRC.
Tom de Vries April 24, 2020, 9:17 a.m. | #5
On 23-04-2020 19:51, Luis Machado via Gdb-patches wrote:
> [...]
> I've pushed this now. Tromey and Andrew OK-ed it on IRC.

This causes at least:
...
FAIL: gdb.arch/amd64-entry-value.exp: tailcall: bt
FAIL: gdb.arch/amd64-entry-value.exp: tailcall: p i
FAIL: gdb.arch/amd64-entry-value.exp: tailcall: p i@entry
FAIL: gdb.arch/amd64-entry-value.exp: tailcall: p j
FAIL: gdb.arch/amd64-entry-value.exp: tailcall: p j@entry
FAIL: gdb.arch/amd64-entry-value.exp: p $sp0 == $sp
FAIL: gdb.arch/amd64-entry-value.exp: frame 3
FAIL: gdb.arch/amd64-entry-value.exp: down
FAIL: gdb.arch/amd64-entry-value.exp: disassemble
FAIL: gdb.arch/amd64-entry-value.exp: ambiguous: bt
FAIL: gdb.arch/amd64-entry-value.exp: self: bt
FAIL: gdb.arch/amd64-entry-value.exp: self: bt debug entry-values
FAIL: gdb.arch/amd64-tailcall-cxx.exp: bt
FAIL: gdb.arch/amd64-tailcall-noret.exp: bt
FAIL: gdb.arch/amd64-tailcall-self.exp: bt
...

Looking at the first FAIL, before this commit we have:
...
(gdb) PASS: gdb.arch/amd64-entry-value.exp: continue to breakpoint:
tailcall: breakhere
bt^M
#0  d (i=71, i@entry=70, j=73.5, j@entry=72.5) at
gdb.arch/amd64-entry-value.cc:34^M
#1  0x00000000004006af in c (i=i@entry=7, j=j@entry=7.25) at
gdb.arch/amd64-entry-value.cc:47^M
#2  0x00000000004006cd in b (i=i@entry=5, j=j@entry=5.25) at
gdb.arch/amd64-entry-value.cc:59^M
#3  0x0000000000400524 in main () at gdb.arch/amd64-entry-value.cc:229^M
(gdb) PASS: gdb.arch/amd64-entry-value.exp: tailcall: bt
...
which has now degraded into:
...
(gdb) PASS: gdb.arch/amd64-entry-value.exp: continue to breakpoint:
tailcall: breakhere
bt^M
#0  d (i=<optimized out>, j=<optimized out>) at
gdb.arch/amd64-entry-value.cc:34^M
#1  0x0000000000400524 in main () at gdb.arch/amd64-entry-value.cc:229^M
(gdb) FAIL: gdb.arch/amd64-entry-value.exp: tailcall: bt
...

Thanks,
- Tom
Rogerio Alves via Gdb-patches April 24, 2020, 10:02 a.m. | #6
On 4/24/20 6:17 AM, Tom de Vries wrote:
> This causes at least:
> ...
> FAIL: gdb.arch/amd64-entry-value.exp: tailcall: bt
> [...]
> FAIL: gdb.arch/amd64-tailcall-self.exp: bt
> ...

I'll take a look at it.
Rogerio Alves via Gdb-patches April 24, 2020, 10:58 a.m. | #7
On 4/24/20 7:02 AM, Luis Machado wrote:
> [...]
> I'll take a look at it.

Just a quick update... I did a before/after run and the only regression 
seems to be from gdb.arch/amd64-entry-value.exp.

The other failures are still there even after reverting the inline frame 
unwinding fix.

I'll check what's up with the regressed test.

Could you please confirm this when you have some cycles?
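(A before/after run like the one described above can be compared mechanically by diffing the testsuite's .sum files. The snippet below is a minimal sketch using fabricated one-line sample results rather than output from an actual testsuite run; real before.sum/after.sum files would come from two `make check` runs, with and without the patch.)

```shell
# Stand-in .sum fragments; in a real comparison these come from two
# testsuite runs (e.g. with the patch applied and with it reverted).
printf 'PASS: gdb.arch/amd64-entry-value.exp: tailcall: bt\n' > before.sum
printf 'FAIL: gdb.arch/amd64-entry-value.exp: tailcall: bt\n' > after.sum

# Lines whose status changed between the two runs are the regressions.
# diff exits non-zero when the files differ, so mask that for scripting.
diff before.sum after.sum || true
```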
Tom de Vries April 24, 2020, 11:08 a.m. | #8
On 24-04-2020 12:58, Luis Machado wrote:
> 

> Just a quick update... I did a before/after run and the only regression

> seems to be from gdb.arch/amd64-entry-value.exp.

> 

> The other failures are still there even after reverting the inline frame

> unwinding fix.

> 

> I'll check what's up with the regressed test.

> 

> Could you please confirm this when you have some cycles?


Hi,

I cannot confirm this.  All these FAILs fail with the patch, and pass
with the patch reverted.

Looking at amd64-tailcall-cxx.exp, we have normally:
...
(gdb) bt^M
#0  g (x=x@entry=2) at gdb.arch/amd64-tailcall-cxx1.cc:23^M
#1  0x00000000004004e8 in f (x=x@entry=1) at
gdb.arch/amd64-tailcall-cxx2.cc:23^M
#2  0x00000000004003de in main () at gdb.arch/amd64-tailcall-cxx1.cc:31^M
(gdb) PASS: gdb.arch/amd64-tailcall-cxx.exp: bt
...
and with the patch:
...
(gdb) bt^M
#0  g (x=2) at gdb.arch/amd64-tailcall-cxx1.cc:23^M
#1  0x00000000004003de in main () at gdb.arch/amd64-tailcall-cxx1.cc:31^M
(gdb) FAIL: gdb.arch/amd64-tailcall-cxx.exp: bt
...

So, I'd say it looks very similar to the issue in
gdb.arch/amd64-entry-value.exp.

Thanks,
- Tom
Rogerio Alves via Gdb-patches April 24, 2020, 11:37 a.m. | #9
On 4/24/20 8:08 AM, Tom de Vries wrote:
> Hi,

> 

> I cannot confirm this.  All these FAILs fail with the patch, and pass

> with the patch reverted.

> 

> Looking at amd64-tailcall-cxx.exp, we have normally:

> ...

> (gdb) bt^M

> #0  g (x=x@entry=2) at gdb.arch/amd64-tailcall-cxx1.cc:23^M

> #1  0x00000000004004e8 in f (x=x@entry=1) at

> gdb.arch/amd64-tailcall-cxx2.cc:23^M

> #2  0x00000000004003de in main () at gdb.arch/amd64-tailcall-cxx1.cc:31^M

> (gdb) PASS: gdb.arch/amd64-tailcall-cxx.exp: bt

> ...

> and with the patch:

> ...

> (gdb) bt^M

> #0  g (x=2) at gdb.arch/amd64-tailcall-cxx1.cc:23^M

> #1  0x00000000004003de in main () at gdb.arch/amd64-tailcall-cxx1.cc:31^M

> (gdb) FAIL: gdb.arch/amd64-tailcall-cxx.exp: bt

> ...

> 

> So, I'd say it looks very similar to the issue in

> gdb.arch/amd64-entry-value.exp.

> 

> Thanks,

> - Tom

> 


Ok. I double-checked this and I'm still seeing failures for those that I 
mentioned, even with the patch reverted. It may be the case that these 
tests are not supposed to pass (or the testcase has issues) on non-amd64 
targets (running Intel here).

I'll work with the testcase that does show the issue. Hopefully a fix 
for that will address all the others, but I may need further confirmation.

Thanks,
Luis
Tom de Vries April 24, 2020, 12:23 p.m. | #10
On 24-04-2020 13:37, Luis Machado wrote:
> Ok. I double-checked this and I'm still seeing failures for those that i

> mentioned, even with the patch reverted. It may be the case that these

> tests are not supposed to pass (or the testcase has issues) on non-amd64

> targets (running Intel here).

> 


Also Intel here (FWIW: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz).

> I'll work with the testcase that does show the issue. Hopefully a fix

> for that will address all the others, but i may need further confirmation.


Understood.

Can you file a PR for the amd64-tailcall-cxx.exp FAIL that you're seeing
before the patch, and attach the exec?

Thanks,
- Tom
Rogerio Alves via Gdb-patches April 24, 2020, 1:19 p.m. | #11
On 4/24/20 9:23 AM, Tom de Vries wrote:
>> Ok. I double-checked this and I'm still seeing failures for those that i

>> mentioned, even with the patch reverted. It may be the case that these

>> tests are not supposed to pass (or the testcase has issues) on non-amd64

>> targets (running Intel here).

>>

> 

> Also Intel here (FWIW: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz).

> 


Yikes. I have the exact same. There may be system differences affecting 
the tests then (libraries and/or compiler).

I have this compiler: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04).


>> I'll work with the testcase that does show the issue. Hopefully a fix

>> for that will address all the others, but i may need further confirmation.

> 

> Understood.

> 

> Can you file a PR for the amd64-tailcall-cxx.exp FAIL that you're seeing

> before the patch, and attach the exec?


Sure. But before I do that, I have these failures with the patch reverted:

FAIL: gdb.arch/amd64-entry-value-inline.exp: p y
FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 1: p y
FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 1: p b
FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 2: p y
FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 2: p b
FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 3: p y
FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 3: p b
FAIL: gdb.arch/amd64-entry-value-param.exp: call 1: p y
FAIL: gdb.arch/amd64-entry-value-param.exp: call 1: p b
FAIL: gdb.arch/amd64-entry-value-param.exp: call 2: p y
FAIL: gdb.arch/amd64-entry-value-param.exp: call 2: p b
FAIL: gdb.arch/amd64-entry-value-param.exp: call 3: p y
FAIL: gdb.arch/amd64-entry-value-param.exp: call 3: p b
FAIL: gdb.arch/amd64-entry-value-paramref.exp: frame
FAIL: gdb.arch/amd64-tailcall-cxx.exp: bt
FAIL: gdb.arch/amd64-tailcall-noret.exp: bt
FAIL: gdb.arch/amd64-tailcall-self.exp: bt
FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_equal: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_equal: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_different: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_different: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: validity: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: validity: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: entry_equal: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: entry_equal: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: entry_different: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: entry_different: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: validity: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: validity: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: if-needed: validity: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: if-needed: validity: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: entry_equal: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: entry_equal: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: entry_different: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: entry_different: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: validity: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: validity: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: entry_equal: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: entry_equal: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: entry_different: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: entry_different: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: validity: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: validity: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: entry_equal: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: entry_equal: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: entry_different: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: entry_different: -stack-list-variables (unexpected output)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: validity: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: validity: -stack-list-variables (unexpected output)

Also a bunch of failures for gdb.base/gnu-ifunc.exp, but I think this is 
unrelated.

Which ones do you want me to open bugs against?
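(A list like the one above can be collapsed to the unique .exp files involved, which is the natural granularity for filing one bug per testcase. The snippet below is a sketch over a fabricated gdb.sum excerpt, not output from a real run.)

```shell
# Hypothetical gdb.sum excerpt; a real one comes from a testsuite run.
cat > gdb.sum <<'EOF'
FAIL: gdb.arch/amd64-tailcall-cxx.exp: bt
FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_equal: stop (stopped at wrong place)
FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_equal: -stack-list-variables (unexpected output)
EOF

# Collapse individual FAILs to the unique .exp files they belong to:
# strip the "FAIL: " prefix, then everything from the first remaining colon.
grep '^FAIL: ' gdb.sum | sed 's/^FAIL: //; s/:.*$//' | sort -u
```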
Tom de Vries April 24, 2020, 2:36 p.m. | #12
On 24-04-2020 15:19, Luis Machado wrote:
> 

> 

> On 4/24/20 9:23 AM, Tom de Vries wrote:

>> On 24-04-2020 13:37, Luis Machado wrote:

>>> On 4/24/20 8:08 AM, Tom de Vries wrote:

>>>> On 24-04-2020 12:58, Luis Machado wrote:

>>>>> On 4/24/20 7:02 AM, Luis Machado wrote:

>>>>>> On 4/24/20 6:17 AM, Tom de Vries wrote:

>>>>>>> On 23-04-2020 19:51, Luis Machado via Gdb-patches wrote:

>>>>>>>> On 4/22/20 8:22 AM, Luis Machado wrote:

>>>>>>>>> Hi Andrew,

>>>>>>>>>

>>>>>>>>> On 4/22/20 6:37 AM, Andrew Burgess wrote:

>>>>>>>>>> * Luis Machado via Gdb-patches <gdb-patches@sourceware.org>

>>>>>>>>>> [2020-04-14 18:38:36 -0300]:

>>>>>>>>>>

>>>>>>>>>>> *** re-sending due to the poor choice of characters for the

>>>>>>>>>>> backtrace

>>>>>>>>>>> annotations. GIT swallowed parts of it.

>>>>>>>>>>>

>>>>>>>>>>> There has been some breakage for aarch64-linux, arm-linux and

>>>>>>>>>>> s390-linux in

>>>>>>>>>>> terms of inline frame unwinding. There may be other targets, but

>>>>>>>>>>> these are

>>>>>>>>>>> the ones i'm aware of.

>>>>>>>>>>>

>>>>>>>>>>> The following testcases started to show numerous failures and

>>>>>>>>>>> trigger internal

>>>>>>>>>>> errors in GDB after commit

>>>>>>>>>>> 1009d92fc621bc4d017029b90a5bfab16e17fde5,

>>>>>>>>>>> "Find tailcall frames before inline frames".

>>>>>>>>>>>

>>>>>>>>>>> gdb.opt/inline-break.exp

>>>>>>>>>>> gdb.opt/inline-cmds.exp

>>>>>>>>>>> gdb.python/py-frame-inline.exp

>>>>>>>>>>> gdb.reverse/insn-reverse.exp

>>>>>>>>>>>

>>>>>>>>>>> The internal errors were of this kind:

>>>>>>>>>>>

>>>>>>>>>>> binutils-gdb/gdb/frame.c:579: internal-error: frame_id

>>>>>>>>>>> get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

>>>>>>>>>>

>>>>>>>>>> I have also started seeing this assert on RISC-V, and your patch

>>>>>>>>>> resolves this issue for me, so I'm keen to see this merged.

>>>>>>>>>

>>>>>>>>> Great.

>>>>>>>>>

>>>>>>>>>>

>>>>>>>>>> I took a look through and it all looks good to me - is there

>>>>>>>>>> anything

>>>>>>>>>> holding this back from being merged?

>>>>>>>>>

>>>>>>>>> Not really. I was waiting for an OK before pushing it.

>>>>>>>>>

>>>>>>>>>>

>>>>>>>>>> Thanks,

>>>>>>>>>> Andrew

>>>>>>>>

>>>>>>>> I've pushed this now. Tromey and Andrew OK-ed it on IRC.

>>>>>>>
>>>>>>> This causes at least:
>>>>>>> ...
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: tailcall: bt
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: tailcall: p i
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: tailcall: p i@entry
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: tailcall: p j
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: tailcall: p j@entry
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: p $sp0 == $sp
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: frame 3
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: down
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: disassemble
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: ambiguous: bt
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: self: bt
>>>>>>> FAIL: gdb.arch/amd64-entry-value.exp: self: bt debug entry-values
>>>>>>> FAIL: gdb.arch/amd64-tailcall-cxx.exp: bt
>>>>>>> FAIL: gdb.arch/amd64-tailcall-noret.exp: bt
>>>>>>> FAIL: gdb.arch/amd64-tailcall-self.exp: bt
>>>>>>> ...
>>>>>>>
>>>>>>> Looking at the first FAIL, before this commit we have:
>>>>>>> ...
>>>>>>> (gdb) PASS: gdb.arch/amd64-entry-value.exp: continue to breakpoint: tailcall: breakhere
>>>>>>> bt^M
>>>>>>> #0  d (i=71, i@entry=70, j=73.5, j@entry=72.5) at gdb.arch/amd64-entry-value.cc:34^M
>>>>>>> #1  0x00000000004006af in c (i=i@entry=7, j=j@entry=7.25) at gdb.arch/amd64-entry-value.cc:47^M
>>>>>>> #2  0x00000000004006cd in b (i=i@entry=5, j=j@entry=5.25) at gdb.arch/amd64-entry-value.cc:59^M
>>>>>>> #3  0x0000000000400524 in main () at gdb.arch/amd64-entry-value.cc:229^M
>>>>>>> (gdb) PASS: gdb.arch/amd64-entry-value.exp: tailcall: bt
>>>>>>> ...
>>>>>>> which has now degraded into:
>>>>>>> ...
>>>>>>> (gdb) PASS: gdb.arch/amd64-entry-value.exp: continue to breakpoint: tailcall: breakhere
>>>>>>> bt^M
>>>>>>> #0  d (i=<optimized out>, j=<optimized out>) at gdb.arch/amd64-entry-value.cc:34^M
>>>>>>> #1  0x0000000000400524 in main () at gdb.arch/amd64-entry-value.cc:229^M
>>>>>>> (gdb) FAIL: gdb.arch/amd64-entry-value.exp: tailcall: bt
>>>>>>> ...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Tom

>>>>>>>
>>>>>>
>>>>>> I'll take a look at it.
>>>>>
>>>>> Just a quick update... I did a before/after run and the only
>>>>> regression seems to be from gdb.arch/amd64-entry-value.exp.
>>>>>
>>>>> The other failures are still there even after reverting the inline
>>>>> frame unwinding fix.
>>>>>
>>>>> I'll check what's up with the regressed test.
>>>>>
>>>>> Could you please confirm this when you have some cycles?

>>>>
>>>> Hi,
>>>>
>>>> I cannot confirm this.  All these FAILs fail with the patch, and pass
>>>> with the patch reverted.
>>>>
>>>> Looking at amd64-tailcall-cxx.exp, we have normally:
>>>> ...
>>>> (gdb) bt^M
>>>> #0  g (x=x@entry=2) at gdb.arch/amd64-tailcall-cxx1.cc:23^M
>>>> #1  0x00000000004004e8 in f (x=x@entry=1) at gdb.arch/amd64-tailcall-cxx2.cc:23^M
>>>> #2  0x00000000004003de in main () at gdb.arch/amd64-tailcall-cxx1.cc:31^M
>>>> (gdb) PASS: gdb.arch/amd64-tailcall-cxx.exp: bt
>>>> ...
>>>> and with the patch:
>>>> ...
>>>> (gdb) bt^M
>>>> #0  g (x=2) at gdb.arch/amd64-tailcall-cxx1.cc:23^M
>>>> #1  0x00000000004003de in main () at gdb.arch/amd64-tailcall-cxx1.cc:31^M
>>>> (gdb) FAIL: gdb.arch/amd64-tailcall-cxx.exp: bt
>>>> ...
>>>>
>>>> So, I'd say it looks very similar to the issue in
>>>> gdb.arch/amd64-entry-value.exp.
>>>>
>>>> Thanks,
>>>> - Tom

>>>>
>>>
>>> Ok. I double-checked this and I'm still seeing failures for those that I
>>> mentioned, even with the patch reverted. It may be the case that these
>>> tests are not supposed to pass (or the testcase has issues) on non-amd64
>>> targets (running Intel here).
>>>
>>
>> Also Intel here (FWIW: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz).
>>
>
> Yikes. I have the exact same. There may be system differences affecting
> the tests then (libraries and/or compiler).
>
> I have this compiler: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04).
>
>>> I'll work with the testcase that does show the issue. Hopefully a fix
>>> for that will address all the others, but I may need further
>>> confirmation.
>>
>> Understood.
>>
>> Can you file a PR for the amd64-tailcall-cxx.exp FAIL that you're seeing
>> before the patch, and attach the exec?
>
> Sure. But before I do that, I have these failures with the patch reverted:

>
> FAIL: gdb.arch/amd64-entry-value-inline.exp: p y
> FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 1: p y
> FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 1: p b
> FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 2: p y
> FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 2: p b
> FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 3: p y
> FAIL: gdb.arch/amd64-entry-value-param-dwarf5.exp: call 3: p b
> FAIL: gdb.arch/amd64-entry-value-param.exp: call 1: p y
> FAIL: gdb.arch/amd64-entry-value-param.exp: call 1: p b
> FAIL: gdb.arch/amd64-entry-value-param.exp: call 2: p y
> FAIL: gdb.arch/amd64-entry-value-param.exp: call 2: p b
> FAIL: gdb.arch/amd64-entry-value-param.exp: call 3: p y
> FAIL: gdb.arch/amd64-entry-value-param.exp: call 3: p b
> FAIL: gdb.arch/amd64-entry-value-paramref.exp: frame
> FAIL: gdb.arch/amd64-tailcall-cxx.exp: bt
> FAIL: gdb.arch/amd64-tailcall-noret.exp: bt
> FAIL: gdb.arch/amd64-tailcall-self.exp: bt
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_equal: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_equal: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_different: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: entry_different: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: validity: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: only: validity: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: entry_equal: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: entry_equal: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: entry_different: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: entry_different: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: validity: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: preferred: validity: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: if-needed: validity: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: if-needed: validity: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: entry_equal: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: entry_equal: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: entry_different: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: entry_different: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: validity: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: both: validity: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: entry_equal: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: entry_equal: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: entry_different: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: entry_different: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: validity: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: compact: validity: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: entry_equal: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: entry_equal: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: entry_different: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: entry_different: -stack-list-variables (unexpected output)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: validity: stop (stopped at wrong place)
> FAIL: gdb.mi/mi2-amd64-entry-value.exp: default: validity: -stack-list-variables (unexpected output)

>
> Also a bunch of failures for gdb.base/gnu-ifunc.exp, but I think this is
> unrelated.
>
> Which ones do you want me to open bugs against?


I think you're running into
https://sourceware.org/bugzilla/show_bug.cgi?id=24892 .

I can reproduce the same failure by running with target board
unix/-fPIE/-pie.

Thanks,
- Tom
Rogerio Alves via Gdb-patches April 24, 2020, 2:39 p.m. | #13
On 4/24/20 11:36 AM, Tom de Vries wrote:
> On 24-04-2020 15:19, Luis Machado wrote:
>> [...]

>
> I think you're running into
> https://sourceware.org/bugzilla/show_bug.cgi?id=24892 .

Looks very likely, since Ubuntu's GCC passes -pie by default. I'll keep
that in mind.

> I can reproduce the same failure by running with target board
> unix/-fPIE/-pie.
>
> Thanks,
> - Tom
Andrew Burgess June 18, 2020, 4:58 p.m. | #14
* Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

> [...]
>

> After a lengthy investigation to try and find the cause of these assertions,
> it seems we're dealing with some fragile/poorly documented code to handle
> inline frames, and we are attempting to unwind from this fragile section of
> code.
>
> Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer
> was invoked from dwarf2_frame_prev_register. By the time we invoke the
> dwarf2_frame_prev_register function, we've probably already calculated the
> frame id (via compute_frame_id).
>
> After said commit, the call to dwarf2_tailcall_sniffer_first was moved to
> dwarf2_frame_cache. This is very early in a frame's creation process, and
> we're still calculating the frame ID (so compute_frame_id is in the call
> stack).
>
> This would be fine for regular frames, but the above testcases all deal
> with some inline frames.
>
> The particularity of inline frames is that their frame IDs depend on
> the previous frame's ID, and the previous frame's ID relies on the inline
> frame's registers. So it is a bit of a messy situation.
>
> We have comments in various parts of the code warning about some of these
> particularities.
>
> In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,
> which goes through various functions until we eventually invoke
> frame_unwind_got_register. This function will eventually attempt to create
> a lazy value for a particular register, and this lazy value will require
> a valid frame ID.  Since the inline frame doesn't have a valid frame ID
> yet (remember we're still calculating the previous frame's ID so we can tell
> what the inline frame's ID is), we will call compute_frame_id for the inline
> frame (level 0).
>
> We'll eventually hit the assertion above, inside get_frame_id:
>
> --
>       /* If we haven't computed the frame id yet, then it must be that
>          this is the current frame.  Compute it now, and stash the
>          result.  The IDs of other frames are computed as soon as
>          they're created, in order to detect cycles.  See
>          get_prev_frame_if_no_cycle.  */
>       gdb_assert (fi->level == 0);
> --
>
> It seems to me we shouldn't have reached this assertion without having the
> inline frame ID already calculated. In fact, it seems we even start recursing
> a bit when we invoke get_prev_frame_always within inline_frame_this_id. But
> a check makes us quit the recursion and proceed to compute the id.
>
> Here's the call stack for context:
>

> #0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109
> RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
> #2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)
>     at ../../../repos/binutils-gdb/gdb/inline-frame.c:165
> #3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550
> #4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582
> #5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296
> #6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268
> #7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)
>     at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296
> #8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229
> #9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320
> #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)
>     at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114
> #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)
>     at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316
> #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229
> #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320
> #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223
> #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074
> #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)
>     at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388
> #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190
> #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)
>     at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218
> #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550
> #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927
> #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006
> FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
> #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495
> #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596
> #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857
> #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381
> #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578
> #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020
> #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43
> #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377
> #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291
> #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194
> #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356
> #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416
> #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254
> #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269
> #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32

>
> The following patch addresses this by using a function that unwinds the PC
> from the next (inline) frame directly, as opposed to creating a lazy value
> that is bound to the next frame's ID (still not computed).
>
> I've validated this for aarch64-linux and x86_64-linux by running the
> testsuite.
>
> Tromey, would you mind checking if this suits your problematic core file
> tailcall scenario?
>
> gdb/ChangeLog:
>
> 2020-04-14  Luis Machado  <luis.machado@linaro.org>
>
> 	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use
> 	get_frame_register instead of gdbarch_unwind_pc.
> ---
>  gdb/dwarf2/frame-tailcall.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c
> index 2d219f13f9..01bb134a5c 100644
> --- a/gdb/dwarf2/frame-tailcall.c
> +++ b/gdb/dwarf2/frame-tailcall.c
> @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
>        prev_gdbarch = frame_unwind_arch (this_frame);
>  
>        /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
> -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);
> +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
> +			  (gdb_byte *) &prev_pc);
> +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
>  
>        /* call_site_find_chain can throw an exception.  */
>        chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);


I'm now no longer convinced that this patch is correct, and I'd like
to reopen the discussion.

Here's what concerns me: we used to make the following call-chain:

  gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value

Now we do this:

  get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value

The problem is that 'gdbarch_unwind_pc' takes an argument 'next_frame',
while 'get_frame_register' takes an argument called 'frame' that is
really 'this_frame'; it then passes 'frame->next' to
'frame_unwind_register'.

What this means is that if we have a call stack like this:

  #3 --> #2 --> #1 --> #0

and we invoke the tail-call sniffer in frame #1, then previously we
figured out the $pc value in frame #2, whereas now we figure out the
$pc value in frame #1.

I'm even more convinced that this is an error based on the fix patch
you applied later:

  commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d
  Date:   Sat Apr 25 00:32:44 2020 -0300

      Fix remaining inline/tailcall unwinding breakage for x86_64

This basically sent all but a select few cases down the old code path,
while restricting just a few cases to the new path.

I ran the testsuite (on x86-64/Linux) looking for cases where the new
code actually triggers and there are just 2.  Remember that this code
is using the $pc value to identify tail-call chains.

In both of the cases I found, both _before_ and _after_ your change, a
tail-call chain was not identified.  What this means is that even if
your code is returning the wrong value, it's not going to cause a test
regression.

Finally, if you catch the cases where your new code triggers, and then
step into call_site_find_chain (which is called later in the sniffer),
you'll see that this function is passed a caller address and a callee
address.  The callee address passed in is 'this_pc', in our example
above, this is the current address in #1.  We previously used to
compute the address in #2, which makes sense: we're looking for a
chain of tail-calls that gets us from #2 to #1.

However, after your change we're now simply passing in the address in
#1 as both the caller and the callee address, which makes no sense (to
me, right now).

I'm still investigating at the moment.  Right now I have more
questions than answers, but I wanted to raise my concerns in case I'm
just totally missing something obvious and you can set me straight.

Thanks,
Andrew
Andrew Burgess June 18, 2020, 5:29 p.m. | #15
* Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:

> * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

> > [... original patch and commit message, quoted in full above, snipped ...]
>
> [... my concerns from the previous message, quoted in full above, snipped ...]


Patch below is a test case that reveals the issue.  With this patch
applied, one frame is missing from the backtrace; revert the patch and
the frame reappears.

Still looking into what the right fix here is, but would welcome
discussion.

[ It just occurred to me that the test case could end up being target
and compiler version dependent.  I'm on X86-64/Linux with GCC version
'gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)'. ]

Thanks,
Andrew

---

commit 566b2b1da20e461cee2798f3eda741c1e31bdff6
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Thu Jun 18 18:25:00 2020 +0100

    gdb/testsuite: Test case for inline func, tailcall bug

diff --git a/gdb/testsuite/gdb.opt/inline-frame-tailcall.c b/gdb/testsuite/gdb.opt/inline-frame-tailcall.c
new file mode 100644
index 00000000000..2513c257a29
--- /dev/null
+++ b/gdb/testsuite/gdb.opt/inline-frame-tailcall.c
@@ -0,0 +1,52 @@
+/* Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifdef __GNUC__
+# define ATTR_INLINE __attribute__((gnu_inline)) __attribute__((always_inline)) __attribute__((noclone))
+#else
+# define ATTR_INLINE
+#endif
+
+volatile int global;
+
+volatile int counter;
+
+static inline ATTR_INLINE int
+bar ()
+{
+  /* Just some filler.  */
+  for (counter = 0; counter < 10; ++counter)
+    global = 0;
+  return 0;
+}
+
+__attribute__ ((noinline)) int
+foo ()
+{
+  return bar ();
+}
+
+__attribute__ ((noinline)) int
+test_func ()
+{
+  return foo ();
+}
+
+int
+main ()
+{
+  global = test_func ();
+  return (global * 2);
+}
diff --git a/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp b/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp
new file mode 100644
index 00000000000..bac96835d12
--- /dev/null
+++ b/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp
@@ -0,0 +1,47 @@
+# Copyright 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests a specific combination, a tailcall into a function, which then
+# has another function inlined within it.  So:
+#
+#   main --> test_func --> foo --> bar
+#
+#   main makes a normal call to test_func.
+#
+#   test_func makes a tail call to foo.
+#
+#   bar is inlined within foo.
+#
+# We should still see test_func in the call stack.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} \
+	  [list $srcfile] {debug optimize=-O2}] } {
+    return -1
+}
+
+if ![runto_main] {
+    return -1
+}
+
+gdb_breakpoint "bar"
+gdb_continue_to_breakpoint "bar"
+
+gdb_test "bt" \
+    [multi_line "#0  bar \\(\\).*" \
+	        "#1  foo \\(\\).*" \
+	 	"#2  $hex in test_func \\(\\).*" \
+	 	"#3  $hex in main \\(\\).*" ]
Andrew Burgess June 18, 2020, 5:40 p.m. | #16
* Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 18:29:22 +0100]:

> * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:

> 

> > * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

> > > [... original patch and commit message, quoted in full above, snipped ...]
> >
> > [...]

> > I'm even more convinced that this is an error based on the fix patch

> > you applied later:

> > 

> >   commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d

> >   Date:   Sat Apr 25 00:32:44 2020 -0300

> > 

> >       Fix remaining inline/tailcall unwinding breakage for x86_64

> > 


After this commit the code looks like this:

      /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
      if (next_frame != nullptr && get_frame_type (next_frame) == INLINE_FRAME
	  && !frame_id_computed_p (next_frame))
	{
	  /* The next frame is an inline frame and its frame id has not been
	     computed yet.  */
	  get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
			      (gdb_byte *) &prev_pc);
	  prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
	}
      else
	prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

Now the point of this function is to decide if the frame we are in
_right_now_ was tail called into by its "apparent" caller.  That is,
when we unwind we have:

  main --> foo --> bar

And in foo we ask, can we build a tailcall chain that gets us from
main to foo.  So in this case, we get the answer yes, and the chain
returned represents:

  main --> test_func --> foo

However, my thinking is that we know we're not in the position of
'foo' (that is, reached by a tail call) if foo is an inline frame,
right?

So, I wonder if it's as simple as saying:

  /* We know that THIS_FRAME was not reached by a tail call if
     THIS_FRAME is an inline frame.  */
  if (get_frame_type (this_frame) == INLINE_FRAME)
    return;

That's totally untested, just a random thought....  I wonder if such a
change would fix the original failures you saw?
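If that guess is right, the change might be as small as an early
return near the top of dwarf2_tailcall_sniffer_first, before any
unwinding is attempted.  A completely untested sketch, with the hunk
placement only approximate:

```diff
--- a/gdb/dwarf2/frame-tailcall.c
+++ b/gdb/dwarf2/frame-tailcall.c
@@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
+  /* We know that THIS_FRAME was not reached by a tail call if
+     THIS_FRAME is an inline frame.  */
+  if (get_frame_type (this_frame) == INLINE_FRAME)
+    return;
+
```

That would also sidestep the problematic unwind entirely for inline
frames, which is what triggered the original assertion failures.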

Thanks,
Andrew
Rogerio Alves via Gdb-patches June 18, 2020, 5:45 p.m. | #17
On 6/18/20 2:29 PM, Andrew Burgess wrote:
> * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:
> 
>> * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:
>>
>>> *** re-sending due to the poor choice of characters for the backtrace
>>> annotations. GIT swallowed parts of it.
>>>
>>> There has been some breakage for aarch64-linux, arm-linux and s390-linux in
>>> terms of inline frame unwinding. There may be other targets, but these are
>>> the ones I'm aware of.
>>>
>>> The following testcases started to show numerous failures and trigger internal
>>> errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5,
>>> "Find tailcall frames before inline frames".
>>>
>>> gdb.opt/inline-break.exp
>>> gdb.opt/inline-cmds.exp
>>> gdb.python/py-frame-inline.exp
>>> gdb.reverse/insn-reverse.exp
>>>
>>> The internal errors were of this kind:
>>>
>>> binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.
>>>
>>> After a lengthy investigation to try and find the cause of these assertions,
>>> it seems we're dealing with some fragile/poorly documented code to handle inline
>>> frames and we are attempting to unwind from this fragile section of code.
>>>
>>> Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer
>>> was invoked from dwarf2_frame_prev_register. By the time we invoke the
>>> dwarf2_frame_prev_register function, we've probably already calculated the
>>> frame id (via compute_frame_id).
>>>
>>> After said commit, the call to dwarf2_tailcall_sniffer_first was moved to
>>> dwarf2_frame_cache. This is very early in a frame creation process, and
>>> we're still calculating the frame ID (so compute_frame_id is in the call
>>> stack).
>>>
>>> This would be fine for regular frames, but the above testcases all deal
>>> with some inline frames.
>>>
>>> The particularity of inline frames is that their frame IDs depend on
>>> the previous frame's ID, and the previous frame's ID relies on the inline
>>> frame's registers. So it is a bit of a messy situation.
>>>
>>> We have comments in various parts of the code warning about some of these
>>> particularities.
>>>
>>> In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,
>>> which goes through various functions until we eventually invoke
>>> frame_unwind_got_register. This function will eventually attempt to create
>>> a lazy value for a particular register, and this lazy value will require
>>> a valid frame ID.  Since the inline frame doesn't have a valid frame ID
>>> yet (remember we're still calculating the previous frame's ID so we can tell
>>> what the inline frame ID is) we will call compute_frame_id for the inline
>>> frame (level 0).
>>>
>>> We'll eventually hit the assertion above, inside get_frame_id:
>>>
>>> --
>>>        /* If we haven't computed the frame id yet, then it must be that
>>>           this is the current frame.  Compute it now, and stash the
>>>           result.  The IDs of other frames are computed as soon as
>>>           they're created, in order to detect cycles.  See
>>>           get_prev_frame_if_no_cycle.  */
>>>        gdb_assert (fi->level == 0);
>>> --
>>>
>>> It seems to me we shouldn't have reached this assertion without having the
>>> inline frame ID already calculated. In fact, it seems we even start recursing
>>> a bit when we invoke get_prev_frame_always within inline_frame_this_id. But
>>> a check makes us quit the recursion and proceed to compute the id.
>>>
>>> Here's the call stack for context:
>>>

>>> #0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109
>>> RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
>>> #2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)
>>>      at ../../../repos/binutils-gdb/gdb/inline-frame.c:165
>>> #3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550
>>> #4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582
>>> #5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296
>>> #6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268
>>> #7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)
>>>      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296
>>> #8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229
>>> #9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320
>>> #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)
>>>      at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114
>>> #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)
>>>      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316
>>> #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229
>>> #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320
>>> #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223
>>> #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074
>>> #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)
>>>      at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388
>>> #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190
>>> #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)
>>>      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218
>>> #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550
>>> #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927
>>> #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006
>>> FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
>>> #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495
>>> #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596
>>> #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857
>>> #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381
>>> #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578
>>> #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020
>>> #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43
>>> #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377
>>> #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291
>>> #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194
>>> #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356
>>> #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416
>>> #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254
>>> #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269
>>> #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32
>>>

>>> The following patch addresses this by using a function that unwinds the PC
>>> from the next (inline) frame directly as opposed to creating a lazy value
>>> that is bound to the next frame's ID (still not computed).
>>>
>>> I've validated this for aarch64-linux and x86_64-linux by running the
>>> testsuite.
>>>
>>> Tromey, would you mind checking if this suits your problematic core file
>>> tailcall scenario?
>>>
>>> gdb/ChangeLog:
>>>
>>> 2020-04-14  Luis Machado  <luis.machado@linaro.org>
>>>
>>> 	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use
>>> 	get_frame_register instead of gdbarch_unwind_pc.
>>> ---
>>>   gdb/dwarf2/frame-tailcall.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c
>>> index 2d219f13f9..01bb134a5c 100644
>>> --- a/gdb/dwarf2/frame-tailcall.c
>>> +++ b/gdb/dwarf2/frame-tailcall.c
>>> @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
>>>         prev_gdbarch = frame_unwind_arch (this_frame);
>>>   
>>>         /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
>>> -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);
>>> +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
>>> +			  (gdb_byte *) &prev_pc);
>>> +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
>>>   
>>>         /* call_site_find_chain can throw an exception.  */
>>>         chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);
>>

>> I'm now no longer convinced that this patch is correct, and I'd like
>> to reopen the discussion.
>>
>> Here's what concerns me, we used to make the following call-chain:
>>
>>    gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value
>>
>> Now we do this:
>>
>>    get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value
>>
>> The problem is that 'gdbarch_unwind_pc' takes an argument 'next_frame',
>> while 'get_frame_register' takes an argument called 'frame', but is
>> really 'this_frame', it then passes 'frame->next' to
>> 'frame_unwind_register'.
>>
>> What this means is that if we have a call stack like this:
>>
>>    #3 --> #2 --> #1 --> #0
>>
>> And we invoke the tail-call sniffer in frame #1, previously we figured
>> out the $pc value in frame #2, while now we figure out the $pc value
>> in frame #1.
>>
>> I'm even more convinced that this is an error based on the fix patch
>> you applied later:
>>
>>    commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d
>>    Date:   Sat Apr 25 00:32:44 2020 -0300
>>
>>        Fix remaining inline/tailcall unwinding breakage for x86_64
>>
>> This basically sent all but a select few cases down the old code path,
>> while restricting just a few cases to the new path.
>>
>> I ran the testsuite (on x86-64/Linux) looking for cases where the new
>> code actually triggers and there are just 2.  Remember that this code
>> is using the $pc value to identify tail-call chains.
>>
>> In both of the cases I found, both _before_ and _after_ your change, a
>> tail-call chain was not identified.  What this means is that even if
>> your code is returning the wrong value, it's not going to cause a test
>> regression.
>>
>> Finally, if you catch the cases where your new code triggers, and then
>> step into call_site_find_chain (which is called later in the sniffer),
>> you'll see that this function is passed a caller address and a callee
>> address.  The callee address passed in is 'this_pc', in our example
>> above, this is the current address in #1.  We previously used to
>> compute the address in #2, which makes sense, we're looking for a
>> chain of tail-calls that gets us from #2 to #1.
>>
>> However, after your change we're now simply passing in the address in
>> #1 as both the caller and the callee address, which makes no sense (to
>> me, right now).
>>
>> I'm still investigating at the moment.  Right now I have more
>> questions than answers, but I wanted to raise my concerns in case I'm
>> just totally missing something obvious and you can set me straight.
> 
> Patch below is a test case that reveals the issue.  You'll notice that
> if you revert this patch then there's an extra frame in the backtrace
> that is missing with this patch.
> 
> Still looking into what the right fix here is, but would welcome
> discussion.


Thanks! That's a good assessment of the situation.

> [ It just occurred to me that the test case could end up being target
> and compiler version dependent.  I'm on X86-64/Linux with GCC version
> 'gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)'. ]

Yeah, x86_64 seems to produce more CFI information from which PC can be
recovered from the *current* frame, unlike AArch64, which assumes
SAME_VALUE and goes to look for PC in the *next* frame.

From what I recall, we're looking for PC, which translates to fetching
LR, but LR is SAME_VALUE, so we go looking for LR from the *next* frame,
and then run into a frame id assertion.

For x86_64 we're looking for PC, which, IIRC, translates to another
register. That register, in turn, is available through CFI and we
return. We never attempt to fetch things from the next frame.

The more general problem is attempting to unwind from within a frame
that is still getting its frame id computed. I can't say that's invalid,
but some targets may support this (x86_64) and others may not (aarch64,
s390, riscv, possibly others in specific situations).

> 
> Thanks,
> Andrew
> 
> ---
> 
> commit 566b2b1da20e461cee2798f3eda741c1e31bdff6
> Author: Andrew Burgess <andrew.burgess@embecosm.com>
> Date:   Thu Jun 18 18:25:00 2020 +0100
> 
>      gdb/testsuite: Test case for inline func, tailcall bug
> 
> diff --git a/gdb/testsuite/gdb.opt/inline-frame-tailcall.c b/gdb/testsuite/gdb.opt/inline-frame-tailcall.c
> new file mode 100644
> index 00000000000..2513c257a29
> --- /dev/null
> +++ b/gdb/testsuite/gdb.opt/inline-frame-tailcall.c
> @@ -0,0 +1,52 @@
> +/* Copyright 2020 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#ifdef __GNUC__
> +# define ATTR_INLINE __attribute__((gnu_inline)) __attribute__((always_inline)) __attribute__((noclone))
> +#else
> +# define ATTR_INLINE
> +#endif
> +
> +volatile int global;
> +
> +volatile int counter;
> +
> +static inline ATTR_INLINE int
> +bar ()
> +{
> +  /* Just some filler.  */
> +  for (counter = 0; counter < 10; ++counter)
> +    global = 0;
> +  return 0;
> +}
> +
> +__attribute__ ((noinline)) int
> +foo ()
> +{
> +  return bar ();
> +}
> +
> +__attribute__ ((noinline)) int
> +test_func ()
> +{
> +  return foo ();
> +}
> +
> +int
> +main ()
> +{
> +  global = test_func ();
> +  return (global * 2);
> +}

> diff --git a/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp b/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp
> new file mode 100644
> index 00000000000..bac96835d12
> --- /dev/null
> +++ b/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp
> @@ -0,0 +1,47 @@
> +# Copyright 2020 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Tests a specific combination, a tailcall into a function, which then
> +# has another function inlined within it.  So:
> +#
> +#   main --> test_func --> foo --> bar
> +#
> +#   main makes a normal call to test_func.
> +#
> +#   test_func makes a tail call to foo.
> +#
> +#   bar is inlined within foo.
> +#
> +# We should still see test_func in the call stack.
> +
> +standard_testfile
> +
> +if { [prepare_for_testing "failed to prepare" ${testfile} \
> +	  [list $srcfile] {debug optimize=-O2}] } {
> +    return -1
> +}
> +
> +if ![runto_main] {
> +    return -1
> +}
> +
> +gdb_breakpoint "bar"
> +gdb_continue_to_breakpoint "bar"
> +
> +gdb_test "bt" \
> +    [multi_line "#0  bar \\(\\).*" \
> +	        "#1  foo \\(\\).*" \
> +	 	"#2  $hex in test_func \\(\\).*" \
> +	 	"#3  $hex in main \\(\\).*" ]
> 
Andrew Burgess June 18, 2020, 6:04 p.m. | #18
* Luis Machado <luis.machado@linaro.org> [2020-06-18 14:45:00 -0300]:

> On 6/18/20 2:29 PM, Andrew Burgess wrote:

> > * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:

> > 

> > > * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

> > > 

> > > > *** re-sending due to the poor choice of characters for the backtrace

> > > > annotations. GIT swallowed parts of it.

> > > > 

> > > > There has been some breakage for aarch64-linux, arm-linux and s390-linux in

> > > > terms of inline frame unwinding. There may be other targets, but these are

> > > > the ones i'm aware of.

> > > > 

> > > > The following testcases started to show numerous failures and trigger internal

> > > > errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5,

> > > > "Find tailcall frames before inline frames".

> > > > 

> > > > gdb.opt/inline-break.exp

> > > > gdb.opt/inline-cmds.exp

> > > > gdb.python/py-frame-inline.exp

> > > > gdb.reverse/insn-reverse.exp

> > > > 

> > > > The internal errors were of this kind:

> > > > 

> > > > binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

> > > > 

> > > > After a lengthy investigation to try and find the cause of these assertions,

> > > > it seems we're dealing with some fragile/poorly documented code to handle inline

> > > > frames and we are attempting to unwind from this fragile section of code.

> > > > 

> > > > Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer

> > > > was invoked from dwarf2_frame_prev_register. By the time we invoke the

> > > > dwarf2_frame_prev_register function, we've probably already calculated the

> > > > frame id (via compute_frame_id).

> > > > 

> > > > After said commit, the call to dwarf2_tailcall_sniffer_first was moved to

> > > > dwarf2_frame_cache. This is very early in a frame creation process, and

> > > > we're still calculating the frame ID (so compute_frame_id is in the call

> > > > stack).

> > > > 

> > > > This would be fine for regular frames, but the above testcases all deal

> > > > with some inline frames.

> > > > 

> > > > The particularity of inline frames is that their frame ID's depend on

> > > > the previous frame's ID, and the previous frame's ID relies in the inline

> > > > frame's registers. So it is a bit of a messy situation.

> > > > 

> > > > We have comments in various parts of the code warning about some of these

> > > > particularities.

> > > > 

> > > > In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,

> > > > which goes through various functions until we eventually invoke

> > > > frame_unwind_got_register. This function will eventually attempt to create

> > > > a lazy value for a particular register, and this lazy value will require

> > > > a valid frame ID.  Since the inline frame doesn't have a valid frame ID

> > > > yet (remember we're still calculating the previous frame's ID so we can tell

> > > > what the inline frame ID is) we will call compute_frame_id for the inline

> > > > frame (level 0).

> > > > 

> > > > We'll eventually hit the assertion above, inside get_frame_id:

> > > > 

> > > > --

> > > >        /* If we haven't computed the frame id yet, then it must be that

> > > >           this is the current frame.  Compute it now, and stash the

> > > >           result.  The IDs of other frames are computed as soon as

> > > >           they're created, in order to detect cycles.  See

> > > >           get_prev_frame_if_no_cycle.  */

> > > >        gdb_assert (fi->level == 0);

> > > > --

> > > > 

> > > > It seems to me we shouldn't have reached this assertion without having the

> > > > inline frame ID already calculated. In fact, it seems we even start recursing

> > > > a bit when we invoke get_prev_frame_always within inline_frame_this_id. But

> > > > a check makes us quit the recursion and proceed to compute the id.

> > > > 

> > > > Here's the call stack for context:

> > > > 

> > > > #0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109

> > > > RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

> > > > #2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)

> > > >      at ../../../repos/binutils-gdb/gdb/inline-frame.c:165

> > > > #3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550

> > > > #4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582

> > > > #5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296

> > > > #6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268

> > > > #7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)

> > > >      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296

> > > > #8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229

> > > > #9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320

> > > > #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

> > > >      at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114

> > > > #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

> > > >      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316

> > > > #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229

> > > > #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320

> > > > #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223

> > > > #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074

> > > > #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)

> > > >      at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388

> > > > #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190

> > > > #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)

> > > >      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218

> > > > #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550

> > > > #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927

> > > > #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006

> > > > FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

> > > > #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495

> > > > #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596

> > > > #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857

> > > > #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381

> > > > #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578

> > > > #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020

> > > > #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43

> > > > #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377

> > > > #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291

> > > > #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194

> > > > #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356

> > > > #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416

> > > > #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254

> > > > #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269

> > > > #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32

> > > > 

> > > > The following patch addresses this by using a function that unwinds the PC

> > > > from the next (inline) frame directly as opposed to creating a lazy value

> > > > that is bound to the next frame's ID (still not computed).

> > > > 

> > > > I've validated this for aarch64-linux and x86_64-linux by running the

> > > > testsuite.

> > > > 

> > > > Tromey, would you mind checking if this suits your problematic core file

> > > > tailcall scenario?

> > > > 

> > > > gdb/ChangeLog:

> > > > 

> > > > 2020-04-14  Luis Machado  <luis.machado@linaro.org>

> > > > 

> > > > 	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use

> > > > 	get_frame_register instead of gdbarch_unwind_pc.

> > > > ---

> > > >   gdb/dwarf2/frame-tailcall.c | 4 +++-

> > > >   1 file changed, 3 insertions(+), 1 deletion(-)

> > > > 

> > > > diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c

> > > > index 2d219f13f9..01bb134a5c 100644

> > > > --- a/gdb/dwarf2/frame-tailcall.c

> > > > +++ b/gdb/dwarf2/frame-tailcall.c

> > > > @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,

> > > >         prev_gdbarch = frame_unwind_arch (this_frame);

> > > >         /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */

> > > > -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

> > > > +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),

> > > > +			  (gdb_byte *) &prev_pc);

> > > > +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);

> > > >         /* call_site_find_chain can throw an exception.  */

> > > >         chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);

> > > 

> > > I'm now no longer convinced that this patch is correct, and I'd like

> > > to reopen the discussion.

> > > 

> > > Here's what concerns me, we used to make the following call-chain:

> > > 

> > >    gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value

> > > 

> > > Now we do this:

> > > 

> > >    get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value

> > > 

> > > The problem is that gdbarch_unwind_pc' takes an argument 'next_frame',

> > > while, get_frame_register takes an argument called frame', but is

> > > really 'this_frame', it then passes 'frame->next' to

> > > 'frame_unwind_register'.

> > > 

> > > What this means is that if we have a call stack like this:

> > > 

> > >    #3 --> #2 --> #1 --> #0

> > > 

> > > And we invoke the tail-call sniffer in frame #1, previously we figured

> > > out the $pc value in frame #2, while now we figure out the $pc value

> > > in frame #1.

> > > 

> > > I'm even more convinced that this is an error based on the fix patch

> > > you applied later:

> > > 

> > >    commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d

> > >    Date:   Sat Apr 25 00:32:44 2020 -0300

> > > 

> > >        Fix remaining inline/tailcall unwinding breakage for x86_64

> > > 

> > > This basically sent all but a select few cases down the old code path,

> > > while restricting just a few cases to the new path.

> > > 

> > > I ran the testsuite (on x86-64/Linux) looking for cases where the new

> > > code actually triggers and there are just 2.  Remember that this code

> > > is use the $pc value to identify tail-call chains.

> > > 

> > > In both of the cases I found, both _before_ and _after_ your change, a

> > > tail-call chain was not identified.  What this means is that even if

> > > your code is returning the wrong value, it's not going to cause a test

> > > regression.

> > > 

> > > Finally, if you catch the cases where your new code triggers, and then

> > > step into call_site_find_chain (which is called later in the sniffer),

> > > you'll see that this function is passed a caller address and a callee

> > > address.  The callee address passed in is 'this_pc', in our example

> > > above, this is the current address in #1.  We previously used to

> > > compute the address in #2, which makes sense: we're looking for a

> > > chain of tail-calls that gets us from #2 to #1.

> > > 

> > > However, after your change we're now simply passing in the address in

> > > #1 as both the caller and the callee address, which makes no sense (to

> > > me, right now).

> > > 

> > > I'm still investigating at the moment.  Right now I have more

> > > questions than answers, but I wanted to raise my concerns in case I'm

> > > just totally missing something obvious and you can set me straight.

> > 

> > Patch below is a test case that reveals the issue.  You'll notice that

> > if you revert this patch then there's an extra frame in the backtrace

> > that is missing with this patch.

> > 

> > Still looking into what the right fix here is, but would welcome

> > discussion.

> 

> Thanks! That's a good assessment of the situation.

> 

> > 

> > [ It just occurred to me that the test case could end up being target

> > and compiler version dependent.  I'm on X86-64/Linux with GCC version

> > 'gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)'. ]

> 

> Yeah, x86_64 seems to produce more CFI information, from which the PC can be

> recovered in the *current* frame, unlike AArch64, which assumes SAME_VALUE

> and goes looking for the PC in the *next* frame.

> 

> From what I recall, we're looking for PC, which translates to fetching LR,

> but LR is SAME_VALUE, so we go looking for LR from the *next* frame, and

> then run into a frame id assertion.

> 

> For x86_64 we're looking for PC, which, IIRC, translates to another

> register. That register, in turn, is available through CFI and we return. We

> never attempt to fetch things from the next frame.

> 

> The more general problem is attempting to unwind from within a frame that is

> still getting its frame id computed. I can't say that's invalid, but some

> targets happen to cope with it (x86_64) while others do not (aarch64, s390,

> riscv, possibly others in specific situations).


I suspect it might be more that for x86-64 the lazy value returned is
a lazy memory reference, while for architectures with a link-register
we create a lazy register value, and then we're in trouble.

I don't think we should ever try to unwind the previous value of a
register from a frame that doesn't yet know its frame-id.  That just
sounds crazy.  IIRC the process is:

  - For each possible unwinder
    - Install the unwinder on the frame
    - Sniff frame
    - If claimed then:
      - Compute frame-id
      - Break out of loop
      Else:
      - Clean up after failed sniffing
    - Loop around and try next unwinder
  - Use frame which has a valid unwinder in place.

So if we don't yet have a frame-id then we don't yet know for sure
that the unwinder that is in place is actually correct; we're just
hoping for the best.

I wonder if we should actually assert in all the frame_unwind_got_*
functions that the value we 'got' is not lazy.  This
might throw up a few issues, but surely anywhere that one architecture
might call 'frame_unwind_got_memory' (which is fine with no frame-id)
another architecture might call 'frame_unwind_got_register', in which
case we're in trouble.

As x86-64 is register-light (compared to many others), it's more
likely to use 'frame_unwind_got_memory' and thus dodge bugs that might
hit other architectures.

Might give that a go and see what it throws up...

Additionally I wonder why frame_unwind_register_value doesn't insist
that the frame_id be computed before we even try to fetch a register?

Might give that a go too...

Thanks,
Andrew



> 

> > 

> > Thanks,

> > Andrew

> > 

> > ---

> > 

> > commit 566b2b1da20e461cee2798f3eda741c1e31bdff6

> > Author: Andrew Burgess <andrew.burgess@embecosm.com>

> > Date:   Thu Jun 18 18:25:00 2020 +0100

> > 

> >      gdb/testsuite: Test case for inline func, tailcall bug

> > 

> > diff --git a/gdb/testsuite/gdb.opt/inline-frame-tailcall.c b/gdb/testsuite/gdb.opt/inline-frame-tailcall.c

> > new file mode 100644

> > index 00000000000..2513c257a29

> > --- /dev/null

> > +++ b/gdb/testsuite/gdb.opt/inline-frame-tailcall.c

> > @@ -0,0 +1,52 @@

> > +/* Copyright 2020 Free Software Foundation, Inc.

> > +

> > +   This program is free software; you can redistribute it and/or modify

> > +   it under the terms of the GNU General Public License as published by

> > +   the Free Software Foundation; either version 3 of the License, or

> > +   (at your option) any later version.

> > +

> > +   This program is distributed in the hope that it will be useful,

> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of

> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> > +   GNU General Public License for more details.

> > +

> > +   You should have received a copy of the GNU General Public License

> > +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

> > +

> > +#ifdef __GNUC__

> > +# define ATTR_INLINE __attribute__((gnu_inline)) __attribute__((always_inline)) __attribute__((noclone))

> > +#else

> > +# define ATTR_INLINE

> > +#endif

> > +

> > +volatile int global;

> > +

> > +volatile int counter;

> > +

> > +static inline ATTR_INLINE int

> > +bar ()

> > +{

> > +  /* Just some filler.  */

> > +  for (counter = 0; counter < 10; ++counter)

> > +    global = 0;

> > +  return 0;

> > +}

> > +

> > +__attribute__ ((noinline)) int

> > +foo ()

> > +{

> > +  return bar ();

> > +}

> > +

> > +__attribute__ ((noinline)) int

> > +test_func ()

> > +{

> > +  return foo ();

> > +}

> > +

> > +int

> > +main ()

> > +{

> > +  global = test_func ();

> > +  return (global * 2);

> > +}

> > diff --git a/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp b/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp

> > new file mode 100644

> > index 00000000000..bac96835d12

> > --- /dev/null

> > +++ b/gdb/testsuite/gdb.opt/inline-frame-tailcall.exp

> > @@ -0,0 +1,47 @@

> > +# Copyright 2020 Free Software Foundation, Inc.

> > +

> > +# This program is free software; you can redistribute it and/or modify

> > +# it under the terms of the GNU General Public License as published by

> > +# the Free Software Foundation; either version 3 of the License, or

> > +# (at your option) any later version.

> > +#

> > +# This program is distributed in the hope that it will be useful,

> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of

> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> > +# GNU General Public License for more details.

> > +#

> > +# You should have received a copy of the GNU General Public License

> > +# along with this program.  If not, see <http://www.gnu.org/licenses/>.

> > +

> > +# Tests a specific combination, a tailcall into a function, which then

> > +# has another function inlined within it.  So:

> > +#

> > +#   main --> test_func --> foo --> bar

> > +#

> > +#   main makes a normal call to test_func.

> > +#

> > +#   test_func makes a tail call to foo.

> > +#

> > +#   bar is inlined within foo.

> > +#

> > +# We should still see test_func in the call stack.

> > +

> > +standard_testfile

> > +

> > +if { [prepare_for_testing "failed to prepare" ${testfile} \

> > +	  [list $srcfile] {debug optimize=-O2}] } {

> > +    return -1

> > +}

> > +

> > +if ![runto_main] {

> > +    return -1

> > +}

> > +

> > +gdb_breakpoint "bar"

> > +gdb_continue_to_breakpoint "bar"

> > +

> > +gdb_test "bt" \

> > +    [multi_line "#0  bar \\(\\).*" \

> > +	        "#1  foo \\(\\).*" \

> > +	 	"#2  $hex in test_func \\(\\).*" \

> > +	 	"#3  $hex in main \\(\\).*" ]

> >
Rogerio Alves via Gdb-patches June 18, 2020, 6:19 p.m. | #19
On 6/18/20 2:40 PM, Andrew Burgess wrote:
> * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 18:29:22 +0100]:

> 

>> * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:

>>

>>> * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

>>>

>>>> *** re-sending due to the poor choice of characters for the backtrace

>>>> annotations. GIT swallowed parts of it.

>>>>

>>>> There has been some breakage for aarch64-linux, arm-linux and s390-linux in

>>>> terms of inline frame unwinding. There may be other targets, but these are

>>>> the ones i'm aware of.

>>>>

>>>> The following testcases started to show numerous failures and trigger internal

>>>> errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5,

>>>> "Find tailcall frames before inline frames".

>>>>

>>>> gdb.opt/inline-break.exp

>>>> gdb.opt/inline-cmds.exp

>>>> gdb.python/py-frame-inline.exp

>>>> gdb.reverse/insn-reverse.exp

>>>>

>>>> The internal errors were of this kind:

>>>>

>>>> binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

>>>>

>>>> After a lengthy investigation to try and find the cause of these assertions,

>>>> it seems we're dealing with some fragile/poorly documented code to handle inline

>>>> frames and we are attempting to unwind from this fragile section of code.

>>>>

>>>> Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer

>>>> was invoked from dwarf2_frame_prev_register. By the time we invoke the

>>>> dwarf2_frame_prev_register function, we've probably already calculated the

>>>> frame id (via compute_frame_id).

>>>>

>>>> After said commit, the call to dwarf2_tailcall_sniffer_first was moved to

>>>> dwarf2_frame_cache. This is very early in a frame creation process, and

>>>> we're still calculating the frame ID (so compute_frame_id is in the call

>>>> stack).

>>>>

>>>> This would be fine for regular frames, but the above testcases all deal

>>>> with some inline frames.

>>>>

>>>> The particularity of inline frames is that their frame ID's depend on

>>>> the previous frame's ID, and the previous frame's ID relies in the inline

>>>> frame's registers. So it is a bit of a messy situation.

>>>>

>>>> We have comments in various parts of the code warning about some of these

>>>> particularities.

>>>>

>>>> In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,

>>>> which goes through various functions until we eventually invoke

>>>> frame_unwind_got_register. This function will eventually attempt to create

>>>> a lazy value for a particular register, and this lazy value will require

>>>> a valid frame ID.  Since the inline frame doesn't have a valid frame ID

>>>> yet (remember we're still calculating the previous frame's ID so we can tell

>>>> what the inline frame ID is) we will call compute_frame_id for the inline

>>>> frame (level 0).

>>>>

>>>> We'll eventually hit the assertion above, inside get_frame_id:

>>>>

>>>> --

>>>>        /* If we haven't computed the frame id yet, then it must be that

>>>>           this is the current frame.  Compute it now, and stash the

>>>>           result.  The IDs of other frames are computed as soon as

>>>>           they're created, in order to detect cycles.  See

>>>>           get_prev_frame_if_no_cycle.  */

>>>>        gdb_assert (fi->level == 0);

>>>> --

>>>>

>>>> It seems to me we shouldn't have reached this assertion without having the

>>>> inline frame ID already calculated. In fact, it seems we even start recursing

>>>> a bit when we invoke get_prev_frame_always within inline_frame_this_id. But

>>>> a check makes us quit the recursion and proceed to compute the id.

>>>>

>>>> Here's the call stack for context:

>>>>

>>>> #0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109

>>>> RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

>>>> #2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)

>>>>      at ../../../repos/binutils-gdb/gdb/inline-frame.c:165

>>>> #3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550

>>>> #4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582

>>>> #5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296

>>>> #6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268

>>>> #7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)

>>>>      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296

>>>> #8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229

>>>> #9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320

>>>> #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

>>>>      at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114

>>>> #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

>>>>      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316

>>>> #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229

>>>> #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320

>>>> #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223

>>>> #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074

>>>> #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)

>>>>      at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388

>>>> #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190

>>>> #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)

>>>>      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218

>>>> #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550

>>>> #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927

>>>> #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006

>>>> FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

>>>> #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495

>>>> #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596

>>>> #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857

>>>> #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381

>>>> #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578

>>>> #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020

>>>> #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43

>>>> #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377

>>>> #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291

>>>> #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194

>>>> #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356

>>>> #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416

>>>> #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254

>>>> #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269

>>>> #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32

>>>>

>>>> The following patch addresses this by using a function that unwinds the PC

>>>> from the next (inline) frame directly as opposed to creating a lazy value

>>>> that is bound to the next frame's ID (still not computed).

>>>>

>>>> I've validated this for aarch64-linux and x86_64-linux by running the

>>>> testsuite.

>>>>

>>>> Tromey, would you mind checking if this suits your problematic core file

>>>> tailcall scenario?

>>>>

>>>> gdb/ChangeLog:

>>>>

>>>> 2020-04-14  Luis Machado  <luis.machado@linaro.org>

>>>>

>>>> 	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use

>>>> 	get_frame_register instead of gdbarch_unwind_pc.

>>>> ---

>>>>   gdb/dwarf2/frame-tailcall.c | 4 +++-

>>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>>

>>>> diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c

>>>> index 2d219f13f9..01bb134a5c 100644

>>>> --- a/gdb/dwarf2/frame-tailcall.c

>>>> +++ b/gdb/dwarf2/frame-tailcall.c

>>>> @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,

>>>>         prev_gdbarch = frame_unwind_arch (this_frame);

>>>>   

>>>>         /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */

>>>> -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

>>>> +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),

>>>> +			  (gdb_byte *) &prev_pc);

>>>> +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);

>>>>   

>>>>         /* call_site_find_chain can throw an exception.  */

>>>>         chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);

>>>

>>> I'm now no longer convinced that this patch is correct, and I'd like

>>> to reopen the discussion.

>>>

>>> Here's what concerns me, we used to make the following call-chain:

>>>

>>>    gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value

>>>

>>> Now we do this:

>>>

>>>    get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value

>>>

>>> The problem is that 'gdbarch_unwind_pc' takes an argument 'next_frame',

>>> while 'get_frame_register' takes an argument called 'frame' that is

>>> really 'this_frame'; it then passes 'frame->next' to

>>> 'frame_unwind_register'.

>>>

>>> What this means is that if we have a call stack like this:

>>>

>>>    #3 --> #2 --> #1 --> #0

>>>

>>> And we invoke the tail-call sniffer in frame #1, previously we figured

>>> out the $pc value in frame #2, while now we figure out the $pc value

>>> in frame #1.

>>>

>>> I'm even more convinced that this is an error based on the fix patch

>>> you applied later:

>>>

>>>    commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d

>>>    Date:   Sat Apr 25 00:32:44 2020 -0300

>>>

>>>        Fix remaining inline/tailcall unwinding breakage for x86_64

>>>

> 

> After this commit the code looks like this:

> 

>        /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */

>        if (next_frame != nullptr && get_frame_type (next_frame) == INLINE_FRAME

> 	  && !frame_id_computed_p (next_frame))

> 	{

> 	  /* The next frame is an inline frame and its frame id has not been

> 	     computed yet.  */

> 	  get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),

> 			      (gdb_byte *) &prev_pc);

> 	  prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);

> 	}

>        else

> 	prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

> 

> Now the point of this function is to decide if the frame we are in

> _right_now_ was tail called into by its "apparent" caller.  That is

> when we unwind we have:

> 

>    main --> foo --> bar

> 

> And in foo we ask, can we build a tailcall chain that gets us from

> main to foo.  So in this case, we get the answer yes, and the chain

> returned represents:

> 

>    main --> test_func --> foo

> 

> However, my thinking is that we know we're not in the position of

> 'foo' (that is, reached by a tail call) if foo is an inline frame -

> right?

> 

> So, I wonder if it's as simple as saying:

> 

>    /* We know that THIS_FRAME was not reached by a tail call if

>       THIS_FRAME is an inline frame.  */

>    if (get_frame_type (this_frame) == INLINE_FRAME)

>      return;

> 

> That's totally untested, just a random thought....  I wonder if such a

> change would fix the original failures you saw?

> 

> Thanks,

> Andrew

> 


I can attest that your testcase fails for aarch64. But your proposed early
return runs into the known frame-level assertion inside get_frame_id:

gdb_assert (fi->level == 0);

This happens because, when we're unwinding the frame for foo, the current
frame is no longer an inline frame. So we proceed to try to fetch the PC
from the next (inline) frame, and then we run into the same problem: the
lack of a frame id.

When we're unwinding an inline frame whose next frame is the sentinel 
one, we're ok. We always have register access through the sentinel frame.

I wonder how likely it is to break if there are inline frames at other 
positions in the call chain. So your suggestion may fix those cases. But 
it won't fix the remaining cases that attempt to unwind registers from 
the inline frame.

Does that make sense?
Andrew Burgess June 18, 2020, 6:31 p.m. | #20
* Luis Machado <luis.machado@linaro.org> [2020-06-18 15:19:56 -0300]:

> On 6/18/20 2:40 PM, Andrew Burgess wrote:

> > * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 18:29:22 +0100]:

> > 

> > > * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:

> > > 

> > > > * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

> > > > 

> > > > > *** re-sending due to the poor choice of characters for the backtrace

> > > > > annotations. GIT swallowed parts of it.

> > > > > 

> > > > > There has been some breakage for aarch64-linux, arm-linux and s390-linux in

> > > > > terms of inline frame unwinding. There may be other targets, but these are

> > > > > the ones i'm aware of.

> > > > > 

> > > > > The following testcases started to show numerous failures and trigger internal

> > > > > errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5,

> > > > > "Find tailcall frames before inline frames".

> > > > > 

> > > > > gdb.opt/inline-break.exp

> > > > > gdb.opt/inline-cmds.exp

> > > > > gdb.python/py-frame-inline.exp

> > > > > gdb.reverse/insn-reverse.exp

> > > > > 

> > > > > The internal errors were of this kind:

> > > > > 

> > > > > binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

> > > > > 

> > > > > After a lengthy investigation to try and find the cause of these assertions,

> > > > > it seems we're dealing with some fragile/poorly documented code to handle inline

> > > > > frames and we are attempting to unwind from this fragile section of code.

> > > > > 

> > > > > Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer

> > > > > was invoked from dwarf2_frame_prev_register. By the time we invoke the

> > > > > dwarf2_frame_prev_register function, we've probably already calculated the

> > > > > frame id (via compute_frame_id).

> > > > > 

> > > > > After said commit, the call to dwarf2_tailcall_sniffer_first was moved to

> > > > > dwarf2_frame_cache. This is very early in a frame creation process, and

> > > > > we're still calculating the frame ID (so compute_frame_id is in the call

> > > > > stack).

> > > > > 

> > > > > This would be fine for regular frames, but the above testcases all deal

> > > > > with some inline frames.

> > > > > 

> > > > > The particularity of inline frames is that their frame ID's depend on

> > > > > the previous frame's ID, and the previous frame's ID relies in the inline

> > > > > frame's registers. So it is a bit of a messy situation.

> > > > > 

> > > > > We have comments in various parts of the code warning about some of these

> > > > > particularities.

> > > > > 

> > > > > In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,

> > > > > which goes through various functions until we eventually invoke

> > > > > frame_unwind_got_register. This function will eventually attempt to create

> > > > > a lazy value for a particular register, and this lazy value will require

> > > > > a valid frame ID.  Since the inline frame doesn't have a valid frame ID

> > > > > yet (remember we're still calculating the previous frame's ID so we can tell

> > > > > what the inline frame ID is) we will call compute_frame_id for the inline

> > > > > frame (level 0).

> > > > > 

> > > > > We'll eventually hit the assertion above, inside get_frame_id:

> > > > > 

> > > > > --

> > > > >        /* If we haven't computed the frame id yet, then it must be that

> > > > >           this is the current frame.  Compute it now, and stash the

> > > > >           result.  The IDs of other frames are computed as soon as

> > > > >           they're created, in order to detect cycles.  See

> > > > >           get_prev_frame_if_no_cycle.  */

> > > > >        gdb_assert (fi->level == 0);

> > > > > --

> > > > > 

> > > > > It seems to me we shouldn't have reached this assertion without having the

> > > > > inline frame ID already calculated. In fact, it seems we even start recursing

> > > > > a bit when we invoke get_prev_frame_always within inline_frame_this_id. But

> > > > > a check makes us quit the recursion and proceed to compute the id.

> > > > > 

> > > > > Here's the call stack for context:

> > > > > 

> > > > > #0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109

> > > > > RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

> > > > > #2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)

> > > > >      at ../../../repos/binutils-gdb/gdb/inline-frame.c:165

> > > > > #3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550

> > > > > #4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582

> > > > > #5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296

> > > > > #6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268

> > > > > #7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)

> > > > >      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296

> > > > > #8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229

> > > > > #9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320

> > > > > #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

> > > > >      at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114

> > > > > #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

> > > > >      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316

> > > > > #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229

> > > > > #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320

> > > > > #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223

> > > > > #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074

> > > > > #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)

> > > > >      at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388

> > > > > #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190

> > > > > #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)

> > > > >      at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218

> > > > > #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550

> > > > > #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927

> > > > > #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006

> > > > > FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

> > > > > #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495

> > > > > #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596

> > > > > #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857

> > > > > #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381

> > > > > #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578

> > > > > #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020

> > > > > #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43

> > > > > #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377

> > > > > #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291

> > > > > #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194

> > > > > #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356

> > > > > #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416

> > > > > #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254

> > > > > #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269

> > > > > #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32

> > > > > 

> > > > > The following patch addresses this by using a function that unwinds the PC

> > > > > from the next (inline) frame directly as opposed to creating a lazy value

> > > > > that is bound to the next frame's ID (still not computed).

> > > > > 

> > > > > I've validated this for aarch64-linux and x86_64-linux by running the

> > > > > testsuite.

> > > > > 

> > > > > Tromey, would you mind checking if this suits your problematic core file

> > > > > tailcall scenario?

> > > > > 

> > > > > gdb/ChangeLog:

> > > > > 

> > > > > 2020-04-14  Luis Machado  <luis.machado@linaro.org>

> > > > > 

> > > > > 	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use

> > > > > 	get_frame_register instead of gdbarch_unwind_pc.

> > > > > ---

> > > > >   gdb/dwarf2/frame-tailcall.c | 4 +++-

> > > > >   1 file changed, 3 insertions(+), 1 deletion(-)

> > > > > 

> > > > > diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c

> > > > > index 2d219f13f9..01bb134a5c 100644

> > > > > --- a/gdb/dwarf2/frame-tailcall.c

> > > > > +++ b/gdb/dwarf2/frame-tailcall.c

> > > > > @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,

> > > > >         prev_gdbarch = frame_unwind_arch (this_frame);

> > > > >         /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */

> > > > > -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

> > > > > +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),

> > > > > +			  (gdb_byte *) &prev_pc);

> > > > > +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);

> > > > >         /* call_site_find_chain can throw an exception.  */

> > > > >         chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);

> > > > 

> > > > I'm now no longer convinced that this patch is correct, and I'd like

> > > > to reopen the discussion.

> > > > 

> > > > Here's what concerns me, we used to make the following call-chain:

> > > > 

> > > >    gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value

> > > > 

> > > > Now we do this:

> > > > 

> > > >    get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value

> > > > 

> > > > The problem is that 'gdbarch_unwind_pc' takes an argument 'next_frame',

> > > > while 'get_frame_register' takes an argument called 'frame' that is

> > > > really 'this_frame'; it then passes 'frame->next' to

> > > > 'frame_unwind_register'.

> > > > 

> > > > What this means is that if we have a call stack like this:

> > > > 

> > > >    #3 --> #2 --> #1 --> #0

> > > > 

> > > > And we invoke the tail-call sniffer in frame #1, previously we figured

> > > > out the $pc value in frame #2, while now we figure out the $pc value

> > > > in frame #1.

> > > > 

> > > > I'm even more convinced that this is an error based on the fix patch

> > > > you applied later:

> > > > 

> > > >    commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d

> > > >    Date:   Sat Apr 25 00:32:44 2020 -0300

> > > > 

> > > >        Fix remaining inline/tailcall unwinding breakage for x86_64

> > > > 

> > 

> > After this commit the code looks like this:

> > 

> >        /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */

> >        if (next_frame != nullptr && get_frame_type (next_frame) == INLINE_FRAME

> > 	  && !frame_id_computed_p (next_frame))

> > 	{

> > 	  /* The next frame is an inline frame and its frame id has not been

> > 	     computed yet.  */

> > 	  get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),

> > 			      (gdb_byte *) &prev_pc);

> > 	  prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);

> > 	}

> >        else

> > 	prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

> > 

> > Now the point of this function is to decide if the frame we are in

> > _right_now_ was tail called into by its "apparent" caller.  That is

> > when we unwind we have:

> > 

> >    main --> foo --> bar

> > 

> > And in foo we ask, can we build a tailcall chain that gets us from

> > main to foo.  So in this case, we get the answer yes, and the chain

> > returned represents:

> > 

> >    main --> test_func --> foo

> > 

> > However, my thinking is that we know we're not in the position of

> > 'foo' (that is reached by a tail call) if foo is an inline frame -

> > right?

> > 

> > So, I wonder if it's as simple as saying:

> > 

> >    /* We know that THIS_FRAME was not reached by a tail call if

> >       THIS_FRAME is an inline frame.  */

> >    if (get_frame_type (this_frame) == INLINE_FRAME)

> >      return;

> > 

> > That's totally untested, just a random thought....  I wonder if such a

> > change would fix the original failures you saw?

> > 

> > Thanks,

> > Andrew

> > 

> 

> I can attest your testcase fails for aarch64. But your proposed early return

> runs into the known frame level assertion, inside get_frame_id:

> 

> gdb_assert (fi->level == 0);

> 

> This happens because when we're unwinding the frame for foo, the current

> frame is no longer an inline frame. So we proceed to try and fetch PC from

> the next (inline) frame. And then we run into the same problem with the lack

> of a frame id.

> 

> When we're unwinding an inline frame whose next frame is the sentinel one,

> we're ok. We always have register access through the sentinel frame.

> 

> I wonder how likely it is to break if there are inline frames at other

> positions in the call chain. So your suggestion may fix those cases. But it

> won't fix the remaining cases that attempt to unwind registers from the

> inline frame.

> 

> Does that make sense?


I think so.

The more I stare at this code the more I'm convinced that trying to
run the tail-call detection (not going to call it sniffing) from an
actual frame sniffer is just a bad idea.

Think about it, we have

  test_func --> foo

we're sniffing for foo, which means we don't yet know what type of
frame foo is, and how to unwind it; yet to do tail call detection we
need to ask foo to unwind its $pc.

I think the real fix here is to move tail call detection later in the
process, hopefully Tom will offer some more details of the original
bug that motivated moving tail call detection.

Thanks,
Andrew
Rogerio Alves via Gdb-patches June 18, 2020, 6:39 p.m. | #21
On 6/18/20 3:31 PM, Andrew Burgess wrote:
> * Luis Machado <luis.machado@linaro.org> [2020-06-18 15:19:56 -0300]:

> 

>> On 6/18/20 2:40 PM, Andrew Burgess wrote:

>>> * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 18:29:22 +0100]:

>>>

>>>> * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:

>>>>

>>>>> * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

>>>>>

>>>>>> *** re-sending due to the poor choice of characters for the backtrace

>>>>>> annotations. GIT swallowed parts of it.

>>>>>>

>>>>>> There has been some breakage for aarch64-linux, arm-linux and s390-linux in

>>>>>> terms of inline frame unwinding. There may be other targets, but these are

>>>>>> the ones I'm aware of.

>>>>>>

>>>>>> The following testcases started to show numerous failures and trigger internal

>>>>>> errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5,

>>>>>> "Find tailcall frames before inline frames".

>>>>>>

>>>>>> gdb.opt/inline-break.exp

>>>>>> gdb.opt/inline-cmds.exp

>>>>>> gdb.python/py-frame-inline.exp

>>>>>> gdb.reverse/insn-reverse.exp

>>>>>>

>>>>>> The internal errors were of this kind:

>>>>>>

>>>>>> binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

>>>>>>

>>>>>> After a lengthy investigation to try and find the cause of these assertions,

>>>>>> it seems we're dealing with some fragile/poorly documented code to handle inline

>>>>>> frames and we are attempting to unwind from this fragile section of code.

>>>>>>

>>>>>> Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer

>>>>>> was invoked from dwarf2_frame_prev_register. By the time we invoke the

>>>>>> dwarf2_frame_prev_register function, we've probably already calculated the

>>>>>> frame id (via compute_frame_id).

>>>>>>

>>>>>> After said commit, the call to dwarf2_tailcall_sniffer_first was moved to

>>>>>> dwarf2_frame_cache. This is very early in a frame creation process, and

>>>>>> we're still calculating the frame ID (so compute_frame_id is in the call

>>>>>> stack).

>>>>>>

>>>>>> This would be fine for regular frames, but the above testcases all deal

>>>>>> with some inline frames.

>>>>>>

>>>>>> The particularity of inline frames is that their frame ID's depend on

>>>>>> the previous frame's ID, and the previous frame's ID relies in the inline

>>>>>> frame's registers. So it is a bit of a messy situation.

>>>>>>

>>>>>> We have comments in various parts of the code warning about some of these

>>>>>> particularities.

>>>>>>

>>>>>> In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,

>>>>>> which goes through various functions until we eventually invoke

>>>>>> frame_unwind_got_register. This function will eventually attempt to create

>>>>>> a lazy value for a particular register, and this lazy value will require

>>>>>> a valid frame ID.  Since the inline frame doesn't have a valid frame ID

>>>>>> yet (remember we're still calculating the previous frame's ID so we can tell

>>>>>> what the inline frame ID is) we will call compute_frame_id for the inline

>>>>>> frame (level 0).

>>>>>>

>>>>>> We'll eventually hit the assertion above, inside get_frame_id:

>>>>>>

>>>>>> --

>>>>>>         /* If we haven't computed the frame id yet, then it must be that

>>>>>>            this is the current frame.  Compute it now, and stash the

>>>>>>            result.  The IDs of other frames are computed as soon as

>>>>>>            they're created, in order to detect cycles.  See

>>>>>>            get_prev_frame_if_no_cycle.  */

>>>>>>         gdb_assert (fi->level == 0);

>>>>>> --

>>>>>>

>>>>>> It seems to me we shouldn't have reached this assertion without having the

>>>>>> inline frame ID already calculated. In fact, it seems we even start recursing

>>>>>> a bit when we invoke get_prev_frame_always within inline_frame_this_id. But

>>>>>> a check makes us quit the recursion and proceed to compute the id.

>>>>>>

>>>>>> Here's the call stack for context:

>>>>>>

>>>>>> #0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109

>>>>>> RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

>>>>>> #2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)

>>>>>>       at ../../../repos/binutils-gdb/gdb/inline-frame.c:165

>>>>>> #3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550

>>>>>> #4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582

>>>>>> #5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296

>>>>>> #6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268

>>>>>> #7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)

>>>>>>       at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296

>>>>>> #8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229

>>>>>> #9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320

>>>>>> #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

>>>>>>       at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114

>>>>>> #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

>>>>>>       at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316

>>>>>> #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229

>>>>>> #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320

>>>>>> #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223

>>>>>> #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074

>>>>>> #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)

>>>>>>       at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388

>>>>>> #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190

>>>>>> #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)

>>>>>>       at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218

>>>>>> #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550

>>>>>> #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927

>>>>>> #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006

>>>>>> FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

>>>>>> #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495

>>>>>> #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596

>>>>>> #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857

>>>>>> #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381

>>>>>> #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578

>>>>>> #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020

>>>>>> #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43

>>>>>> #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377

>>>>>> #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291

>>>>>> #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194

>>>>>> #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356

>>>>>> #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416

>>>>>> #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254

>>>>>> #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269

>>>>>> #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32

>>>>>>

>>>>>> The following patch addresses this by using a function that unwinds the PC

>>>>>> from the next (inline) frame directly as opposed to creating a lazy value

>>>>>> that is bound to the next frame's ID (still not computed).

>>>>>>

>>>>>> I've validated this for aarch64-linux and x86_64-linux by running the

>>>>>> testsuite.

>>>>>>

>>>>>> Tromey, would you mind checking if this suits your problematic core file

>>>>>> tailcall scenario?

>>>>>>

>>>>>> gdb/ChangeLog:

>>>>>>

>>>>>> 2020-04-14  Luis Machado  <luis.machado@linaro.org>

>>>>>>

>>>>>> 	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use

>>>>>> 	get_frame_register instead of gdbarch_unwind_pc.

>>>>>> ---

>>>>>>    gdb/dwarf2/frame-tailcall.c | 4 +++-

>>>>>>    1 file changed, 3 insertions(+), 1 deletion(-)

>>>>>>

>>>>>> diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c

>>>>>> index 2d219f13f9..01bb134a5c 100644

>>>>>> --- a/gdb/dwarf2/frame-tailcall.c

>>>>>> +++ b/gdb/dwarf2/frame-tailcall.c

>>>>>> @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,

>>>>>>          prev_gdbarch = frame_unwind_arch (this_frame);

>>>>>>          /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */

>>>>>> -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

>>>>>> +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),

>>>>>> +			  (gdb_byte *) &prev_pc);

>>>>>> +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);

>>>>>>          /* call_site_find_chain can throw an exception.  */

>>>>>>          chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);

>>>>>

>>>>> I'm now no longer convinced that this patch is correct, and I'd like

>>>>> to reopen the discussion.

>>>>>

>>>>> Here's what concerns me, we used to make the following call-chain:

>>>>>

>>>>>     gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value

>>>>>

>>>>> Now we do this:

>>>>>

>>>>>     get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value

>>>>>

>>>>> The problem is that 'gdbarch_unwind_pc' takes an argument 'next_frame',

>>>>> while 'get_frame_register' takes an argument called 'frame' that is

>>>>> really 'this_frame'; it then passes 'frame->next' to

>>>>> 'frame_unwind_register'.

>>>>>

>>>>> What this means is that if we have a call stack like this:

>>>>>

>>>>>     #3 --> #2 --> #1 --> #0

>>>>>

>>>>> And we invoke the tail-call sniffer in frame #1, previously we figured

>>>>> out the $pc value in frame #2, while now we figure out the $pc value

>>>>> in frame #1.

>>>>>

>>>>> I'm even more convinced that this is an error based on the fix patch

>>>>> you applied later:

>>>>>

>>>>>     commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d

>>>>>     Date:   Sat Apr 25 00:32:44 2020 -0300

>>>>>

>>>>>         Fix remaining inline/tailcall unwinding breakage for x86_64

>>>>>

>>>

>>> After this commit the code looks like this:

>>>

>>>         /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */

>>>         if (next_frame != nullptr && get_frame_type (next_frame) == INLINE_FRAME

>>> 	  && !frame_id_computed_p (next_frame))

>>> 	{

>>> 	  /* The next frame is an inline frame and its frame id has not been

>>> 	     computed yet.  */

>>> 	  get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),

>>> 			      (gdb_byte *) &prev_pc);

>>> 	  prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);

>>> 	}

>>>         else

>>> 	prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

>>>

>>> Now the point of this function is to decide if the frame we are in

>>> _right_now_ was tail called into by its "apparent" caller.  That is

>>> when we unwind we have:

>>>

>>>     main --> foo --> bar

>>>

>>> And in foo we ask, can we build a tailcall chain that gets us from

>>> main to foo.  So in this case, we get the answer yes, and the chain

>>> returned represents:

>>>

>>>     main --> test_func --> foo

>>>

>>> However, my thinking is that we know we're not in the position of

>>> 'foo' (that is reached by a tail call) if foo is an inline frame -

>>> right?

>>>

>>> So, I wonder if it's as simple as saying:

>>>

>>>     /* We know that THIS_FRAME was not reached by a tail call if

>>>        THIS_FRAME is an inline frame.  */

>>>     if (get_frame_type (this_frame) == INLINE_FRAME)

>>>       return;

>>>

>>> That's totally untested, just a random thought....  I wonder if such a

>>> change would fix the original failures you saw?

>>>

>>> Thanks,

>>> Andrew

>>>

>>

>> I can attest your testcase fails for aarch64. But your proposed early return

>> runs into the known frame level assertion, inside get_frame_id:

>>

>> gdb_assert (fi->level == 0);

>>

>> This happens because when we're unwinding the frame for foo, the current

>> frame is no longer an inline frame. So we proceed to try and fetch PC from

>> the next (inline) frame. And then we run into the same problem with the lack

>> of a frame id.

>>

>> When we're unwinding an inline frame whose next frame is the sentinel one,

>> we're ok. We always have register access through the sentinel frame.

>>

>> I wonder how likely it is to break if there are inline frames at other

>> positions in the call chain. So your suggestion may fix those cases. But it

>> won't fix the remaining cases that attempt to unwind registers from the

>> inline frame.

>>

>> Does that make sense?

> 

> I think so.

> 

> The more I stare at this code the more I'm convinced that trying to

> run the tail-call detection (not going to call it sniffing) from an

> actual frame sniffer is just a bad idea.

> 

> Think about it, we have

> 

>    test_func --> foo

> 

> we're sniffing for foo, which means we don't yet know what type of

> frame foo is, and how to unwind it; yet to do tail call detection we

> need to ask foo to unwind its $pc.

> 

> I think the real fix here is to move tail call detection later in the

> process, hopefully Tom will offer some more details of the original

> bug that motivated moving tail call detection.

> 

> Thanks,

> Andrew

> 


I tend to agree.

I think we've tried a few things to accommodate tailcall detection early,
but this is getting harder to solve. So I'm inclined to revisit the
relocation of tailcall detection and make sure it works, but also make
sure other non-x86 architectures are fine with it as well.

Introducing more and more conditionals may make the code more prone to bugs.
Andrew Burgess June 22, 2020, 3:49 p.m. | #22
* Luis Machado <luis.machado@linaro.org> [2020-06-18 15:39:32 -0300]:

> On 6/18/20 3:31 PM, Andrew Burgess wrote:

> > * Luis Machado <luis.machado@linaro.org> [2020-06-18 15:19:56 -0300]:

> > 

> > > On 6/18/20 2:40 PM, Andrew Burgess wrote:

> > > > * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 18:29:22 +0100]:

> > > > 

> > > > > * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:

> > > > > 

> > > > > > * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:

> > > > > > 

> > > > > > > *** re-sending due to the poor choice of characters for the backtrace

> > > > > > > annotations. GIT swallowed parts of it.

> > > > > > > 

> > > > > > > There has been some breakage for aarch64-linux, arm-linux and s390-linux in

> > > > > > > terms of inline frame unwinding. There may be other targets, but these are

> > > > > > > the ones I'm aware of.

> > > > > > > 

> > > > > > > The following testcases started to show numerous failures and trigger internal

> > > > > > > errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5,

> > > > > > > "Find tailcall frames before inline frames".

> > > > > > > 

> > > > > > > gdb.opt/inline-break.exp

> > > > > > > gdb.opt/inline-cmds.exp

> > > > > > > gdb.python/py-frame-inline.exp

> > > > > > > gdb.reverse/insn-reverse.exp

> > > > > > > 

> > > > > > > The internal errors were of this kind:

> > > > > > > 

> > > > > > > binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

> > > > > > > 

> > > > > > > After a lengthy investigation to try and find the cause of these assertions,

> > > > > > > it seems we're dealing with some fragile/poorly documented code to handle inline

> > > > > > > frames and we are attempting to unwind from this fragile section of code.

> > > > > > > 

> > > > > > > Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer

> > > > > > > was invoked from dwarf2_frame_prev_register. By the time we invoke the

> > > > > > > dwarf2_frame_prev_register function, we've probably already calculated the

> > > > > > > frame id (via compute_frame_id).

> > > > > > > 

> > > > > > > After said commit, the call to dwarf2_tailcall_sniffer_first was moved to

> > > > > > > dwarf2_frame_cache. This is very early in a frame creation process, and

> > > > > > > we're still calculating the frame ID (so compute_frame_id is in the call

> > > > > > > stack).

> > > > > > > 

> > > > > > > This would be fine for regular frames, but the above testcases all deal

> > > > > > > with some inline frames.

> > > > > > > 

> > > > > > > The particularity of inline frames is that their frame ID's depend on

> > > > > > > the previous frame's ID, and the previous frame's ID relies in the inline

> > > > > > > frame's registers. So it is a bit of a messy situation.

> > > > > > > 

> > > > > > > We have comments in various parts of the code warning about some of these

> > > > > > > particularities.

> > > > > > > 

> > > > > > > In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,

> > > > > > > which goes through various functions until we eventually invoke

> > > > > > > frame_unwind_got_register. This function will eventually attempt to create

> > > > > > > a lazy value for a particular register, and this lazy value will require

> > > > > > > a valid frame ID.  Since the inline frame doesn't have a valid frame ID

> > > > > > > yet (remember we're still calculating the previous frame's ID so we can tell

> > > > > > > what the inline frame ID is) we will call compute_frame_id for the inline

> > > > > > > frame (level 0).

> > > > > > > 

> > > > > > > We'll eventually hit the assertion above, inside get_frame_id:

> > > > > > > 

> > > > > > > --

> > > > > > >         /* If we haven't computed the frame id yet, then it must be that

> > > > > > >            this is the current frame.  Compute it now, and stash the

> > > > > > >            result.  The IDs of other frames are computed as soon as

> > > > > > >            they're created, in order to detect cycles.  See

> > > > > > >            get_prev_frame_if_no_cycle.  */

> > > > > > >         gdb_assert (fi->level == 0);

> > > > > > > --

> > > > > > > 

> > > > > > > It seems to me we shouldn't have reached this assertion without having the

> > > > > > > inline frame ID already calculated. In fact, it seems we even start recursing

> > > > > > > a bit when we invoke get_prev_frame_always within inline_frame_this_id. But

> > > > > > > a check makes us quit the recursion and proceed to compute the id.

> > > > > > > 

> > > > > > > Here's the call stack for context:

> > > > > > > 

> > > > > > > #0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109

> > > > > > > RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124

> > > > > > > #2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)

> > > > > > >       at ../../../repos/binutils-gdb/gdb/inline-frame.c:165

> > > > > > > #3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550

> > > > > > > #4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582

> > > > > > > #5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296

> > > > > > > #6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268

> > > > > > > #7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)

> > > > > > >       at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296

> > > > > > > #8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229

> > > > > > > #9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320

> > > > > > > #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

> > > > > > >       at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114

> > > > > > > #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)

> > > > > > >       at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316

> > > > > > > #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229

> > > > > > > #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320

> > > > > > > #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223

> > > > > > > #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074
> > > > > > > #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)
> > > > > > >       at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388
> > > > > > > #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190
> > > > > > > #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)
> > > > > > >       at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218
> > > > > > > #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550
> > > > > > > #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927
> > > > > > > #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006
> > > > > > > FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
> > > > > > > #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495
> > > > > > > #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596
> > > > > > > #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857
> > > > > > > #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381
> > > > > > > #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578
> > > > > > > #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020
> > > > > > > #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43
> > > > > > > #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377
> > > > > > > #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291
> > > > > > > #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194
> > > > > > > #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356
> > > > > > > #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416
> > > > > > > #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254
> > > > > > > #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269
> > > > > > > #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32
> > > > > > >

> > > > > > > The following patch addresses this by using a function that unwinds the PC
> > > > > > > from the next (inline) frame directly as opposed to creating a lazy value
> > > > > > > that is bound to the next frame's ID (still not computed).
> > > > > > >
> > > > > > > I've validated this for aarch64-linux and x86_64-linux by running the
> > > > > > > testsuite.
> > > > > > >
> > > > > > > Tromey, would you mind checking if this suits your problematic core file
> > > > > > > tailcall scenario?
> > > > > > >
> > > > > > > gdb/ChangeLog:
> > > > > > >
> > > > > > > 2020-04-14  Luis Machado  <luis.machado@linaro.org>
> > > > > > >
> > > > > > > 	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use
> > > > > > > 	get_frame_register instead of gdbarch_unwind_pc.
> > > > > > > ---
> > > > > > >  gdb/dwarf2/frame-tailcall.c | 4 +++-
> > > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c
> > > > > > > index 2d219f13f9..01bb134a5c 100644
> > > > > > > --- a/gdb/dwarf2/frame-tailcall.c
> > > > > > > +++ b/gdb/dwarf2/frame-tailcall.c
> > > > > > > @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
> > > > > > >        prev_gdbarch = frame_unwind_arch (this_frame);
> > > > > > >
> > > > > > >        /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
> > > > > > > -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);
> > > > > > > +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
> > > > > > > +			  (gdb_byte *) &prev_pc);
> > > > > > > +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
> > > > > > >
> > > > > > >        /* call_site_find_chain can throw an exception.  */
> > > > > > >        chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);

> > > > > > 

> > > > > > I'm no longer convinced that this patch is correct, and I'd like
> > > > > > to reopen the discussion.
> > > > > >
> > > > > > Here's what concerns me: we used to make the following call chain:
> > > > > >
> > > > > >     gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value
> > > > > >
> > > > > > Now we do this:
> > > > > >
> > > > > >     get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value
> > > > > >
> > > > > > The problem is that 'gdbarch_unwind_pc' takes an argument 'next_frame',
> > > > > > while 'get_frame_register' takes an argument called 'frame' that is
> > > > > > really 'this_frame'; it then passes 'frame->next' to
> > > > > > 'frame_unwind_register'.
> > > > > >
> > > > > > What this means is that if we have a call stack like this:
> > > > > >
> > > > > >     #3 --> #2 --> #1 --> #0
> > > > > >
> > > > > > and we invoke the tail-call sniffer in frame #1, previously we figured
> > > > > > out the $pc value in frame #2, while now we figure out the $pc value
> > > > > > in frame #1.

> > > > > > 

> > > > > > I'm even more convinced that this is an error based on the fix patch
> > > > > > you applied later:
> > > > > >
> > > > > >     commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d
> > > > > >     Date:   Sat Apr 25 00:32:44 2020 -0300
> > > > > >
> > > > > >         Fix remaining inline/tailcall unwinding breakage for x86_64
> > > > > >

> > > > 

> > > > After this commit the code looks like this:
> > > >
> > > >       /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
> > > >       if (next_frame != nullptr && get_frame_type (next_frame) == INLINE_FRAME
> > > > 	  && !frame_id_computed_p (next_frame))
> > > > 	{
> > > > 	  /* The next frame is an inline frame and its frame id has not been
> > > > 	     computed yet.  */
> > > > 	  get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
> > > > 			      (gdb_byte *) &prev_pc);
> > > > 	  prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
> > > > 	}
> > > >       else
> > > > 	prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);
> > > >

> > > > Now, the point of this function is to decide if the frame we are in
> > > > _right_now_ was tail-called into by its "apparent" caller.  That is,
> > > > when we unwind we have:
> > > >
> > > >     main --> foo --> bar
> > > >
> > > > and in foo we ask: can we build a tailcall chain that gets us from
> > > > main to foo?  In this case the answer is yes, and the chain
> > > > returned represents:
> > > >
> > > >     main --> test_func --> foo
> > > >
> > > > However, my thinking is that we know we're not in the position of
> > > > 'foo' (that is, reached by a tail call) if foo is an inline frame -
> > > > right?

> > > > 

> > > > So, I wonder if it's as simple as saying:
> > > >
> > > >     /* We know that THIS_FRAME was not reached by a tail call if
> > > >        THIS_FRAME is an inline frame.  */
> > > >     if (get_frame_type (this_frame) == INLINE_FRAME)
> > > >       return;
> > > >
> > > > That's totally untested, just a random thought...  I wonder if such a
> > > > change would fix the original failures you saw?
> > > >
> > > > Thanks,
> > > > Andrew
> > > >

> > >
> > > I can attest your testcase fails for aarch64. But your proposed early return
> > > runs into the known frame-level assertion inside get_frame_id:
> > >
> > > gdb_assert (fi->level == 0);
> > >
> > > This happens because when we're unwinding the frame for foo, the current
> > > frame is no longer an inline frame. So we proceed to try and fetch the PC
> > > from the next (inline) frame, and then we run into the same problem with
> > > the lack of a frame id.
> > >
> > > When we're unwinding an inline frame whose next frame is the sentinel one,
> > > we're OK, since we always have register access through the sentinel frame.
> > >
> > > I wonder how likely it is to break if there are inline frames at other
> > > positions in the call chain. Your suggestion may fix those cases, but it
> > > won't fix the remaining cases that attempt to unwind registers from the
> > > inline frame.
> > >
> > > Does that make sense?

> >
> > I think so.
> >
> > The more I stare at this code, the more I'm convinced that trying to
> > run the tail-call detection (not going to call it sniffing) from an
> > actual frame sniffer is just a bad idea.
> >
> > Think about it: we have
> >
> >    test_func --> foo
> >
> > We're sniffing for foo, which means we don't yet know what type of
> > frame foo is or how to unwind it; yet to do tail-call detection we
> > need to ask foo to unwind its $pc.
> >
> > I think the real fix here is to move tail-call detection later in the
> > process. Hopefully Tom will offer some more details of the original
> > bug that motivated moving tail-call detection.
> >
> > Thanks,
> > Andrew
> >

>
> I tend to agree.
>
> I think we've tried a few things to accommodate tailcall detection early,
> but this is getting harder to solve. So I'm inclined to revisit the tailcall
> detection relocation and make sure it works, but also make sure other
> non-x86 architectures are fine with it as well.
>
> Introducing more and more conditionals may make the code more prone
> to bugs.
Luis,

I've posted a follow up here:

  https://sourceware.org/pipermail/gdb-patches/2020-June/169789.html

That addresses the issues I've highlighted in this thread.  I would
value your feedback.

Thanks,
Andrew

Patch

diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c
index 2d219f13f9..01bb134a5c 100644
--- a/gdb/dwarf2/frame-tailcall.c
+++ b/gdb/dwarf2/frame-tailcall.c
@@ -385,7 +385,9 @@  dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
       prev_gdbarch = frame_unwind_arch (this_frame);
 
       /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
-      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);
+      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
+			  (gdb_byte *) &prev_pc);
+      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
 
       /* call_site_find_chain can throw an exception.  */
       chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);