Find tailcall frames before inline frames

Message ID 20200220155820.22809-1-tromey@adacore.com
State New

Commit Message

Tom Tromey Feb. 20, 2020, 3:58 p.m.
A customer reported a failure to unwind in a certain core dump.  A
lengthy investigation showed that the problem came from the
interaction between the tailcall and inline frame sniffers.

Normally, the regular DWARF unwinder may discover a chain of tail
calls ending in the current frame.  In this case, it sets a member on
the dwarf2_frame_cache object, so that a subsequent call into the
tailcall sniffer will create the tailcall frames.

However, in this scenario, what happened is that the DWARF unwinder
did find tailcall frames -- but then the PC of the first such frame
was recognized and claimed by the inline frame sniffer.

This then caused unwinding to go astray further up the stack.

This patch fixes the problem by arranging for the tailcall sniffer to
be called before the inline sniffer.  This way, if a DWARF frame has
tailcall information, the tailcalls will always be processed first.
This is safe to do, because the tailcall sniffer can only claim a
frame if the previous frame did in fact find this information.  (So,
for example, if no DWARF frame is ever found, then this sniffer will
never trigger.)
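
To illustrate why the order matters, here is a toy model of a
first-match sniffer table.  It is only a sketch: every toy_* name is a
hypothetical stand-in, not GDB's struct frame_unwind or its real
sniffers, and the two tables merely mimic the relative order of the
tailcall and inline entries before and after this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only.  None of these names exist in GDB.  */

enum toy_kind { TOY_NONE, TOY_TAILCALL, TOY_INLINE, TOY_DWARF2 };

struct toy_frame
{
  /* Nonzero if the frame's PC lies in inlined code, so the toy inline
     sniffer would accept it.  */
  int pc_in_inlined_code;
  /* Nonzero if the newer frame was unwound by the toy DWARF unwinder
     and recorded a tail call chain.  */
  int newer_frame_found_tailcalls;
};

struct toy_unwinder
{
  enum toy_kind kind;
  int (*sniffer) (const struct toy_frame *frame);
};

static int
toy_tailcall_sniffer (const struct toy_frame *frame)
{
  return frame->newer_frame_found_tailcalls;
}

static int
toy_inline_sniffer (const struct toy_frame *frame)
{
  return frame->pc_in_inlined_code;
}

/* Assumption for the toy: the plain DWARF sniffer accepts anything.  */
static int
toy_dwarf2_sniffer (const struct toy_frame *frame)
{
  (void) frame;
  return 1;
}

/* Relative order before the patch: inline is asked first.  */
static const struct toy_unwinder toy_old_order[] = {
  { TOY_INLINE, toy_inline_sniffer },
  { TOY_TAILCALL, toy_tailcall_sniffer },
  { TOY_DWARF2, toy_dwarf2_sniffer },
};

/* Relative order after the patch: tailcall is asked first.  */
static const struct toy_unwinder toy_new_order[] = {
  { TOY_TAILCALL, toy_tailcall_sniffer },
  { TOY_INLINE, toy_inline_sniffer },
  { TOY_DWARF2, toy_dwarf2_sniffer },
};

/* Return the kind of the first unwinder in TABLE whose sniffer accepts
   FRAME, or TOY_NONE if none does.  */
static enum toy_kind
toy_find_unwinder (const struct toy_unwinder *table, size_t count,
		   const struct toy_frame *frame)
{
  assert (table != NULL && frame != NULL);
  for (size_t i = 0; i < count; i++)
    if (table[i].sniffer (frame))
      return table[i].kind;
  return TOY_NONE;
}
```

For a frame with both flags set, toy_find_unwinder picks TOY_INLINE
from toy_old_order and TOY_TAILCALL from toy_new_order.  A frame with
neither flag set falls through to TOY_DWARF2 in both tables.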

This patch also partially reverts:

    commit 1ec56e88aa9b052ab10b806d82fbdbc8d153d977
    Author: Pedro Alves <palves@redhat.com>
    Date:   Fri Nov 22 13:17:46 2013 +0000

	Eliminate dwarf2_frame_cache recursion, don't unwind from the dwarf2 sniffer (move dwarf2_tailcall_sniffer_first elsewhere).

That patch moved the call to dwarf2_tailcall_sniffer_first out of
dwarf2_frame_cache, and into dwarf2_frame_prev_register.  However, in
this situation, this is too late -- by the time
dwarf2_frame_prev_register is called, the frame in question is already
recognized by the inline frame sniffer.

Rather than fully revert that patch, though, this just arranges to
call dwarf2_tailcall_sniffer_first from dwarf2_frame_cache -- which is
called shortly after the DWARF frame sniffer succeeds, via
compute_frame_id.

I don't know how to write a test case for this.

gdb/ChangeLog
2020-02-20  Tom Tromey  <tromey@adacore.com>

	* dwarf2/frame.c (struct dwarf2_frame_cache)
	<checked_tailcall_bottom, entry_cfa_sp_offset,
	entry_cfa_sp_offset_p>: Remove members.
	(dwarf2_frame_cache): Call dwarf2_tailcall_sniffer_first.
	(dwarf2_frame_prev_register): Don't call
	dwarf2_tailcall_sniffer_first.
	(dwarf2_append_unwinders): Don't append tailcall unwinder.
	* frame-unwind.c (add_unwinder): New function.
	(frame_unwind_init): Use it.  Add tailcall unwinder.
---
 gdb/ChangeLog      | 12 ++++++++++++
 gdb/dwarf2/frame.c | 34 ++++++++--------------------------
 gdb/frame-unwind.c | 33 +++++++++++++++++++++++++++------
 3 files changed, 47 insertions(+), 32 deletions(-)

-- 
2.21.1

Comments

Tom Tromey March 3, 2020, 9:45 p.m. | #1
Tom> gdb/ChangeLog
Tom> 2020-02-20  Tom Tromey  <tromey@adacore.com>

Tom> 	* dwarf2/frame.c (struct dwarf2_frame_cache)
Tom> 	<checked_tailcall_bottom, entry_cfa_sp_offset,
Tom>    entry_cfa_sp_offset_p> : Remove members.
Tom> 	(dwarf2_frame_cache): Call dwarf2_tailcall_sniffer_first.
Tom> 	(dwarf2_frame_prev_register): Don't call
Tom> 	dwarf2_tailcall_sniffer_first.
Tom> 	(dwarf2_append_unwinders): Don't append tailcall unwinder.
Tom> 	* frame-unwind.c (add_unwinder): New function.
Tom> 	(frame_unwind_init): Use it.  Add tailcall unwinder.

I'm going to check this in now.

Tom
Luis Machado March 5, 2020, 10:21 a.m. | #2
Hi Tom,

On 3/3/20 6:45 PM, Tom Tromey wrote:
> Tom> gdb/ChangeLog
> Tom> 2020-02-20  Tom Tromey  <tromey@adacore.com>
>
> Tom> 	* dwarf2/frame.c (struct dwarf2_frame_cache)
> Tom> 	<checked_tailcall_bottom, entry_cfa_sp_offset,
> Tom>    entry_cfa_sp_offset_p> : Remove members.
> Tom> 	(dwarf2_frame_cache): Call dwarf2_tailcall_sniffer_first.
> Tom> 	(dwarf2_frame_prev_register): Don't call
> Tom> 	dwarf2_tailcall_sniffer_first.
> Tom> 	(dwarf2_append_unwinders): Don't append tailcall unwinder.
> Tom> 	* frame-unwind.c (add_unwinder): New function.
> Tom> 	(frame_unwind_init): Use it.  Add tailcall unwinder.
>
> I'm going to check this in now.
>
> Tom


This has caused quite a few failures in the following tests for 
aarch64-linux:

gdb.opt/inline-break.exp
gdb.opt/inline-cmds.exp
gdb.python/py-frame-inline.exp
gdb.reverse/insn-reverse.exp

I see the following:

info frame^M
../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id 
get_frame_id(frame_info*): Assertion `fi->level == 0' failed.^M
A problem internal to GDB has been detected,^M
further debugging may prove unreliable.^M
Quit this debugging session? (y or n) FAIL: 
gdb.python/py-frame-inline.exp: info frame (GDB internal error)
Resyncing due to internal error.
n^M
^M
This is a bug, please report it.  For instructions, see:^M
<http://www.gnu.org/software/gdb/bugs/>.^M


(gdb) up^M
../../../repos/binutils-gdb/gdb/inline-frame.c:172: internal-error: void 
inline_frame_this_id(frame_info*, void**, frame_id*): Assertion 
`frame_id_p (*this_id)' failed.^M
A problem internal to GDB has been detected,^M
further debugging may prove unreliable.^M
Quit this debugging session? (y or n) FAIL: 
gdb.python/py-frame-inline.exp: up (GDB internal error)
Resyncing due to internal error.
n^M
^M
This is a bug, please report it.  For instructions, see:^M
<http://www.gnu.org/software/gdb/bugs/>.^M


I can help get more information on it if you need.
Tom Tromey March 5, 2020, 4:56 p.m. | #3
Luis> This has caused quite a few failures in the following tests for
Luis> aarch64-linux:

Sorry about that.  I will take a look.

Tom
Luis Machado March 9, 2020, 5:55 p.m. | #4
On 3/5/20 7:21 AM, Luis Machado wrote:
> Hi Tom,
>
> On 3/3/20 6:45 PM, Tom Tromey wrote:
>> Tom> gdb/ChangeLog
>> Tom> 2020-02-20  Tom Tromey  <tromey@adacore.com>
>>
>> Tom>     * dwarf2/frame.c (struct dwarf2_frame_cache)
>> Tom>     <checked_tailcall_bottom, entry_cfa_sp_offset,
>> Tom>    entry_cfa_sp_offset_p> : Remove members.
>> Tom>     (dwarf2_frame_cache): Call dwarf2_tailcall_sniffer_first.
>> Tom>     (dwarf2_frame_prev_register): Don't call
>> Tom>     dwarf2_tailcall_sniffer_first.
>> Tom>     (dwarf2_append_unwinders): Don't append tailcall unwinder.
>> Tom>     * frame-unwind.c (add_unwinder): New function.
>> Tom>     (frame_unwind_init): Use it.  Add tailcall unwinder.
>>
>> I'm going to check this in now.
>>
>> Tom
>
> This has caused quite a few failures in the following tests for
> aarch64-linux:
>
> gdb.opt/inline-break.exp
> gdb.opt/inline-cmds.exp
> gdb.python/py-frame-inline.exp
> gdb.reverse/insn-reverse.exp
>
> I see the following:
>
> info frame^M
> ../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id
> get_frame_id(frame_info*): Assertion `fi->level == 0' failed.^M
> A problem internal to GDB has been detected,^M
> further debugging may prove unreliable.^M
> Quit this debugging session? (y or n) FAIL:
> gdb.python/py-frame-inline.exp: info frame (GDB internal error)
> Resyncing due to internal error.
> n^M
> ^M
> This is a bug, please report it.  For instructions, see:^M
> <http://www.gnu.org/software/gdb/bugs/>.^M
>
> (gdb) up^M
> ../../../repos/binutils-gdb/gdb/inline-frame.c:172: internal-error: void
> inline_frame_this_id(frame_info*, void**, frame_id*): Assertion
> `frame_id_p (*this_id)' failed.^M
> A problem internal to GDB has been detected,^M
> further debugging may prove unreliable.^M
> Quit this debugging session? (y or n) FAIL:
> gdb.python/py-frame-inline.exp: up (GDB internal error)
> Resyncing due to internal error.
> n^M
> ^M
> This is a bug, please report it.  For instructions, see:^M
> <http://www.gnu.org/software/gdb/bugs/>.^M
>
> I can help get more information on it if you need.


Reported as https://sourceware.org/bugzilla/show_bug.cgi?id=25649 so we 
can track it.
Tom Tromey March 12, 2020, 9:34 p.m. | #5
>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:


Luis> This has caused quite a few failures in the following tests for
Luis> aarch64-linux:

I still haven't really tried to reproduce this yet.
I'll try tomorrow, I hope.

Luis> ../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id
Luis> get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

Meanwhile I wonder if this is the same as

https://sourceware.org/pipermail/gdb-patches/2020-February/165511.html

Tom
Luis Machado via Gdb-patches March 13, 2020, 1:31 p.m. | #6
On 3/12/20 6:34 PM, Tom Tromey wrote:
>>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:
>
> Luis> This has caused quite a few failures in the following tests for
> Luis> aarch64-linux:
>
> I still haven't really tried to reproduce this yet.
> I'll try tomorrow, I hope.
>
> Luis> ../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id
> Luis> get_frame_id(frame_info*): Assertion `fi->level == 0' failed.
>
> Meanwhile I wonder if this is the same as
>
> https://sourceware.org/pipermail/gdb-patches/2020-February/165511.html
>
> Tom


The mention of fi->level looks the same, but I haven't looked into it yet.

I was planning to pinpoint the failure point in order to make this 
easier to solve.
Luis Machado via Gdb-patches March 24, 2020, 9:24 p.m. | #7
Hi Tom,

On 3/13/20 10:31 AM, Luis Machado wrote:
> On 3/12/20 6:34 PM, Tom Tromey wrote:
>>>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:
>>
>> Luis> This has caused quite a few failures in the following tests for
>> Luis> aarch64-linux:
>>
>> I still haven't really tried to reproduce this yet.
>> I'll try tomorrow, I hope.
>>
>> Luis> ../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id
>> Luis> get_frame_id(frame_info*): Assertion `fi->level == 0' failed.
>>
>> Meanwhile I wonder if this is the same as
>>
>> https://sourceware.org/pipermail/gdb-patches/2020-February/165511.html
>>
>> Tom
>
> The mention of fi->level looks the same, but I haven't looked into it yet.
>
> I was planning to pinpoint the failure point in order to make this
> easier to solve.


Having spent a few days trying to understand this problem, it seems all 
of these fi->level assertions (including 
https://sourceware.org/bugzilla/show_bug.cgi?id=22748) are related to 
attempting to unwind from places not safe to do so. That is, we're 
trying to unwind some content (registers for example) before a given 
frame is assigned a frame id.

For some cases we can see compute_frame_id being invoked in recursion 
for the same frame, which would lead to an infinite recursion. So the 
assertion catches this.

In my particular case, the call to dwarf2_tailcall_sniffer_first inside 
dwarf2_frame_cache leads to such infinite recursion, since no frame id 
has been assigned to the frame being used yet. It will be assigned by 
the time compute_frame_id is done.
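
The re-entrancy described above can be modelled with a small
stand-alone sketch.  It is an illustration under assumptions, not GDB
code: toy_frame, toy_get_frame_id, toy_compute_frame_id and
toy_unwind_through are hypothetical names, the ID value 42 is
arbitrary, and the toy reports re-entry through a return value where
GDB hits the internal-error assertions shown earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only.  None of these names exist in GDB.  */

enum toy_id_state { TOY_ID_NOT_COMPUTED, TOY_ID_COMPUTING, TOY_ID_COMPUTED };

struct toy_frame
{
  enum toy_id_state id_state;
  int id;
  /* If true, computing the ID first performs work that itself needs
     the ID of this same frame (the problematic ordering).  */
  bool eager_unwind;
};

static bool toy_get_frame_id (struct toy_frame *frame, int *id);

/* Stand-in for unwinding work that needs FRAME's ID.  */
static bool
toy_unwind_through (struct toy_frame *frame)
{
  int id;
  return toy_get_frame_id (frame, &id);
}

/* Compute FRAME's ID.  Return false if the computation re-entered
   itself and had to be abandoned.  */
static bool
toy_compute_frame_id (struct toy_frame *frame)
{
  assert (frame->id_state == TOY_ID_NOT_COMPUTED);
  frame->id_state = TOY_ID_COMPUTING;
  if (frame->eager_unwind && !toy_unwind_through (frame))
    {
      frame->id_state = TOY_ID_NOT_COMPUTED;
      return false;
    }
  frame->id = 42;
  frame->id_state = TOY_ID_COMPUTED;
  return true;
}

/* Fetch FRAME's ID into *ID, computing it on first use.  Return false
   when called while the same frame's ID is still being computed.  */
static bool
toy_get_frame_id (struct toy_frame *frame, int *id)
{
  if (frame->id_state == TOY_ID_COMPUTING)
    return false;
  if (frame->id_state == TOY_ID_NOT_COMPUTED
      && !toy_compute_frame_id (frame))
    return false;
  *id = frame->id;
  return true;
}
```

With eager_unwind false, toy_get_frame_id succeeds.  With eager_unwind
true, the nested call sees TOY_ID_COMPUTING and the whole lookup fails,
which is the toy counterpart of unwinding before the frame has an ID.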

I think dwarf2_tailcall_sniffer_first would have to be called from 
somewhere else, or conditions put in place. But I'm afraid adding more 
conditions would complicate things further. And this code is already 
reasonably complicated.

Since this is causing a number of inlining test failures for aarch64 
and, from what I saw, some other architectures like s390, should we 
consider reverting this while we discuss/review a reworked version of 
the patch?
Tom Tromey March 26, 2020, 1:59 a.m. | #8
>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:


Luis> Having spent a few days trying to understand this problem, it seems
Luis> all of these fi->level assertions (including 
Luis> https://sourceware.org/bugzilla/show_bug.cgi?id=22748) are related to
Luis> attempting to unwind from places not safe to do so. That is, we're 
Luis> trying to unwind some content (registers for example) before a given
Luis> frame is assigned a frame id.

Yes, I agree.

Luis> I think dwarf2_tailcall_sniffer_first would have to be called from
Luis> somewhere else, or conditions put in place. But I'm afraid adding more 
Luis> conditions would complicate things further. And this code is already
Luis> reasonably complicated.

Luis> Since this is causing a number of inlining test failures for aarch64
Luis> and, from what I saw, some other architectures like s390, should we 
Luis> consider reverting this while we discuss/review a reworked version of
Luis> the patch?

I think that would be fine.  I haven't found the time to really dig into it.

I suspect that maybe the architectures doing this aren't playing by the rules.
Even so, though, it doesn't change that this used to work and now doesn't.

Tom
Luis Machado via Gdb-patches March 26, 2020, 2:47 a.m. | #9
On Wed, Mar 25, 2020, 22:59 Tom Tromey <tom@tromey.com> wrote:

> >>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:
>
> Luis> Having spent a few days trying to understand this problem, it seems
> Luis> all of these fi->level assertions (including
> Luis> https://sourceware.org/bugzilla/show_bug.cgi?id=22748) are related to
> Luis> attempting to unwind from places not safe to do so. That is, we're
> Luis> trying to unwind some content (registers for example) before a given
> Luis> frame is assigned a frame id.
>
> Yes, I agree.
>
> Luis> I think dwarf2_tailcall_sniffer_first would have to be called from
> Luis> somewhere else, or conditions put in place. But I'm afraid adding more
> Luis> conditions would complicate things further. And this code is already
> Luis> reasonably complicated.
>
> Luis> Since this is causing a number of inlining test failures for aarch64
> Luis> and, from what I saw, some other architectures like s390, should we
> Luis> consider reverting this while we discuss/review a reworked version of
> Luis> the patch?
>
> I think that would be fine.  I haven't found the time to really dig into it.
>
> I suspect that maybe the architectures doing this aren't playing by the rules.
> Even so, though, it doesn't change that this used to work and now doesn't.


It could be. I noticed aarch64 doesn't implement gdbarch_unwind_pc. But
s390 does.

It is hard to tell what is wrong given different unwinding implementations
may give correct results, even with wrong assumptions.


> Tom

Patch

diff --git a/gdb/dwarf2/frame.c b/gdb/dwarf2/frame.c
index b240a25e2d8..74488f9a8aa 100644
--- a/gdb/dwarf2/frame.c
+++ b/gdb/dwarf2/frame.c
@@ -959,22 +959,12 @@  struct dwarf2_frame_cache
   /* The .text offset.  */
   CORE_ADDR text_offset;
 
-  /* True if we already checked whether this frame is the bottom frame
-     of a virtual tail call frame chain.  */
-  int checked_tailcall_bottom;
-
   /* If not NULL then this frame is the bottom frame of a TAILCALL_FRAME
      sequence.  If NULL then it is a normal case with no TAILCALL_FRAME
      involved.  Non-bottom frames of a virtual tail call frames chain use
      dwarf2_tailcall_frame_unwind unwinder so this field does not apply for
      them.  */
   void *tailcall_cache;
-
-  /* The number of bytes to subtract from TAILCALL_FRAME frames frame
-     base to get the SP, to simulate the return address pushed on the
-     stack.  */
-  LONGEST entry_cfa_sp_offset;
-  int entry_cfa_sp_offset_p;
 };
 
 static struct dwarf2_frame_cache *
@@ -1037,6 +1027,8 @@  dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
      in an address that's within the range of FDE locations.  This
      is due to the possibility of the function occupying non-contiguous
      ranges.  */
+  LONGEST entry_cfa_sp_offset;
+  int entry_cfa_sp_offset_p = 0;
   if (get_frame_func_if_available (this_frame, &entry_pc)
       && fde->initial_location <= entry_pc
       && entry_pc < fde->initial_location + fde->address_range)
@@ -1049,8 +1041,8 @@  dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
 	  && (dwarf_reg_to_regnum (gdbarch, fs.regs.cfa_reg)
 	      == gdbarch_sp_regnum (gdbarch)))
 	{
-	  cache->entry_cfa_sp_offset = fs.regs.cfa_offset;
-	  cache->entry_cfa_sp_offset_p = 1;
+	  entry_cfa_sp_offset = fs.regs.cfa_offset;
+	  entry_cfa_sp_offset_p = 1;
 	}
     }
   else
@@ -1195,6 +1187,10 @@  incomplete CFI data; unspecified registers (e.g., %s) at %s"),
       && fs.regs.reg[fs.retaddr_column].how == DWARF2_FRAME_REG_UNDEFINED)
     cache->undefined_retaddr = 1;
 
+  dwarf2_tailcall_sniffer_first (this_frame, &cache->tailcall_cache,
+				 (entry_cfa_sp_offset_p
+				  ? &entry_cfa_sp_offset : NULL));
+
   return cache;
 }
 
@@ -1239,16 +1235,6 @@  dwarf2_frame_prev_register (struct frame_info *this_frame, void **this_cache,
   CORE_ADDR addr;
   int realnum;
 
-  /* Check whether THIS_FRAME is the bottom frame of a virtual tail
-     call frame chain.  */
-  if (!cache->checked_tailcall_bottom)
-    {
-      cache->checked_tailcall_bottom = 1;
-      dwarf2_tailcall_sniffer_first (this_frame, &cache->tailcall_cache,
-				     (cache->entry_cfa_sp_offset_p
-				      ? &cache->entry_cfa_sp_offset : NULL));
-    }
-
   /* Non-bottom frames of a virtual tail call frames chain use
      dwarf2_tailcall_frame_unwind unwinder so this code does not apply for
      them.  If dwarf2_tailcall_prev_register_first does not have specific value
@@ -1410,10 +1396,6 @@  static const struct frame_unwind dwarf2_signal_frame_unwind =
 void
 dwarf2_append_unwinders (struct gdbarch *gdbarch)
 {
-  /* TAILCALL_FRAME must be first to find the record by
-     dwarf2_tailcall_sniffer_first.  */
-  frame_unwind_append_unwinder (gdbarch, &dwarf2_tailcall_frame_unwind);
-
   frame_unwind_append_unwinder (gdbarch, &dwarf2_frame_unwind);
   frame_unwind_append_unwinder (gdbarch, &dwarf2_signal_frame_unwind);
 }
diff --git a/gdb/frame-unwind.c b/gdb/frame-unwind.c
index 35f2e82c57d..3334c472d02 100644
--- a/gdb/frame-unwind.c
+++ b/gdb/frame-unwind.c
@@ -27,6 +27,7 @@ 
 #include "gdb_obstack.h"
 #include "target.h"
 #include "gdbarch.h"
+#include "dwarf2/frame-tailcall.h"
 
 static struct gdbarch_data *frame_unwind_data;
 
@@ -43,6 +44,18 @@  struct frame_unwind_table
   struct frame_unwind_table_entry **osabi_head;
 };
 
+/* A helper function to add an unwinder to a list.  LINK says where to
+   install the new unwinder.  The new link is returned.  */
+
+static struct frame_unwind_table_entry **
+add_unwinder (struct obstack *obstack, const struct frame_unwind *unwinder,
+	      struct frame_unwind_table_entry **link)
+{
+  *link = OBSTACK_ZALLOC (obstack, struct frame_unwind_table_entry);
+  (*link)->unwinder = unwinder;
+  return &(*link)->next;
+}
+
 static void *
 frame_unwind_init (struct obstack *obstack)
 {
@@ -51,13 +64,21 @@  frame_unwind_init (struct obstack *obstack)
 
   /* Start the table out with a few default sniffers.  OSABI code
      can't override this.  */
-  table->list = OBSTACK_ZALLOC (obstack, struct frame_unwind_table_entry);
-  table->list->unwinder = &dummy_frame_unwind;
-  table->list->next = OBSTACK_ZALLOC (obstack,
-				      struct frame_unwind_table_entry);
-  table->list->next->unwinder = &inline_frame_unwind;
+  struct frame_unwind_table_entry **link = &table->list;
+
+  link = add_unwinder (obstack, &dummy_frame_unwind, link);
+  /* The DWARF tailcall sniffer must come before the inline sniffer.
+     Otherwise, we can end up in a situation where a DWARF frame finds
+     tailcall information, but then the inline sniffer claims a frame
+     before the tailcall sniffer, resulting in confusion.  This is
+     safe to do always because the tailcall sniffer can only ever be
+     activated if the newer frame was created using the DWARF
+     unwinder, and it also found tailcall information.  */
+  link = add_unwinder (obstack, &dwarf2_tailcall_frame_unwind, link);
+  link = add_unwinder (obstack, &inline_frame_unwind, link);
+
   /* The insertion point for OSABI sniffers.  */
-  table->osabi_head = &table->list->next->next;
+  table->osabi_head = link;
   return table;
 }
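
The link-passing idiom used by add_unwinder can be tried outside GDB
with a small stand-alone sketch.  It assumes plain calloc instead of
an obstack, and toy_entry, toy_add, toy_count and toy_free_all are
hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only.  None of these names exist in GDB.  */

struct toy_entry
{
  const char *name;
  struct toy_entry *next;
};

/* Install a new entry named NAME at *LINK and return the address of
   its NEXT field, which is where the following entry should go.  */
static struct toy_entry **
toy_add (const char *name, struct toy_entry **link)
{
  assert (link != NULL);
  *link = calloc (1, sizeof (struct toy_entry));
  if (*link == NULL)
    abort ();
  (*link)->name = name;
  return &(*link)->next;
}

/* Return the number of entries in LIST.  */
static size_t
toy_count (const struct toy_entry *list)
{
  size_t n = 0;
  for (; list != NULL; list = list->next)
    n++;
  return n;
}

/* Release every entry in LIST.  */
static void
toy_free_all (struct toy_entry *list)
{
  while (list != NULL)
    {
      struct toy_entry *next = list->next;
      free (list);
      list = next;
    }
}
```

Starting from `struct toy_entry *list = NULL` and
`struct toy_entry **link = &list`, each `link = toy_add (name, link)`
appends in order, and the final value of link plays the role of
osabi_head: it marks the spot where later entries are inserted.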