[4/5] gdb: prevent an assertion when computing the frame_id for an inline frame

Message ID 26d9be2b7b27c00c4fc282e6b94b7e63d52ad2d0.1622321523.git.andrew.burgess@embecosm.com
State Superseded
Series
  • Fix for an assertion when unwinding with inline frames

Commit Message

Andrew Burgess May 29, 2021, 8:57 p.m.
I ran into this assertion while GDB was trying to unwind the stack:

  gdb/inline-frame.c:173: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.

That is, when building the frame_id for an inline frame, GDB asks for
the frame_id of the previous frame.  Unfortunately, no valid frame_id
was returned for the previous frame, and so the assertion triggers.

What is happening is this: I had a stack that looked something like
this (the arrows '->' point from caller to callee):

  normal_frame -> inline_frame

However, for whatever reason (e.g. broken debug information, or
corrupted stack contents in the inferior), when GDB tries to unwind
"normal_frame", it ends up getting back effectively the same frame,
thus the call stack looks like this to GDB:

  .-> normal_frame -> inline_frame
  |     |
  '-----'

Given such a situation we would expect GDB to terminate the stack with
an error like this:

  Backtrace stopped: previous frame identical to this frame (corrupt stack?)

However, the inline_frame causes a problem, and here's why:

When unwinding we start from the sentinel frame and call
get_prev_frame.  We eventually end up in get_prev_frame_if_no_cycle,
where we create a raw frame and, as this is frame #0, immediately
return.

However, eventually we will try to unwind the stack further.  When we
do this we inevitably need to know the frame_id for frame #0, and
so, eventually, we end up in compute_frame_id.

In compute_frame_id we first find the right unwinder for this frame.
In our case (i.e. for inline_frame) the $pc is within the function
normal_frame, but also within a block associated with the inlined
function inline_frame, so the inline frame unwinder claims this
frame.

Back in compute_frame_id we next compute the frame_id, for our
inline_frame this means a call to inline_frame_this_id.

The ID of an inline frame is based on the ID of the previous frame, so
from inline_frame_this_id we call get_prev_frame_always.  This
eventually calls get_prev_frame_if_no_cycle again, which creates
another raw frame and calls compute_frame_id (for frames other than
frame #0 we immediately compute the frame_id).

In compute_frame_id we again identify the correct unwinder for this
frame.  Our $pc is unchanged; however, the fact that the next frame is
of type INLINE_FRAME prevents the inline frame unwinder from claiming
this frame again, and so the standard DWARF frame unwinder claims
normal_frame.
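
The sniffer decision described above can be sketched as a toy Python
model (this is not GDB code).  The names inline_sniffer_claims,
choose_unwinder and inline_depth_at_pc are hypothetical, and the model
assumes the inline sniffer counts the inlined blocks containing $pc and
discounts one for each INLINE_FRAME directly next to the frame being
sniffed:

```python
INLINE_FRAME = "INLINE_FRAME"
NORMAL_FRAME = "NORMAL_FRAME"


def inline_sniffer_claims(inline_depth_at_pc, next_frame_types):
    """Toy model: does the inline sniffer claim the frame being built?

    inline_depth_at_pc: number of inlined blocks containing $pc
      (1 when $pc is in inline_frame inlined into normal_frame).
    next_frame_types: types of the younger frames, nearest first.
    """
    depth = inline_depth_at_pc
    for frame_type in next_frame_types:
        if frame_type != INLINE_FRAME:
            break
        depth -= 1  # this inlined block already has a frame
    return depth > 0


def choose_unwinder(inline_depth_at_pc, next_frame_types):
    if inline_sniffer_claims(inline_depth_at_pc, next_frame_types):
        return INLINE_FRAME
    return NORMAL_FRAME  # stands in for the standard DWARF unwinder


# Frame #0: only the sentinel is next, so the inline sniffer claims it.
print(choose_unwinder(1, []))
# Its previous frame: same $pc, but the next frame is already inline.
print(choose_unwinder(1, [INLINE_FRAME]))
```

Under these assumptions the first call selects the inline unwinder and
the second falls through to the normal one, matching the two steps
above.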

We return to compute_frame_id and call the standard DWARF function to
build the frame_id for normal_frame.

With the frame_id of normal_frame figured out we return to
compute_frame_id, and then to get_prev_frame_if_no_cycle, where we add
the ID for normal_frame into the frame_id cache, and return the frame
back to inline_frame_this_id.

From inline_frame_this_id we build a frame_id for inline_frame and
return to compute_frame_id, and then to get_prev_frame_if_no_cycle,
which adds the frame_id for inline_frame into the frame_id cache.

So far, so good.

However, as we are trying to unwind the complete stack, we eventually
ask for the previous frame of normal_frame.  Remember that at this
point GDB doesn't know the stack is corrupted (with a cycle); GDB
still needs to figure that out.

So, we eventually end up in get_prev_frame_if_no_cycle where we create
a raw frame and call compute_frame_id.  Remember, this is for the frame
before normal_frame.

The first task for compute_frame_id is to find the unwinder for this
frame, so all of the frame sniffers are tried in order, including
the inline frame sniffer.

The inline frame sniffer asks for the $pc.  This request is sent up the
stack to normal_frame, which, due to its cyclic behaviour, tells GDB
that the $pc in the previous frame was the same as the $pc in
normal_frame.

GDB spots that this $pc corresponds to both the function normal_frame
and also the inline function inline_frame.  As the next frame is not
an INLINE_FRAME, GDB figures that we have not yet built a frame to
cover inline_frame, and so the inline sniffer claims this new frame.
Our stack is now looking like this:

  inline_frame -> normal_frame -> inline_frame

But we have not yet computed the frame_id for the outermost (on the
left) inline_frame.  After the frame sniffer has claimed the inline
frame GDB returns to compute_frame_id and calls inline_frame_this_id.

In here GDB calls get_prev_frame_always, which eventually ends up
in get_prev_frame_if_no_cycle again, where we create a raw frame and
call compute_frame_id.

Just like before, compute_frame_id tries to find an unwinder for this
new frame.  It sees that the $pc is within both normal_frame and
inline_frame, but the next frame is, again, an INLINE_FRAME, so, just
like before, the standard DWARF unwinder claims this frame.  Back in
compute_frame_id we again call the standard DWARF function to build
the frame_id for this new copy of normal_frame.

At this point the stack looks like this:

  normal_frame -> inline_frame -> normal_frame -> inline_frame

After compute_frame_id we return to get_prev_frame_if_no_cycle, where
we try to add the frame_id for the new normal_frame into the frame_id
cache.  However, unlike before, we fail to add this frame_id as it is
a duplicate of the previous normal_frame frame_id.  Having found a
duplicate, get_prev_frame_if_no_cycle unlinks the new frame from the
stack and returns nullptr.  The stack now looks like this:

  inline_frame -> normal_frame -> inline_frame

The nullptr result from get_prev_frame_if_no_cycle is fed back to
inline_frame_this_id, which forwards this to get_frame_id, which
immediately returns null_frame_id.  As null_frame_id is not considered
a valid frame_id, this is what triggers the assertion.
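
The sequence above can be replayed with a small Python model.  This is
only a sketch under simplifying assumptions, not GDB's code: the Frame
class, the InternalError exception, the set used as the frame_id cache
and the reproduce helper are all hypothetical, every frame is assumed to
see the same $pc, and the model caches frame #0's id right after
computing it:

```python
INLINE, NORMAL = "INLINE_FRAME", "NORMAL_FRAME"
NORMAL_ID = ("sp", "pc of normal_frame")  # the cyclic unwind repeats this id


class InternalError(Exception):
    """Stands in for a failed gdb_assert."""


class Frame:
    def __init__(self, next_frame):
        self.next = next_frame  # younger frame (towards the callee)
        self.prev = None        # older frame (towards the caller)
        self.type = None
        self.frame_id = None    # None plays the role of null_frame_id


def sniff(frame):
    # Every frame sees the same $pc, so the inline sniffer claims the frame
    # unless the next frame is already an inline frame.
    if frame.next is not None and frame.next.type == INLINE:
        return NORMAL
    return INLINE


def compute_frame_id(frame, cache):
    frame.type = sniff(frame)
    if frame.type == NORMAL:
        frame.frame_id = NORMAL_ID
        return
    # Toy inline_frame_this_id: the id is built from the previous frame's id.
    prev = get_prev_frame_if_no_cycle(frame, cache)
    if prev is None:  # no previous frame means only null_frame_id is available
        raise InternalError("Assertion `frame_id_p (*this_id)' failed.")
    frame.frame_id = prev.frame_id + ("inline",)


def get_prev_frame_if_no_cycle(this_frame, cache):
    prev = Frame(next_frame=this_frame)
    this_frame.prev = prev
    compute_frame_id(prev, cache)
    if prev.frame_id in cache:
        # Duplicate id, so a cycle: unlink and report no previous frame.
        prev.next = None
        this_frame.prev = None
        return None
    cache.add(prev.frame_id)
    return prev


def reproduce():
    cache = set()
    frame0 = Frame(next_frame=None)   # inline_frame, frame #0
    compute_frame_id(frame0, cache)   # also creates normal_frame
    cache.add(frame0.frame_id)
    try:
        get_prev_frame_if_no_cycle(frame0.prev, cache)
    except InternalError as err:
        return str(err)
    return "no error"


print(reproduce())
```

In this model, asking for the frame before normal_frame creates a second
inline frame whose own previous frame is rejected as a duplicate, and
the modelled assertion fires.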

In summary then:

 - inline_frame_this_id currently assumes that as the inline frame
   exists, we will always get a valid frame back from
   get_prev_frame_always,

 - get_prev_frame_if_no_cycle currently assumes that it is safe to
   return nullptr when it sees a cycle.

Notice that in frame.c:compute_frame_id, this code:

  fi->this_id.value = outer_frame_id;
  fi->unwind->this_id (fi, &fi->prologue_cache, &fi->this_id.value);
  gdb_assert (frame_id_p (fi->this_id.value));

The assertion makes it clear that the this_id function must always
return a valid frame_id (e.g. null_frame_id is not a valid return
value), and similarly in inline-frame.c:inline_frame_this_id this
code:

  *this_id = get_frame_id (get_prev_frame_always (this_frame));
  /* snip comment */
  gdb_assert (frame_id_p (*this_id));

Makes it clear that every inline frame expects to be able to get a
previous frame, which will have a valid frame_id.

As I have discussed above, these assumptions don't currently hold in
all cases.

One possibility would be to move the call to get_prev_frame_always
forward from inline_frame_this_id to inline_frame_sniffer, however,
this falls foul of (in frame.c:frame_cleanup_after_sniffer) this
assertion:

  /* No sniffer should extend the frame chain; sniff based on what is
     already certain.  */
  gdb_assert (!frame->prev_p);

This assert prohibits any sniffer from trying to get the previous
frame.  As getting the previous frame is likely to depend on the next
frame, I can understand why this assertion is a good thing, and I'm in
no rush to alter this rule.

The solution I am proposing here is to add a special case to
get_prev_frame_if_no_cycle, such that, if we find a cycle, and we know
we are fetching the previous frame as a result of computing the
frame_id for the next frame, which is an INLINE_FRAME, then, instead
of returning nullptr, do still return the frame.

This is safe (I claim) because, if the frame_id of the NORMAL_FRAME
was a duplicate then the INLINE_FRAME should also be a duplicate, and
so, the INLINE_FRAME will be rejected as a duplicate just as the
NORMAL_FRAME was.
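
The duplicate argument can be illustrated with a few lines of Python.
This is a sketch only: the three-field FrameId tuple, the stash_add
helper and the addresses are made up, and the model assumes that an
inline frame's id is the previous frame's id with the code address
replaced by the inlined block's entry and an artificial depth
incremented:

```python
from collections import namedtuple

# Hypothetical, simplified frame id: GDB's real frame_id has more fields.
FrameId = namedtuple("FrameId", "stack_addr code_addr artificial_depth")


def inline_frame_id(prev_frame_id, inline_block_entry_pc):
    """Toy model: derive an inline frame's id from the previous frame's id."""
    return prev_frame_id._replace(
        code_addr=inline_block_entry_pc,
        artificial_depth=prev_frame_id.artificial_depth + 1,
    )


def stash_add(cache, frame_id):
    """Return False if frame_id is already cached, mimicking a rejection."""
    if frame_id in cache:
        return False
    cache.add(frame_id)
    return True


cache = set()
normal_id = FrameId(stack_addr=0x7FFF0000, code_addr=0x401000, artificial_depth=0)
inline_entry = 0x401020  # made-up entry pc of the inlined block

# First pass: both ids are new.
print(stash_add(cache, normal_id))
print(stash_add(cache, inline_frame_id(normal_id, inline_entry)))

# Cyclic unwind: the "previous" normal frame reports the same id again ...
duplicate_normal = FrameId(0x7FFF0000, 0x401000, 0)
print(stash_add(cache, duplicate_normal))
# ... so the inline id derived from it is a duplicate too.
print(stash_add(cache, inline_frame_id(duplicate_normal, inline_entry)))
```

Because the inline id is a pure function of the previous frame's id in
this model, a repeated normal id always yields a repeated inline id.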

To catch cases where this special case might go wrong I do two things.
First, even though I do now return the previous frame, I still
disconnect the previous frame from the next/prev links.  This allows me
to do the second thing, which is to add an assert: if a frame is added
to the frame_id cache, and it is an INLINE_FRAME, then its prev link
must not be nullptr.

This logic should be sound, as computing the frame_id for an inline
frame requires GDB to fetch the previous frame.  For most (all?) other
frame types this is not the case, and so it is only inline frames for
which you are guaranteed that, after computing the frame_id, the
previous frame is known.

So, if my new special case triggers, and we return a previous frame
even when that previous frame is a duplicate, and _somehow_ the inline
frame that we return this special case frame to is not then rejected
from the frame_id cache, the inline frame's prev link will be nullptr,
and the new assertion will trigger.
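
A toy Python model of the proposed behaviour, for illustration only.
The Frame class, its computing flag (standing in for
frame_id_status::COMPUTING), the set-based cache and the backtrace
helper are hypothetical, every frame is assumed to see the same $pc, and
the extra check follows the description above rather than the exact C++
expression in the patch:

```python
INLINE, NORMAL = "INLINE_FRAME", "NORMAL_FRAME"
NORMAL_ID = ("sp", "pc of normal_frame")  # the cyclic unwind repeats this id


class Frame:
    def __init__(self, next_frame):
        self.next = next_frame  # younger frame
        self.prev = None        # older frame
        self.type = None
        self.frame_id = None
        self.computing = False  # models frame_id_status::COMPUTING


def compute_frame_id(frame, cache):
    next_is_inline = frame.next is not None and frame.next.type == INLINE
    frame.type = NORMAL if next_is_inline else INLINE  # toy sniffers
    if frame.type == NORMAL:
        frame.frame_id = NORMAL_ID
        return
    frame.computing = True
    prev = get_prev_frame_if_no_cycle(frame, cache)
    frame.computing = False
    assert prev is not None, "inline frame got no previous frame"
    frame.frame_id = prev.frame_id + ("inline",)


def get_prev_frame_if_no_cycle(this_frame, cache):
    prev = Frame(next_frame=this_frame)
    this_frame.prev = prev
    compute_frame_id(prev, cache)
    if prev.frame_id in cache:
        prev.next = None
        this_frame.prev = None
        # Special case: an inline frame that is still computing its id
        # gets the (unlinked) duplicate frame back instead of None.
        if this_frame.type == INLINE and this_frame.computing:
            return prev
        return None
    cache.add(prev.frame_id)
    # Extra check, following the description above: an inline frame that
    # makes it into the cache must still have its previous frame linked.
    assert prev.type != INLINE or prev.prev is not None
    return prev


def backtrace():
    cache = set()
    frame = Frame(next_frame=None)  # frame #0
    compute_frame_id(frame, cache)
    cache.add(frame.frame_id)
    types = []
    while frame is not None:
        types.append(frame.type)
        # Reuse a link already built while computing an inline frame's id.
        frame = frame.prev or get_prev_frame_if_no_cycle(frame, cache)
    return types


print(backtrace())
```

In this model the second inline frame receives the duplicate normal
frame, computes an id that is itself a duplicate, and is rejected, so
the walk stops after inline_frame and normal_frame without tripping
either check.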

gdb/ChangeLog:

	* frame.c (get_prev_frame_if_no_cycle): Always return prev_frame
	when computing the frame_id for an INLINE_FRAME.  Add an extra
	assertion.

gdb/testsuite/ChangeLog:

	* gdb.base/inline-frame-bad-unwind.c: New file.
	* gdb.base/inline-frame-bad-unwind.exp: New file.
	* gdb.base/inline-frame-bad-unwind.py: New file.
---
 gdb/ChangeLog                                 |   6 +
 gdb/frame.c                                   |  45 ++++++-
 gdb/testsuite/ChangeLog                       |   6 +
 .../gdb.base/inline-frame-bad-unwind.c        |  58 +++++++++
 .../gdb.base/inline-frame-bad-unwind.exp      | 122 ++++++++++++++++++
 .../gdb.base/inline-frame-bad-unwind.py       |  85 ++++++++++++
 6 files changed, 321 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/gdb.base/inline-frame-bad-unwind.c
 create mode 100644 gdb/testsuite/gdb.base/inline-frame-bad-unwind.exp
 create mode 100644 gdb/testsuite/gdb.base/inline-frame-bad-unwind.py

-- 
2.25.4

Patch

diff --git a/gdb/frame.c b/gdb/frame.c
index d2e14c831a0..b0943c02115 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -2125,7 +2125,50 @@  get_prev_frame_if_no_cycle (struct frame_info *this_frame)
 	  /* Unlink.  */
 	  prev_frame->next = NULL;
 	  this_frame->prev = NULL;
-	  prev_frame = NULL;
+
+	  /* Consider the call stack A->B, where A is a normal frame and B
+	     is an inline frame.  When computing the frame-id for B we need
+	     to compute the frame-id for A.
+
+	     If the frame-id for A is a duplicate then it must be the case
+	     that B will also be a duplicate.
+
+	     If we spot A as being a duplicate here and so return NULL then
+	     B will fail to obtain a valid frame-id for A, and thus B will
+	     be unable to return a valid frame-id (in fact an assertion
+	     will trigger).
+
+	     What this means is that, if we are being asked to get the
+	     previous frame for an inline frame and we want to reject the
+	     new (previous) frame then we should really return the frame so
+	     that the inline frame can still compute its frame-id.  This is
+	     safe as we can be confident that the inline frame-id will also
+	     be a duplicate, and so the inline frame (and therefore all
+	     frames previous to it) will then be rejected.  */
+	  if (this_frame->unwind->type != INLINE_FRAME
+	      || this_frame->this_id.p != frame_id_status::COMPUTING)
+	    prev_frame = NULL;
+	}
+      else
+	{
+	  /* This assertion ties into the special handling of inline frames
+	     above.
+
+	     We know that to compute the frame-id of an inline frame we
+	     must first compute the frame-id of the inline frame's previous
+	     frame.
+
+	     If the previous frame is rejected as a duplicate then it
+	     should be the case that the inline frame is also rejected as a
+	     duplicate, and we should not reach this assertion.
+
+	     However, if we do reach this assertion then the inline frame
+	     has not been rejected, thus, it should be the case that the
+	     frame previous to the inline frame has also not been rejected,
+	     this is reflected by the requirement that the inline frame's
+	     previous pointer not be nullptr at this point.  */
+	  gdb_assert (this_frame->unwind->type != INLINE_FRAME
+		      || this_frame->prev != nullptr);
 	}
     }
   catch (const gdb_exception &ex)
diff --git a/gdb/testsuite/gdb.base/inline-frame-bad-unwind.c b/gdb/testsuite/gdb.base/inline-frame-bad-unwind.c
new file mode 100644
index 00000000000..704a994c4e6
--- /dev/null
+++ b/gdb/testsuite/gdb.base/inline-frame-bad-unwind.c
@@ -0,0 +1,58 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+static void foo (void);
+static void bar (void);
+
+volatile int global_var;
+volatile int level_counter;
+
+static void __attribute__((noinline))
+bar (void)
+{
+  /* Do some work.  */
+  ++global_var;
+
+  /* Now the inline function.  */
+  --level_counter;
+  foo ();
+  ++level_counter;
+
+  /* Do some work.  */
+  ++global_var;
+}
+
+static inline void __attribute__((__always_inline__))
+foo (void)
+{
+  if (level_counter > 1)
+    {
+      --level_counter;
+      bar ();
+      ++level_counter;
+    }
+  else
+    ++global_var;	/* Break here.  */
+}
+
+int
+main ()
+{
+  level_counter = 6;
+  bar ();
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/inline-frame-bad-unwind.exp b/gdb/testsuite/gdb.base/inline-frame-bad-unwind.exp
new file mode 100644
index 00000000000..49c35517801
--- /dev/null
+++ b/gdb/testsuite/gdb.base/inline-frame-bad-unwind.exp
@@ -0,0 +1,122 @@ 
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test checks for an edge case when unwinding inline frames which
+# occur towards the older end of the stack when the stack ends with a
+# cycle.  Consider this well formed stack:
+#
+#   main -> normal_frame -> inline_frame
+#
+# Now consider that, for whatever reason, the stack unwinding of
+# "normal_frame" becomes corrupted, such that the stack appears to be
+# this:
+#
+#   .-> normal_frame -> inline_frame
+#   |      |
+#   '------'
+#
+# When confronted with such a situation we would expect GDB to detect
+# the stack frame cycle and terminate the backtrace at the first
+# instance of "normal_frame" with a message:
+#
+#   Backtrace stopped: previous frame identical to this frame (corrupt stack?)
+#
+# However, at one point there was a bug in GDB's inline frame
+# mechanism such that the fact that "inline_frame" was inlined into
+# "normal_frame" would cause GDB to trigger an assertion.
+#
+# This test makes use of a Python unwinder which can fake the
+# stack cycle; further, the test sets up multiple levels of normal and
+# inline frames.  At the point of testing the stack looks like this:
+#
+#   main -> bar -> foo -> bar -> foo -> bar -> foo
+#
+# Where "bar" is a normal frame, and "foo" is an inline frame.
+#
+# The Python unwinder is then used to force a stack cycle at each
+# "bar" frame in turn, and we then check that GDB can successfully
+# unwind the stack.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}]} {
+    return -1
+}
+
+# Skip this test if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+# Run to the breakpoint where we will carry out the test.
+gdb_breakpoint [gdb_get_line_number "Break here"]
+gdb_continue_to_breakpoint "stop at test breakpoint"
+
+# Load the script containing the unwinder, this must be done at the
+# testing point as the script will examine the stack as it is loaded.
+gdb_test_no_output "source ${pyfile}"\
+    "import python scripts"
+
+# Check the unbroken stack.
+gdb_test_sequence "bt"  "Backtrace when the unwind is left unbroken" {
+    "\\r\\n#0 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#1 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\n#2 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#3 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\n#4 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#5 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\n#6 \[^\r\n\]* main \\(\\) at "
+}
+
+# Arrange to introduce a stack cycle at frame 5.
+gdb_test_no_output "python stop_at_level=5"
+gdb_test "maint flush register-cache" \
+    "Register cache flushed\\." ""
+gdb_test_sequence "bt"  "Backtrace when the unwind is broken at frame 5" {
+    "\\r\\n#0 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#1 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\n#2 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#3 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\n#4 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#5 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\nBacktrace stopped: previous frame identical to this frame \\(corrupt stack\\?\\)"
+}
+
+# Arrange to introduce a stack cycle at frame 3.
+gdb_test_no_output "python stop_at_level=3"
+gdb_test "maint flush register-cache" \
+    "Register cache flushed\\." ""
+gdb_test_sequence "bt"  "Backtrace when the unwind is broken at frame 3" {
+    "\\r\\n#0 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#1 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\n#2 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#3 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\nBacktrace stopped: previous frame identical to this frame \\(corrupt stack\\?\\)"
+}
+
+# Arrange to introduce a stack cycle at frame 1.
+gdb_test_no_output "python stop_at_level=1"
+gdb_test "maint flush register-cache" \
+    "Register cache flushed\\." ""
+gdb_test_sequence "bt"  "Backtrace when the unwind is broken at frame 1" {
+    "\\r\\n#0 \[^\r\n\]* foo \\(\\) at "
+    "\\r\\n#1 \[^\r\n\]* bar \\(\\) at "
+    "\\r\\nBacktrace stopped: previous frame identical to this frame \\(corrupt stack\\?\\)"
+}
diff --git a/gdb/testsuite/gdb.base/inline-frame-bad-unwind.py b/gdb/testsuite/gdb.base/inline-frame-bad-unwind.py
new file mode 100644
index 00000000000..21743f7864a
--- /dev/null
+++ b/gdb/testsuite/gdb.base/inline-frame-bad-unwind.py
@@ -0,0 +1,85 @@ 
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+from gdb.unwinder import Unwinder
+
+# Set this to the stack level the backtrace should be corrupted at.
+# This will only work for frame 1, 3, or 5 in the test this unwinder
+# was written for.
+stop_at_level = None
+
+# Set this to the stack frame size of frames 1, 3, and 5.  These
+# frames will all have the same stack frame size as they are the same
+# function called recursively.
+stack_adjust = None
+
+
+class FrameId(object):
+    def __init__(self, sp, pc):
+        self._sp = sp
+        self._pc = pc
+
+    @property
+    def sp(self):
+        return self._sp
+
+    @property
+    def pc(self):
+        return self._pc
+
+
+class TestUnwinder(Unwinder):
+    def __init__(self):
+        Unwinder.__init__(self, "stop at level")
+
+    def __call__(self, pending_frame):
+        global stop_at_level
+        global stack_adjust
+
+        if stop_at_level is None or pending_frame.level() != stop_at_level:
+            return None
+
+        if stack_adjust is None:
+            raise gdb.GdbError("invalid stack_adjust")
+
+        if stop_at_level not in [1, 3, 5]:
+            raise gdb.GdbError("invalid stop_at_level")
+
+        sp_desc = pending_frame.architecture().registers().find("sp")
+        sp = pending_frame.read_register(sp_desc) + stack_adjust
+        pc = (gdb.lookup_symbol("bar"))[0].value().address
+        unwinder = pending_frame.create_unwind_info(FrameId(sp, pc))
+
+        for reg in pending_frame.architecture().registers("general"):
+            val = pending_frame.read_register(reg)
+            unwinder.add_saved_register(reg, val)
+        return unwinder
+
+
+gdb.unwinder.register_unwinder(None, TestUnwinder(), True)
+
+# When loaded, it is expected that the stack looks like:
+#
+#   main -> bar -> foo -> bar -> foo -> bar -> foo
+#
+# Compute the stack frame size of bar, which has foo inlined within
+# it.
+f0 = gdb.newest_frame()
+f1 = f0.older()
+f2 = f1.older()
+f0_sp = f0.read_register("sp")
+f2_sp = f2.read_register("sp")
+stack_adjust = f2_sp - f0_sp