[3/4] Allow display of negative offsets in print_address_symbolic()

Message ID 20190608195434.26512-4-kevinb@redhat.com
State Superseded
Headers show
Series
  • Non-contiguous address range bug fixes / improvements
Related show

Commit Message

Kevin Buettner June 8, 2019, 7:54 p.m.
When examining addresses associated with blocks with non-contiguous
address ranges, it's not uncommon to see large positive offsets which,
for some address width, actually represent a smaller negative offset.
Here's an example taken from the test case:

    (gdb) x/i foo_cold
       0x40110d <foo+4294967277>:	push   %rbp

This commit causes cases like the above to be displayed like this (below)
instead:

    (gdb) x/i foo_cold
       0x40110d <foo-19>:	push   %rbp

gdb/ChangeLog:

	* printcmd.c (print_address_symbolic): Print negative offsets.
	(build_address_symbolic): Force signed arithmetic when computing
	offset.
---
 gdb/printcmd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.21.0

Comments

Pedro Alves June 21, 2019, 2:45 p.m. | #1
On 6/8/19 8:54 PM, Kevin Buettner wrote:
> When examining addresses associated with blocks with non-contiguous

> address ranges, it's not uncommon to see large positive offsets which,

> for some address width, actually represent a smaller negative offset.

> Here's an example taken from the test case:

> 

>     (gdb) x/i foo_cold

>        0x40110d <foo+4294967277>:	push   %rbp

> 

> This commit causes cases like the above to be displayed like this (below)

> instead:

> 

>     (gdb) x/i foo_cold

>        0x40110d <foo-19>:	push   %rbp

> 

> gdb/ChangeLog:

> 

> 	* printcmd.c (print_address_symbolic): Print negative offsets.

> 	(build_address_symbolic): Force signed arithmetic when computing

> 	offset.


Seems reasonable to me, if we assume that the symbol name to put
within <> is "foo".

This change makes makes me doubt that, though.  We're looking at
the lower level, disassembly code.  I think I'd want to see

  0x40110d <foo_cold+0>:

there?

E.g., I might want to follow up with
disassemble foo_cold.

But the present state of things, I wouldn't be able to see the
foo_cold symbol, where it starts?

Maybe a larger disassemble output including several cold sections
in view would help determine the best output.

Thanks,
Pedro Alves
Kevin Buettner July 3, 2019, 11:09 p.m. | #2
On Fri, 21 Jun 2019 15:45:13 +0100
Pedro Alves <palves@redhat.com> wrote:

> On 6/8/19 8:54 PM, Kevin Buettner wrote:

> > When examining addresses associated with blocks with non-contiguous

> > address ranges, it's not uncommon to see large positive offsets which,

> > for some address width, actually represent a smaller negative offset.

> > Here's an example taken from the test case:

> > 

> >     (gdb) x/i foo_cold

> >        0x40110d <foo+4294967277>:	push   %rbp

> > 

> > This commit causes cases like the above to be displayed like this (below)

> > instead:

> > 

> >     (gdb) x/i foo_cold

> >        0x40110d <foo-19>:	push   %rbp

> > 

> > gdb/ChangeLog:

> > 

> > 	* printcmd.c (print_address_symbolic): Print negative offsets.

> > 	(build_address_symbolic): Force signed arithmetic when computing

> > 	offset.  

> 

> Seems reasonable to me, if we assume that the symbol name to put

> within <> is "foo".

> 

> This change makes makes me doubt that, though.  We're looking at

> the lower level, disassembly code.  I think I'd want to see

> 

>   0x40110d <foo_cold+0>:

> 

> there?

> 

> E.g., I might want to follow up with

> disassemble foo_cold.

> 

> But the present state of things, I wouldn't be able to see the

> foo_cold symbol, where it starts?

> 

> Maybe a larger disassemble output including several cold sections

> in view would help determine the best output.


I've been conducting some experiments with this patch...

diff --git a/gdb/printcmd.c b/gdb/printcmd.c
index 886e4464df..e6599493ff 100644
--- a/gdb/printcmd.c
+++ b/gdb/printcmd.c
@@ -629,9 +629,15 @@ build_address_symbolic (struct gdbarch *gdbarch,
 
   if (msymbol.minsym != NULL)
     {
+#if 1
       if (symbol == NULL
           || (BLOCK_CONTIGUOUS_P (SYMBOL_BLOCK_VALUE (symbol))
 	      && BMSYMBOL_VALUE_ADDRESS (msymbol) > name_location))
+#else
+      if (symbol == NULL
+          || !BLOCK_CONTIGUOUS_P (SYMBOL_BLOCK_VALUE (symbol))
+	  || BMSYMBOL_VALUE_ADDRESS (msymbol) > name_location)
+#endif
 	{
 	  /* If this is a function (i.e. a code address), strip out any
 	     non-address bits.  For instance, display a pointer to the

...applied on top of the other patches in this set.

As shown, GDB will prefer the symtab symbol over the minsym for
display of symbols in disassembled code.  When I change the #if 1
to #if 0, GDB will always prefer the minsym for functions with
non-contiguous ranges.  (Whatever it is that we do, I want the
"lo-cold" and "hi-cold" cases to behave the same.)

Then, using the "lo-cold" executable for the dw2-ranges-func test,
I've been doing the following:

./gdb testsuite/outputs/gdb.dwarf2/dw2-ranges-func/dw2-ranges-func-lo-cold
b 70
b baz
run
set var e=1
c
bt
up
x/5i $pc
disassemble foo
x/i foo_cold
disassemble foo_cold

I don't see anything interesting until we get to the "bt" command,  For "bt", we
see (as expected) the same output for both cases:

(gdb) bt
#0  0x000000000040110a in baz ()
    at worktree-ranges/gdb/testsuite/gdb.dwarf2/dw2-ranges-func-lo-cold.c:48
#1  0x0000000000401116 in foo ()
    at worktree-ranges/gdb/testsuite/gdb.dwarf2/dw2-ranges-func-lo-cold.c:54
#2  0x0000000000401138 in foo ()
    at worktree-ranges/gdb/testsuite/gdb.dwarf2/dw2-ranges-func-lo-cold.c:70
#3  0x0000000000401144 in main ()
    at worktree-ranges/gdb/testsuite/gdb.dwarf2/dw2-ranges-func-lo-cold.c:78

(I've shortened the paths for readability.)

The thing to note here is that the call of foo at frame #1 is actually a
call to foo_cold.  Showing foo_cold in the backtrace is the behavior that
Eli found objectionable.

Likewise, "up" shows the same behavior for both cases:

(gdb) up
#1  0x0000000000401116 in foo ()
    at worktree-ranges/gdb/testsuite/gdb.dwarf2/dw2-ranges-func-lo-cold.c:54
54	  baz ();					/* foo_cold baz call */

"x/5i" shows some differences in output.  I'll show the "prefer symtab
sym" version first, followed by the "prefer minsym" version second:

(gdb) x/5i $pc
=> 0x401116 <foo-10>:	nop
   0x401117 <foo-9>:	pop    %rbp
   0x401118 <foo-8>:	retq   
   0x401119 <bar>:	push   %rbp
   0x40111a <bar+1>:	mov    %rsp,%rbp

   --- versus ---

(gdb) x/5i $pc
=> 0x401116 <foo_cold+9>:	nop
   0x401117 <foo_cold+10>:	pop    %rbp
   0x401118 <foo_cold+11>:	retq   
   0x401119 <bar>:	push   %rbp
   0x40111a <bar+1>:	mov    %rsp,%rbp

The thing to observe in the above output is that offsets from foo are
used in the first case, where as offsets from foo_cold are shown for
the "prefer minsym" version.

Both versions show similar output for the "disassemble foo" command.
Here is the output for the "prefer minsym" version:

(gdb) disassemble foo
Dump of assembler code for function foo:
Address range 0x401120 to 0x40113b:
   0x0000000000401120 <+0>:	push   %rbp
   0x0000000000401121 <+1>:	mov    %rsp,%rbp
   0x0000000000401124 <+4>:	callq  0x401119 <bar>
   0x0000000000401129 <+9>:	mov    0x2ef1(%rip),%eax        # 0x404020 <e>
   0x000000000040112f <+15>:	test   %eax,%eax
   0x0000000000401131 <+17>:	je     0x401138 <foo+24>
   0x0000000000401133 <+19>:	callq  0x40110d <foo_cold>
   0x0000000000401138 <+24>:	nop
   0x0000000000401139 <+25>:	pop    %rbp
   0x000000000040113a <+26>:	retq   
Address range 0x40110d to 0x401119:
   0x000000000040110d <+0>:	push   %rbp
   0x000000000040110e <+1>:	mov    %rsp,%rbp
   0x0000000000401111 <+4>:	callq  0x401106 <baz>
=> 0x0000000000401116 <+9>:	nop
   0x0000000000401117 <+10>:	pop    %rbp
   0x0000000000401118 <+11>:	retq   
End of assembler dump.

The only line where there's a difference is:

   0x0000000000401133 <+19>:	callq  0x40110d <foo-19>

   --- versus ---

   0x0000000000401133 <+19>:	callq  0x40110d <foo_cold>

I think I prefer the negative offset in this case.

"x/i foo_cold" produces different outputs...

(gdb) x/i foo_cold
   0x40110d <foo-19>:	push   %rbp

   --- versus ---

(gdb) x/i foo_cold
   0x40110d <foo_cold>:	push   %rbp

The version that prefers the symtab sym shows foo versus foo_cold for
the version that prefers the minsym sym.

"disassemble foo_cold" shows the same output as "disassemble foo"
above.  I won't show it here since it's the same as what's shown
earlier.  I was sort of surprised that it showed the entire function
(both) ranges, but after thinking about it, this made sense since you
see the entire function when you disassemble some address that's in
the middle of the function.

My thoughts...

When I say "x/i foo_cold", I do think I'd prefer to see <foo_cold> instead
of <foo-19>.

However, when I do "x/5i $pc" after doing "up" from the baz frame, I think
I somewhat prefer seeing foo with negative offsets.

What would you think about this behavior?

(gdb) x/5i foo_cold
   0x40110d <foo_cold>:	push   %rbp
   0x40110e <foo-18>:	mov    %rsp,%rbp
   0x401111 <foo-15>:	callq  0x401106 <baz>
=> 0x401116 <foo-10>:	nop
   0x401117 <foo-9>:	pop    %rbp

I.e. prefer the minsym for offset 0, but use the function symbol for
the non-zero offsets.

Another possibility:

(gdb) x/5i foo_cold
   0x40110d <foo-19> <foo_cold>: push   %rbp
   0x40110e <foo-18>:	mov    %rsp,%rbp
   0x401111 <foo-15>:	callq  0x401106 <baz>
=> 0x401116 <foo-10>:	nop
   0x401117 <foo-9>:	pop    %rbp

I.e, show both the function symbol (plus/minus offset) AND the minsym,
but only show the minsym for the zero offset.

I haven't tried implementing either of these approaches yet, but
I can take a look at it if we have some concensus over what the output
should look like.

Kevin
Kevin Buettner July 4, 2019, 1:05 a.m. | #3
On Wed, 3 Jul 2019 16:09:21 -0700
Kevin Buettner <kevinb@redhat.com> wrote:

> When I say "x/i foo_cold", I do think I'd prefer to see <foo_cold> instead

> of <foo-19>.

> 

> However, when I do "x/5i $pc" after doing "up" from the baz frame, I think

> I somewhat prefer seeing foo with negative offsets.

> 

> What would you think about this behavior?

> 

> (gdb) x/5i foo_cold

>    0x40110d <foo_cold>:	push   %rbp

>    0x40110e <foo-18>:	mov    %rsp,%rbp

>    0x401111 <foo-15>:	callq  0x401106 <baz>

> => 0x401116 <foo-10>:	nop  

>    0x401117 <foo-9>:	pop    %rbp

> 

> I.e. prefer the minsym for offset 0, but use the function symbol for

> the non-zero offsets.


For the v2 version of this series, I've implemented the behavior shown
above.

I (hopefully) provide a good rationale for this behavior in the
commit comment.  (So, if you don't immediately like it, stay tuned
for the v2 patch series.)

> Another possibility:

> 

> (gdb) x/5i foo_cold

>    0x40110d <foo-19> <foo_cold>: push   %rbp

>    0x40110e <foo-18>:	mov    %rsp,%rbp

>    0x401111 <foo-15>:	callq  0x401106 <baz>

> => 0x401116 <foo-10>:	nop  

>    0x401117 <foo-9>:	pop    %rbp

> 

> I.e, show both the function symbol (plus/minus offset) AND the minsym,

> but only show the minsym for the zero offset.

> 

> I haven't tried implementing either of these approaches yet, but

> I can take a look at it if we have some concensus over what the output

> should look like.


I sort of like this one too, but it's harder to implement, causes some
of the lines to be longer, and will also make it less likely that all
of the instructions associated with a given function will line up.

Kevin

Patch

diff --git a/gdb/printcmd.c b/gdb/printcmd.c
index e00a9c671a..8ceddd633a 100644
--- a/gdb/printcmd.c
+++ b/gdb/printcmd.c
@@ -537,7 +537,7 @@  print_address_symbolic (struct gdbarch *gdbarch, CORE_ADDR addr,
     fputs_filtered ("<", stream);
   fputs_styled (name.c_str (), function_name_style.style (), stream);
   if (offset != 0)
-    fprintf_filtered (stream, "+%u", (unsigned int) offset);
+    fprintf_filtered (stream, "%+d", offset);
 
   /* Append source filename and line number if desired.  Give specific
      line # of this addr, if we have it; else line # of the nearest symbol.  */
@@ -666,7 +666,7 @@  build_address_symbolic (struct gdbarch *gdbarch,
       && name_location + max_symbolic_offset > name_location)
     return 1;
 
-  *offset = addr - name_location;
+  *offset = (LONGEST) addr - name_location;
 
   *name = name_temp;