[01/14] Fix latent bug in dwarf2_find_containing_comp_unit

Message ID 20200215165444.32653-2-tom@tromey.com
State New
Headers show
Series
  • Share DWARF partial symtabs between objfiles
Related show

Commit Message

Tom Tromey Feb. 15, 2020, 4:54 p.m.
dwarf2_find_containing_comp_unit has this in its binary search:

      if (mid_cu->is_dwz > offset_in_dwz
	  || (mid_cu->is_dwz == offset_in_dwz
	      && mid_cu->sect_off + mid_cu->length >= sect_off))
	high = mid;

The intent here is to determine whether SECT_OFF appears in or before
MID_CU.

I believe this has an off-by-one error, and that the check should use
">" rather than ">=".  If the two side are equal, then SECT_OFF
actually appears at the start of the next CU.

I've had this patch kicking around for ages but I forget how I found
the problem.

gdb/ChangeLog
2020-02-15  Tom Tromey  <tom@tromey.com>

	* dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not
	">=", in binary search.
---
 gdb/ChangeLog     | 5 +++++
 gdb/dwarf2/read.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

-- 
2.17.2

Comments

Simon Marchi Feb. 19, 2020, 3:42 a.m. | #1
On 2020-02-15 11:54 a.m., Tom Tromey wrote:
> dwarf2_find_containing_comp_unit has this in its binary search:

> 

>       if (mid_cu->is_dwz > offset_in_dwz

> 	  || (mid_cu->is_dwz == offset_in_dwz

> 	      && mid_cu->sect_off + mid_cu->length >= sect_off))

> 	high = mid;

> 

> The intent here is to determine whether SECT_OFF appears in or before

> MID_CU.

> 

> I believe this has an off-by-one error, and that the check should use

> ">" rather than ">=".  If the two side are equal, then SECT_OFF

> actually appears at the start of the next CU.

> 

> I've had this patch kicking around for ages but I forget how I found

> the problem.

> 

> gdb/ChangeLog

> 2020-02-15  Tom Tromey  <tom@tromey.com>

> 

> 	* dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not

> 	">=", in binary search.

> ---

>  gdb/ChangeLog     | 5 +++++

>  gdb/dwarf2/read.c | 2 +-

>  2 files changed, 6 insertions(+), 1 deletion(-)

> 

> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c

> index e74383e01dc..e373cc0af2c 100644

> --- a/gdb/dwarf2/read.c

> +++ b/gdb/dwarf2/read.c

> @@ -24171,7 +24171,7 @@ dwarf2_find_containing_comp_unit (sect_offset sect_off,

>        mid_cu = dwarf2_per_objfile->all_comp_units[mid];

>        if (mid_cu->is_dwz > offset_in_dwz

>  	  || (mid_cu->is_dwz == offset_in_dwz

> -	      && mid_cu->sect_off + mid_cu->length >= sect_off))

> +	      && mid_cu->sect_off + mid_cu->length > sect_off))

>  	high = mid;

>        else

>  	low = mid + 1;


I can only convince myself of these things by plugging in real numbers.  So let's say
mid_cu's range is [100,150[, so 150 bytes long, and we are looking for sect_off == 150.
150 is outside (just after) mid_cu's range, so the right answer will be the cu just
after this one.

Without your change, we would have gone in the "if", and therefore excluded the right
answer.  With your change, we would have gone in the "else", and therefore chosen the
range with the right answer.  So that looks correct to me.

I'm going to ask if you thought about a way to test this.  I don't think writing a typical
test case for this is feasible.  By refactoring the code a bit, we could maybe factor out the
meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead
of on a dwarf2_per_objfile.  It should then be feasible to create an std::vector with
dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge
cases like this.

Also, do you understand what's the logic with this dwz stuff?  Is it that all the CUs
coming from a dwz file are at the end of the list, or something like that?

Simon
Tom Tromey Feb. 19, 2020, 2:08 p.m. | #2
>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:


Simon> I'm going to ask if you thought about a way to test this.  I
Simon> don't think writing a typical test case for this is feasible.

Yeah.  And like I said, I don't remember how I encountered this.
Maybe only with some other dubious patch in place.

Simon>  By refactoring the code a bit, we could maybe factor out the
Simon> meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead
Simon> of on a dwarf2_per_objfile.  It should then be feasible to create an std::vector with
Simon> dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge
Simon> cases like this.

Another idea that occurred to me is that we could just use
std::binary_search here.  Then instead of implementing a binary search,
we'd only need to implement a comparison function -- which seems
simpler.

Anyway I will try to write a unit test, that's a good idea.

Simon> Also, do you understand what's the logic with this dwz stuff?  Is it that all the CUs
Simon> coming from a dwz file are at the end of the list, or something like that?

Yes, exactly.  Maybe this can be cleaned up a bit after this series,
since now we have the index directly in the dwarf2_per_cu_data.  I am
not sure offhand.

Tom
Tom Tromey Feb. 20, 2020, 12:11 a.m. | #3
Simon> By refactoring the code a bit, we could maybe factor out the meat
Simon> of this function into one that operates on an
Simon> std::vector<dwarf2_per_cu_data> instead of on a
Simon> dwarf2_per_objfile.  It should then be feasible to create an
Simon> std::vector with dwarf2_per_cu_data elements out of thin air to
Simon> unit-test the function, including edge cases like this.

Here is a new version of this patch.
I propose landing it separately, because it is just a straightforward
latent bug fix.

Simon> Also, do you understand what's the logic with this dwz stuff?  Is it that all the CUs
Simon> coming from a dwz file are at the end of the list, or something like that?

Tom> Yes, exactly.

To expand on that, in multi-file mode, dwz may create a combined, shared
file of partial symtabs.  When reading the DWARF for the main file, gdb
may find it needs the dwz file; and in this case the additional CUs are
put into the same vector, at the end.  It's done this way so that we
can easily find the correct CU (since the section offsets between the
main and shared dwz file will overlap).

It would probably be better to try to share these CUs across objfiles.
That may be somewhat involved given the current code, though.

Tom
Tom Tromey Feb. 20, 2020, 12:12 a.m. | #4
Simon> By refactoring the code a bit, we could maybe factor out the
Simon> meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead
Simon> of on a dwarf2_per_objfile.  It should then be feasible to create an std::vector with
Simon> dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge
Simon> cases like this.

Oops, I meant to attach the patch in that last email.

Let me know what you think.

Tom

commit 5d02b8ad013451b03e004ef1e71c4636b43252c2
Author: Tom Tromey <tom@tromey.com>
Date:   Sat Feb 15 09:08:09 2020 -0700

    Fix latent bug in dwarf2_find_containing_comp_unit
    
    dwarf2_find_containing_comp_unit has this in its binary search:
    
          if (mid_cu->is_dwz > offset_in_dwz
              || (mid_cu->is_dwz == offset_in_dwz
                  && mid_cu->sect_off + mid_cu->length >= sect_off))
            high = mid;
    
    The intent here is to determine whether SECT_OFF appears in or before
    MID_CU.
    
    I believe this has an off-by-one error, and that the check should use
    ">" rather than ">=".  If the two side are equal, then SECT_OFF
    actually appears at the start of the next CU.
    
    I've had this patch kicking around for ages but I forget how I found
    the problem.
    
    gdb/ChangeLog
    2020-02-19  Tom Tromey  <tom@tromey.com>
    
            * dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not
            ">=", in binary search.
            (dwarf2_find_containing_comp_unit): New overload.
            (run_test): New self-test.
            (_initialize_dwarf2_read): Register new test.

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index b4a586c333a..b07c9156e28 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,11 @@
+2020-02-19  Tom Tromey  <tom@tromey.com>
+
+	* dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not
+	">=", in binary search.
+	(dwarf2_find_containing_comp_unit): New overload.
+	(run_test): New self-test.
+	(_initialize_dwarf2_read): Register new test.
+
 2020-02-19  Simon Marchi  <simon.marchi@efficios.com>
 
 	* dwarf2/read.c (allocate_signatured_type_table,
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 4d767a59af7..f998fe6b8d0 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -24136,34 +24136,53 @@ dwarf2_per_cu_data::addr_type () const
   return addr_type;
 }
 
-/* Locate the .debug_info compilation unit from CU's objfile which contains
-   the DIE at OFFSET.  Raises an error on failure.  */
+/* A helper function for dwarf2_find_containing_comp_unit that returns
+   the index of the result, and that searches a vector.  It will
+   return a result even if the offset in question does not actually
+   occur in any CU.  This is separate so that it can be unit
+   tested.  */
 
-static struct dwarf2_per_cu_data *
-dwarf2_find_containing_comp_unit (sect_offset sect_off,
-				  unsigned int offset_in_dwz,
-				  struct dwarf2_per_objfile *dwarf2_per_objfile)
+static int
+dwarf2_find_containing_comp_unit
+  (sect_offset sect_off,
+   unsigned int offset_in_dwz,
+   const std::vector<dwarf2_per_cu_data *> &all_comp_units)
 {
-  struct dwarf2_per_cu_data *this_cu;
   int low, high;
 
   low = 0;
-  high = dwarf2_per_objfile->all_comp_units.size () - 1;
+  high = all_comp_units.size () - 1;
   while (high > low)
     {
       struct dwarf2_per_cu_data *mid_cu;
       int mid = low + (high - low) / 2;
 
-      mid_cu = dwarf2_per_objfile->all_comp_units[mid];
+      mid_cu = all_comp_units[mid];
       if (mid_cu->is_dwz > offset_in_dwz
 	  || (mid_cu->is_dwz == offset_in_dwz
-	      && mid_cu->sect_off + mid_cu->length >= sect_off))
+	      && mid_cu->sect_off + mid_cu->length > sect_off))
 	high = mid;
       else
 	low = mid + 1;
     }
   gdb_assert (low == high);
-  this_cu = dwarf2_per_objfile->all_comp_units[low];
+  return low;
+}
+
+/* Locate the .debug_info compilation unit from CU's objfile which contains
+   the DIE at OFFSET.  Raises an error on failure.  */
+
+static struct dwarf2_per_cu_data *
+dwarf2_find_containing_comp_unit (sect_offset sect_off,
+				  unsigned int offset_in_dwz,
+				  struct dwarf2_per_objfile *dwarf2_per_objfile)
+{
+  int low
+    = dwarf2_find_containing_comp_unit (sect_off, offset_in_dwz,
+					dwarf2_per_objfile->all_comp_units);
+  struct dwarf2_per_cu_data *this_cu
+    = dwarf2_per_objfile->all_comp_units[low];
+
   if (this_cu->is_dwz != offset_in_dwz || this_cu->sect_off > sect_off)
     {
       if (low == 0 || this_cu->is_dwz != offset_in_dwz)
@@ -24186,6 +24205,57 @@ dwarf2_find_containing_comp_unit (sect_offset sect_off,
     }
 }
 
+#if GDB_SELF_TEST
+
+namespace selftests {
+namespace find_containing_comp_unit {
+
+static void
+run_test ()
+{
+  struct dwarf2_per_cu_data one {};
+  struct dwarf2_per_cu_data two {};
+  struct dwarf2_per_cu_data three {};
+  struct dwarf2_per_cu_data four {};
+
+  one.length = 5;
+  two.sect_off = sect_offset (one.length);
+  two.length = 7;
+
+  three.length = 5;
+  three.is_dwz = 1;
+  four.sect_off = sect_offset (three.length);
+  four.length = 7;
+  four.is_dwz = 1;
+
+  std::vector<dwarf2_per_cu_data *> units;
+  units.push_back (&one);
+  units.push_back (&two);
+  units.push_back (&three);
+  units.push_back (&four);
+
+  int result;
+
+  result = dwarf2_find_containing_comp_unit (sect_offset (0), 0, units);
+  SELF_CHECK (units[result] == &one);
+  result = dwarf2_find_containing_comp_unit (sect_offset (3), 0, units);
+  SELF_CHECK (units[result] == &one);
+  result = dwarf2_find_containing_comp_unit (sect_offset (5), 0, units);
+  SELF_CHECK (units[result] == &two);
+
+  result = dwarf2_find_containing_comp_unit (sect_offset (0), 1, units);
+  SELF_CHECK (units[result] == &three);
+  result = dwarf2_find_containing_comp_unit (sect_offset (3), 1, units);
+  SELF_CHECK (units[result] == &three);
+  result = dwarf2_find_containing_comp_unit (sect_offset (5), 1, units);
+  SELF_CHECK (units[result] == &four);
+}
+
+}
+}
+
+#endif /* GDB_SELF_TEST */
+
 /* Initialize dwarf2_cu CU, owned by PER_CU.  */
 
 dwarf2_cu::dwarf2_cu (struct dwarf2_per_cu_data *per_cu_)
@@ -24690,5 +24760,7 @@ Warning: This option must be enabled before gdb reads the file."),
 #if GDB_SELF_TEST
   selftests::register_test ("dw2_expand_symtabs_matching",
 			    selftests::dw2_expand_symtabs_matching::run_test);
+  selftests::register_test ("dwarf2_find_containing_comp_unit",
+			    selftests::find_containing_comp_unit::run_test);
 #endif
 }
Simon Marchi Feb. 20, 2020, 3:44 p.m. | #5
On 2020-02-19 7:12 p.m., Tom Tromey wrote:
> Simon> By refactoring the code a bit, we could maybe factor out the

> Simon> meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead

> Simon> of on a dwarf2_per_objfile.  It should then be feasible to create an std::vector with

> Simon> dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge

> Simon> cases like this.

> 

> Oops, I meant to attach the patch in that last email.

> 

> Let me know what you think.

> 

> Tom


Thanks, that LGTM.  If you want to use std::binary_search, that would be fine
as well.

Simon
Tom Tromey Feb. 20, 2020, 4:49 p.m. | #6
>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:


Simon> Thanks, that LGTM.  If you want to use std::binary_search, that would be fine
Simon> as well.

I tried this but it wasn't notably simpler and it meant changes to the
error handling, so I dropped it.

Tom
Simon Marchi via Gdb-patches March 7, 2020, 7:12 p.m. | #7
Re std::binary_search, we do have GDB::binary_search which is more
useful/easier to use.

On Thu, Feb 20, 2020, 10:50 Tom Tromey <tom@tromey.com> wrote:

> >>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:

>

> Simon> Thanks, that LGTM.  If you want to use std::binary_search, that

> would be fine

> Simon> as well.

>

> I tried this but it wasn't notably simpler and it meant changes to the

> error handling, so I dropped it.

>

> Tom

>

Patch

diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index e74383e01dc..e373cc0af2c 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -24171,7 +24171,7 @@  dwarf2_find_containing_comp_unit (sect_offset sect_off,
       mid_cu = dwarf2_per_objfile->all_comp_units[mid];
       if (mid_cu->is_dwz > offset_in_dwz
 	  || (mid_cu->is_dwz == offset_in_dwz
-	      && mid_cu->sect_off + mid_cu->length >= sect_off))
+	      && mid_cu->sect_off + mid_cu->length > sect_off))
 	high = mid;
       else
 	low = mid + 1;