[AArch64] Fix AArch64 disassembler mapping symbol search

Message ID 20190307114318.GA11640@arm.com
State New
Headers show
Series
  • [AArch64] Fix AArch64 disassembler mapping symbol search
Related show

Commit Message

Tamar Christina March 7, 2019, 11:43 a.m.
Hi All,

My previous patch for AArch64 was not enough to catch all the cases where
disassembling an out-of-order section could go wrong.  It had missed the case
DATA sections could be incorrectly disassembled as TEXT.

Out of order here refers to an object file where sections are not listed in a
monotonic increasing VMA order.

The ELF ABI for AArch64 [1] specifies the following for mapping symbols:

  1) A text section must always have a corresponding mapping symbol at it's
     start.
  2) Data sections do not require any mapping symbols.
  3) The range of a mapping symbol extends from the address it starts on up to
     the next mapping symbol (exclusive) or section end (inclusive).

However there is no defined order between a symbol and it's corresponding
mapping symbol in the symbol table.  This means that while in general we look
up for a corresponding mapping symbol, we have to make at least one check of
the symbol below the address being disassembled.

When disassembling different PCs within the same section, the search for mapping
symbol can be cached somewhat.  We know that the mapping symbol corresponding to
the current PC is either the previous one used, or one at the same address as
the current PC.

However this optimization and mapping symbol search must stop as soon as we
reach the end or start of the section.  Furthermore if we're only disassembling
a part of a section, the search is a allowed to search further than the current
chunk, but is not allowed to search past it (The mapping symbol if there, must
be at the same address, so in practice we usually stop at PC+4).

lastly, since only data sections don't require a mapping symbol the default
mapping type should be DATA and not INSN as previously defined, however if the
binary has had all its symbols stripped than this isn't very useful.  To fix this
we determine the default based on the section flags.  This will allow the
disassembler to be more useful on stripped binaries.  If there is no section than
we assume you to be disassembling INSN.

[1] https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2018q4#aaelf64-section4-5-4

build on native hardware and regtested on
  aarch64-none-elf, aarch64-none-elf (32 bit host),
  aarch64-none-linux-gnu, aarch64-none-linux-gnu (32 bit host)

Cross-compiled and regtested on
  aarch64-none-linux-gnu, aarch64_be-none-linux-gnu

and no issues (checked ld, gas and binutils).

Ok for master? and for backport to binutils-2.32?

Thanks,
Tamar

binutils/ChangeLog:

2019-03-07  Tamar Christina  <tamar.christina@arm.com>

	* testsuite/binutils-all/aarch64/in-order.d: New test.
	* testsuite/binutils-all/aarch64/out-of-order.d: Disassemble data as
	well.

opcodes/ChangeLog:

2019-03-07  Tamar Christina  <tamar.christina@arm.com>

	* aarch64-dis.c (print_insn_aarch64): Update the mapping symbol search
	order.

--

Comments

Nick Clifton March 15, 2019, 9:54 a.m. | #1
Hi Tamar,

> Ok for master? and for backport to binutils-2.32?


Approved for both.

> binutils/ChangeLog:

> 

> 2019-03-07  Tamar Christina  <tamar.christina@arm.com>

> 

> 	* testsuite/binutils-all/aarch64/in-order.d: New test.

> 	* testsuite/binutils-all/aarch64/out-of-order.d: Disassemble data as

> 	well.

> 

> opcodes/ChangeLog:

> 

> 2019-03-07  Tamar Christina  <tamar.christina@arm.com>

> 

> 	* aarch64-dis.c (print_insn_aarch64): Update the mapping symbol search

> 	order.

> 


Cheers
  Nick

Patch

diff --git a/binutils/testsuite/binutils-all/aarch64/in-order.d b/binutils/testsuite/binutils-all/aarch64/in-order.d
new file mode 100644
index 0000000000000000000000000000000000000000..090337f141db270172580b279d7a8e7c2be52913
--- /dev/null
+++ b/binutils/testsuite/binutils-all/aarch64/in-order.d
@@ -0,0 +1,28 @@ 
+#PROG: objcopy
+#source: out-of-order.s
+#ld: -e v1 -Ttext-segment=0x400000
+#objdump: -d
+#name: Check if disassembler can handle sections in default order
+
+.*: +file format .*aarch64.*
+
+Disassembly of section \.func1:
+
+0000000000400000 <v1>:
+  400000:	8b010000 	add	x0, x0, x1
+  400004:	00000000 	\.word	0x00000000
+
+Disassembly of section .func2:
+
+0000000000400008 <\.func2>:
+  400008:	8b010000 	add	x0, x0, x1
+
+Disassembly of section \.func3:
+
+000000000040000c <\.func3>:
+  40000c:	8b010000 	add	x0, x0, x1
+  400010:	8b010000 	add	x0, x0, x1
+  400014:	8b010000 	add	x0, x0, x1
+  400018:	8b010000 	add	x0, x0, x1
+  40001c:	8b010000 	add	x0, x0, x1
+  400020:	00000000 	\.word	0x00000000
diff --git a/binutils/testsuite/binutils-all/aarch64/out-of-order.d b/binutils/testsuite/binutils-all/aarch64/out-of-order.d
index 410f37f68ea21f9e16e2319b5048c123cec99910..f78adec2162b0714f2f461111651f1cc6e561e13 100644
--- a/binutils/testsuite/binutils-all/aarch64/out-of-order.d
+++ b/binutils/testsuite/binutils-all/aarch64/out-of-order.d
@@ -1,10 +1,20 @@ 
 #PROG: objcopy
 #ld: -T out-of-order.T
-#objdump: -d
+#objdump: -D
 #name: Check if disassembler can handle sections in different order than header
 
 .*: +file format .*aarch64.*
 
+Disassembly of section \.global:
+
+00000000ffe00000 <\.global>:
+    ffe00000:	00000001 	\.word	0x00000001
+    ffe00004:	00000000 	\.word	0x00000000
+    ffe00008:	00000001 	\.word	0x00000001
+    ffe0000c:	00000000 	\.word	0x00000000
+    ffe00010:	00000001 	\.word	0x00000001
+    ffe00014:	00000000 	\.word	0x00000000
+
 Disassembly of section \.func2:
 
 0000000004018280 <\.func2>:
@@ -25,3 +35,8 @@  Disassembly of section \.func3:
  401500c:	8b010000 	add	x0, x0, x1
  4015010:	8b010000 	add	x0, x0, x1
  4015014:	00000000 	\.word	0x00000000
+
+Disassembly of section \.rodata:
+
+0000000004015018 <\.rodata>:
+ 4015018:	00000004 	\.word	0x00000004
diff --git a/opcodes/aarch64-dis.c b/opcodes/aarch64-dis.c
index fc7e95d066807f2909d1a1c891d7a89a5f31204b..1f931d09327580ee9d257901655b3dad77b93837 100644
--- a/opcodes/aarch64-dis.c
+++ b/opcodes/aarch64-dis.c
@@ -3318,14 +3318,26 @@  print_insn_aarch64 (bfd_vma pc,
   /* Aarch64 instructions are always little-endian */
   info->endian_code = BFD_ENDIAN_LITTLE;
 
+  /* Default to DATA.  A text section is required by the ABI to contain an
+     INSN mapping symbol at the start.  A data section has no such
+     requirement, hence if no mapping symbol is found the section must
+     contain only data.  This however isn't very useful if the user has
+     fully stripped the binaries.  If this is the case use the section
+     attributes to determine the default.  If we have no section default to
+     INSN as well, as we may be disassembling some raw bytes on a baremetal
+     HEX file or similar.  */
+  enum map_type type = MAP_DATA;
+  if ((info->section && info->section->flags & SEC_CODE) || !info->section)
+    type = MAP_INSN;
+
   /* First check the full symtab for a mapping symbol, even if there
      are no usable non-mapping symbols for this address.  */
   if (info->symtab_size != 0
       && bfd_asymbol_flavour (*info->symtab) == bfd_target_elf_flavour)
     {
-      enum map_type type = MAP_INSN;
       int last_sym = -1;
-      bfd_vma addr;
+      bfd_vma addr, section_vma = 0;
+      bfd_boolean can_use_search_opt_p;
       int n;
 
       if (pc <= last_mapping_addr)
@@ -3334,13 +3346,20 @@  print_insn_aarch64 (bfd_vma pc,
       /* Start scanning at the start of the function, or wherever
 	 we finished last time.  */
       n = info->symtab_pos + 1;
+
       /* If the last stop offset is different from the current one it means we
 	 are disassembling a different glob of bytes.  As such the optimization
 	 would not be safe and we should start over.  */
-      if (n < last_mapping_sym && info->stop_offset == last_stop_offset)
+      can_use_search_opt_p = last_mapping_sym >= 0
+			     && info->stop_offset == last_stop_offset;
+
+      if (n >= last_mapping_sym && can_use_search_opt_p)
 	n = last_mapping_sym;
 
-      /* Scan up to the location being disassembled.  */
+      /* Look down while we haven't passed the location being disassembled.
+	 The reason for this is that there's no defined order between a symbol
+	 and an mapping symbol that may be at the same address.  We may have to
+	 look at least one position ahead.  */
       for (; n < info->symtab_size; n++)
 	{
 	  addr = bfd_asymbol_value (info->symtab[n]);
@@ -3356,13 +3375,24 @@  print_insn_aarch64 (bfd_vma pc,
       if (!found)
 	{
 	  n = info->symtab_pos;
-	  if (n < last_mapping_sym)
+	  if (n >= last_mapping_sym && can_use_search_opt_p)
 	    n = last_mapping_sym;
 
 	  /* No mapping symbol found at this address.  Look backwards
-	     for a preceeding one.  */
+	     for a preceeding one, but don't go pass the section start
+	     otherwise a data section with no mapping symbol can pick up
+	     a text mapping symbol of a preceeding section.  The documentation
+	     says section can be NULL, in which case we will seek up all the
+	     way to the top.  */
+	  if (info->section)
+	    section_vma = info->section->vma;
+
 	  for (; n >= 0; n--)
 	    {
+	      addr = bfd_asymbol_value (info->symtab[n]);
+	      if (addr < section_vma)
+		break;
+
 	      if (get_sym_code_type (info, n, &type))
 		{
 		  last_sym = n;
@@ -3400,6 +3430,8 @@  print_insn_aarch64 (bfd_vma pc,
 	    size = (pc & 1) ? 1 : 2;
 	}
     }
+  else
+    last_type = type;
 
   if (last_type == MAP_DATA)
     {