bfd/riscv: tighten matching rules in riscv_scan

Message ID 20200623224800.2281105-1-andrew.burgess@embecosm.com
State New
Headers show
Series
  • bfd/riscv: tighten matching rules in riscv_scan
Related show

Commit Message

Andrew Burgess June 23, 2020, 10:48 p.m.
The following GDB behaviour was observed:

  (gdb) x/1i 0x0001014a
     0x1014a <main+8>:	jal	0x10132 <foo>
  (gdb) show architecture
  The target architecture is set automatically (currently riscv:rv32)
  (gdb) set architecture riscv:rv32
  The target architecture is assumed to be riscv:rv32
  (gdb) x/1i 0x0001014a
     0x1014a <main+8>:	0x37e5
  (gdb)

Notice that initially we can disassemble the instruction (it's a
compressed jal instruction), but after setting the architecture we can
no longer disassemble the instruction.

This is particularly puzzling as GDB initially thought the
architecture was 'riscv:rv32', but when we force the architecture to
be that, the disassembly stops working.

This issue was introduced with this commit:

  commit c35d018b1a5ec604e49a807402c4205530b25ca8
  Date:   Mon Jan 27 15:19:30 2020 -0800

      RISC-V: Fix gdbserver problem with handling arch strings.

In this commit we try to make riscv_scan handle cases where we see
architecture strings like 'riscv:rv32imc' (for example).  Normally
this wouldn't match as bfd_default_scan requires an exact match, so we
extended riscv_scan to ignore trailing characters.

Unfortunately the default riscv arch is called 'riscv', is 64-bit,
and has its mach type set to 0, which I think is intended to pair with
code is riscv-dis.c:riscv_disassemble_insn that tries to guess if we
are 32 or 64 bit.

What happens then is that 'riscv:rv32' is first tested against 'riscv'
using bfd_default_scan, this doesn't match, we then compare this to
'riscv', but allowing trailing characters to be ignored, this matches,
and our 'riscv:rv32' matches against the default (64-bit)
architecture.

The solution I propose is to prevent the default architecture from
taking part in this "ignore trailing characters" extra match case,
only the more specific 'riscv:rv32' and 'riscv:rv64' get this extra
matching.

bfd/ChangeLog:

	* cpu-riscv.c (bfd_riscv_arch): Add declaration.
	(riscv_scan): Don't allow shorter matches using the default
	architecture.
---
 bfd/ChangeLog   |  6 ++++++
 bfd/cpu-riscv.c | 21 +++++++++++++++++----
 2 files changed, 23 insertions(+), 4 deletions(-)

-- 
2.25.4

Comments

Nelson Chu June 24, 2020, 3:56 a.m. | #1
Hi Andrew,

LGTM.  FYI, just a minor suggestion as follows.  However, it would be
better to get Jim's approval, too :)

On Wed, Jun 24, 2020 at 6:48 AM Andrew Burgess
<andrew.burgess@embecosm.com> wrote:
> +  if (info != &bfd_riscv_arch

> +      && strncasecmp (string, info->printable_name,

> +                      strlen (info->printable_name)) == 0)

>      return TRUE;

>

>    return FALSE;


It seems that the default machine number is zero, and it will cause
the problem mentioned above.  I would suggest check the machine number
or the default boolean directly, that is,

+  if (info->mach != 0
+      && strncasecmp (string, info->printable_name,
+                      strlen (info->printable_name)) == 0)

or

+  if (!info->default
+      && strncasecmp (string, info->printable_name,
+                      strlen (info->printable_name)) == 0)

Then you don't need to add the forward declaration for bfd_riscv_arch before.

However, I also see the bfd/cpu-arm.c, their scan function seems good.
Maybe we can refer their implementation as follows,

/* First test for an exact match.  */
  if (strcasecmp (string, info->printable_name) == 0)
    return TRUE;
...
/* Finally check for the default architecture.  */
  if (strcasecmp (string, "arm") == 0)
    return info->the_default;


Thanks
Nelson
Andreas Schwab June 24, 2020, 6:09 a.m. | #2
On Jun 23 2020, Andrew Burgess wrote:

> The solution I propose is to prevent the default architecture from

> taking part in this "ignore trailing characters" extra match case,

> only the more specific 'riscv:rv32' and 'riscv:rv64' get this extra

> matching.


How about rejecting the match if a colon follows?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Andrew Burgess June 24, 2020, 8:03 a.m. | #3
* Andreas Schwab <schwab@linux-m68k.org> [2020-06-24 08:09:51 +0200]:

> On Jun 23 2020, Andrew Burgess wrote:

> 

> > The solution I propose is to prevent the default architecture from

> > taking part in this "ignore trailing characters" extra match case,

> > only the more specific 'riscv:rv32' and 'riscv:rv64' get this extra

> > matching.

> 

> How about rejecting the match if a colon follows?


I assume you mean something like this (untested) code:

  if (strncasecmp (string, info->printable_name,
                   strlen (info->printable_name)) == 0
      && string[strlen (info->printable_name)] != ':')
    return TRUE;

I prefer my original solution over this as this would mean that the
string 'riscvBLAH' would match again the 'riscv' architecture.

My assumption that 'riscv' is supposed to be a generic RISC-V
architecture, with smart detection of 32/64 bit, and include as many
non-conflicting core extensions as possible, so I'm not sure having
extension characters after the 'riscv' string makes sense.

At the end of the day though, the above would resolve my immediate
problem, so if you feel strongly that it's a better solution I'm
willing to go with that.

Thanks,
Andrew


> 

> Andreas.

> 

> -- 

> Andreas Schwab, schwab@linux-m68k.org

> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1

> "And now for something completely different."
Andrew Burgess June 24, 2020, 8:17 a.m. | #4
* Nelson Chu <nelson.chu@sifive.com> [2020-06-24 11:56:12 +0800]:

> Hi Andrew,

> 

> LGTM.  FYI, just a minor suggestion as follows.  However, it would be

> better to get Jim's approval, too :)

> 

> On Wed, Jun 24, 2020 at 6:48 AM Andrew Burgess

> <andrew.burgess@embecosm.com> wrote:

> > +  if (info != &bfd_riscv_arch

> > +      && strncasecmp (string, info->printable_name,

> > +                      strlen (info->printable_name)) == 0)

> >      return TRUE;

> >

> >    return FALSE;

> 

> It seems that the default machine number is zero, and it will cause

> the problem mentioned above.  I would suggest check the machine number

> or the default boolean directly, that is,

> 

> +  if (info->mach != 0

> +      && strncasecmp (string, info->printable_name,

> +                      strlen (info->printable_name)) == 0)

> 

> or

> 

> +  if (!info->default

> +      && strncasecmp (string, info->printable_name,

> +                      strlen (info->printable_name)) == 0)


I assume you mean '!info->the_default' here.  I like this more than my
solution, and I prefer this over the ARM code.

If Andreas and Jim approve then I'd go with this and drop the forward
declaration.  An updated patch is included below.

Thanks,
Andrew

---

commit fa3009badfd41c68cb88f5f666bdfa271558352c
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Tue Jun 23 23:22:43 2020 +0100

    bfd/riscv: tighten matching rules in riscv_scan
    
    The following GDB behaviour was observed:
    
      (gdb) x/1i 0x0001014a
         0x1014a <main+8>:  jal     0x10132 <foo>
      (gdb) show architecture
      The target architecture is set automatically (currently riscv:rv32)
      (gdb) set architecture riscv:rv32
      The target architecture is assumed to be riscv:rv32
      (gdb) x/1i 0x0001014a
         0x1014a <main+8>:  0x37e5
      (gdb)
    
    Notice that initially we can disassemble the instruction (it's a
    compressed jal instruction), but after setting the architecture we can
    no longer disassemble the instruction.
    
    This is particularly puzzling as GDB initially thought the
    architecture was 'riscv:rv32', but when we force the architecture to
    be that, the disassembly stops working.
    
    This issue was introduced with this commit:
    
      commit c35d018b1a5ec604e49a807402c4205530b25ca8
      Date:   Mon Jan 27 15:19:30 2020 -0800
    
          RISC-V: Fix gdbserver problem with handling arch strings.
    
    In this commit we try to make riscv_scan handle cases where we see
    architecture strings like 'riscv:rv32imc' (for example).  Normally
    this wouldn't match as bfd_default_scan requires an exact match, so we
    extended riscv_scan to ignore trailing characters.
    
    Unfortunately the default riscv arch is called 'riscv', is 64-bit,
    and has its mach type set to 0, which I think is intended to pair with
    code is riscv-dis.c:riscv_disassemble_insn that tries to guess if we
    are 32 or 64 bit.
    
    What happens then is that 'riscv:rv32' is first tested against 'riscv'
    using bfd_default_scan, this doesn't match, we then compare this to
    'riscv', but allowing trailing characters to be ignored, this matches,
    and our 'riscv:rv32' matches against the default (64-bit)
    architecture.
    
    The solution I propose is to prevent the default architecture from
    taking part in this "ignore trailing characters" extra match case,
    only the more specific 'riscv:rv32' and 'riscv:rv64' get this extra
    matching.
    
    bfd/ChangeLog:
    
            * cpu-riscv.c (riscv_scan): Don't allow shorter matches using the
            default architecture.

diff --git a/bfd/cpu-riscv.c b/bfd/cpu-riscv.c
index b5c972ff4dc..22067ab29be 100644
--- a/bfd/cpu-riscv.c
+++ b/bfd/cpu-riscv.c
@@ -47,10 +47,20 @@ riscv_scan (const struct bfd_arch_info *info, const char *string)
   if (bfd_default_scan (info, string))
     return TRUE;
 
-  /* The string might have extra characters for supported subsets.  So allow
-     a match that ignores trailing characters in string.  */
-  if (strncasecmp (string, info->printable_name,
-		   strlen (info->printable_name)) == 0)
+  /* The incoming STRING might take the form of riscv:rvXXzzz, where XX is
+     32 or 64, and zzz are one or more extension characters.  As we
+     currently only have 3 architectures defined, 'riscv', 'riscv:rv32',
+     and 'riscv:rv64', we would like to ignore the zzz for the purpose of
+     matching here.
+
+     However, we don't want the default 'riscv' to match over a more
+     specific 'riscv:rv32' or 'riscv:rv64', so in the case of the default
+     architecture (with the shorter 'riscv' name) we don't allow any
+     special matching, but for the 'riscv:rvXX' cases, we allow a match
+     with any additional trailing characters being ignored.  */
+  if (!info->the_default
+      && strncasecmp (string, info->printable_name,
+                      strlen (info->printable_name)) == 0)
     return TRUE;
 
   return FALSE;
Andreas Schwab June 24, 2020, 8:29 a.m. | #5
It was just a random suggestion.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Jim Wilson June 24, 2020, 5:54 p.m. | #6
On Wed, Jun 24, 2020 at 1:17 AM Andrew Burgess
<andrew.burgess@embecosm.com> wrote:
> If Andreas and Jim approve then I'd go with this and drop the forward

> declaration.  An updated patch is included below.


Yes, this looks good to me also.

Jim
Andrew Burgess June 24, 2020, 6:16 p.m. | #7
* Jim Wilson <jimw@sifive.com> [2020-06-24 10:54:25 -0700]:

> On Wed, Jun 24, 2020 at 1:17 AM Andrew Burgess

> <andrew.burgess@embecosm.com> wrote:

> > If Andreas and Jim approve then I'd go with this and drop the forward

> > declaration.  An updated patch is included below.

> 

> Yes, this looks good to me also.


I've gone ahead and pushed this patch now.

Thanks,
Andrew

Patch

diff --git a/bfd/cpu-riscv.c b/bfd/cpu-riscv.c
index b5c972ff4dc..a0e2d85e222 100644
--- a/bfd/cpu-riscv.c
+++ b/bfd/cpu-riscv.c
@@ -39,6 +39,9 @@  riscv_compatible (const bfd_arch_info_type *a, const bfd_arch_info_type *b)
   return a;
 }
 
+/* Forward declaration of this global.  */
+extern const bfd_arch_info_type bfd_riscv_arch;
+
 /* Return TRUE if STRING matches the architecture described by INFO.  */
 
 static bfd_boolean
@@ -47,10 +50,20 @@  riscv_scan (const struct bfd_arch_info *info, const char *string)
   if (bfd_default_scan (info, string))
     return TRUE;
 
-  /* The string might have extra characters for supported subsets.  So allow
-     a match that ignores trailing characters in string.  */
-  if (strncasecmp (string, info->printable_name,
-		   strlen (info->printable_name)) == 0)
+  /* The incoming STRING might take the form of riscv:rvXXzzz, where XX is
+     32 or 64, and zzz are one or more extension characters.  As we
+     currently only have 3 architectures defined, 'riscv', 'riscv:rv32',
+     and 'riscv:rv64', we would like to ignore the zzz for the purpose of
+     matching here.
+
+     However, we don't want the default 'riscv' to match over a more
+     specific 'riscv:rv32' or 'riscv:rv64', so in the case of the default
+     architecture (with the shorter 'riscv' name) we don't allow any
+     special matching, but for the 'riscv:rvXX' cases, we allow a match
+     with any additional trailing characters being ignored.  */
+  if (info != &bfd_riscv_arch
+      && strncasecmp (string, info->printable_name,
+                      strlen (info->printable_name)) == 0)
     return TRUE;
 
   return FALSE;