[RISC-V] Add support for AddressSanitizer on RISC-V GCC

Message ID 20200730123006.104-1-cooper.joshua@linux.alibaba.com
State New
Headers show
Series
  • [RISC-V] Add support for AddressSanitizer on RISC-V GCC
Related show

Commit Message

will schmidt via Gcc-patches July 30, 2020, 12:30 p.m.
From: cooper.joshua <cooper.joshua@linux.alibaba.com>



    gcc/

    

        * config/riscv/riscv.c (asan_shadow_offset): Implement the offset of asan shadow memory for risc-v.

        (asan_shadow_offset): new macro definition.



    libsanitizer/

    

        * sanitizer_common/sanitizer_common.h (ModuleArch): New enumerator.

        (ModuleArchToString): New architecture option.

        * sanitizer_common/sanitizer_platform.h: New macro definition.

        * sanitizer_common/sanitizer_symbolizer_libcdep.cpp (GetArgV): New architecture option.

---

 gcc/config/riscv/riscv.c                                      | 11 +++++++++++

 libsanitizer/sanitizer_common/sanitizer_common.h              |  5 ++++-

 libsanitizer/sanitizer_common/sanitizer_platform.h            |  6 ++++++

 .../sanitizer_common/sanitizer_symbolizer_libcdep.cpp         |  2 ++

 4 files changed, 23 insertions(+), 1 deletion(-)



-- 

2.7.4

Comments

Martin Liška July 30, 2020, 1:28 p.m. | #1
Hello.

What's the reason for sending the same patch multiple times
from a different sender?

Thanks,
Martin
Jim Wilson Aug. 5, 2020, 12:09 a.m. | #2
On Thu, Jul 30, 2020 at 6:28 AM Martin Liška <mliska@suse.cz> wrote:
> What's the reason for sending the same patch multiple times

> from a different sender?


I see 3 in the gcc.gnu.org email archive, and I saw 3 on the NNTP feed
from gmane, but it seems only one of them ended up in my gmail inbox.
The other two appear to have problems with mail headers.  Maybe they
resent because of bounces.  Alibaba is fairly new to gcc development,
so I'd just chalk this up as newbies trying to get the procedure right
with tools that they aren't familiar with.  Few people still use email
the same way that we do for patches.

I am curious about the names though.  The first one was from "shaj
<shajun.sj@alibaba-inc.com>" and the last two are from "cooper.joshua
<cooper.joshua@linux.alibaba.com>".  If more than one person
contributed to the patch then it should include all names in the
changelog entry for correct attribution.

Jim
Jim Wilson Aug. 5, 2020, 12:34 a.m. | #3
On Thu, Jul 30, 2020 at 5:31 AM Joshua via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
> +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */

> +

> +static unsigned HOST_WIDE_INT

> +riscv_asan_shadow_offset (void)

> +{

> +  return HOST_WIDE_INT_UC (0x10000000);

> +}


Is there a reason why you used 0x10000000?

Looking at other targets, it appears the convention is 1<<29 for
32-bit targets, and a number larger than 1<<32 for 64-bit targets.  I
think the RISC-V Linux port has a minimum of 39-bit virtual addresses
(SV39) suggesting that this should be 1<<36 for 64-bit targets.  I can
test the 32-bit support on qemu, and the 64-bit support on hardware,
but my hardware is doing other stuff today.  I should be able to try
testing this tomorrow.

Otherwise the gcc stuff is pretty simple and looks OK.  We just need
to double check these numbers.

> diff --git a/libsanitizer/sanitizer_common/sanitizer_common.h b/libsanitizer/sanitizer_common/sanitizer_common.h

> index ac16e0e..ea7dff7 100644

> --- a/libsanitizer/sanitizer_common/sanitizer_common.h

> +++ b/libsanitizer/sanitizer_common/sanitizer_common.h

> @@ -649,7 +649,8 @@ enum ModuleArch {

>    kModuleArchARMV7,

>    kModuleArchARMV7S,

>    kModuleArchARMV7K,

> -  kModuleArchARM64

> +  kModuleArchARM64,

> +  kModuleArchRISCV

>  };

>


Libsanitizer patches should go upstream and then be pulled down into
gcc.  I haven't done libsanitizer work before so I'm not sure of the
exact details.  I would expect to find the info here:
    https://gcc.gnu.org/codingconventions.html#upstream
but it doesn't mention libsanitizer.  I found the rules in the
libsanitizer/README.gcc and HOWTO_MERGE files.  As expected it says to
submit upstream first.

You are adding a SANITIZER_RISCV macro but not using it.  It isn't
clear why you need this, unless maybe it is just for completeness.  it
looks harmless though, and might be useful later.  This is something
for upstream reviewers to decide though.

In sanitizer_common.h I see a comment
// When adding a new architecture, don't forget to also update
// script/asan_symbolize.py and sanitizer_symbolizer_libcdep.cpp.
but I don't see any script/asan_symbolize.py file.  So maybe the
comment should be fixed.  Or if there is a file in the llvm tree that
we don't import into gcc, then you will need a patch for it.  if the
comment is wrong, then there is a similar comment in
sanitizer_symbolizer_libcdep.cpp that needs to be fixed too.  If the
file is gone, the comment fix can be a separate patch.

Otherwise this stuff looks pretty simple and obvious but it needs to
be submitted upstream.

Jim
will schmidt via Gcc-patches Aug. 5, 2020, 3:54 a.m. | #4
Hi Joshua, Jim:

> > +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */

> > +

> > +static unsigned HOST_WIDE_INT

> > +riscv_asan_shadow_offset (void)

> > +{

> > +  return HOST_WIDE_INT_UC (0x10000000);

> > +}

>

> Is there a reason why you used 0x10000000?

>

> Looking at other targets, it appears the convention is 1<<29 for

> 32-bit targets, and a number larger than 1<<32 for 64-bit targets.  I

> think the RISC-V Linux port has a minimum of 39-bit virtual addresses

> (SV39) suggesting that this should be 1<<36 for 64-bit targets.  I can

> test the 32-bit support on qemu, and the 64-bit support on hardware,

> but my hardware is doing other stuff today.  I should be able to try

> testing this tomorrow.

>

> Otherwise the gcc stuff is pretty simple and looks OK.  We just need

> to double check these numbers.


Default offset is 1ULL << 44 for 64 bit target and 1ULL << 29 for 32
bit target in LLVM[1, 2],
I am not talking about we should use those values, just remind that we
should sync this offset value to LLVM :)

[1] https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L96
[2] https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/asan/asan_mapping.h#L159

Patch

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c

index bfb3885..05669c2 100644

--- a/gcc/config/riscv/riscv.c

+++ b/gcc/config/riscv/riscv.c

@@ -5245,6 +5245,14 @@  riscv_gpr_save_operation_p (rtx op)linux.alibaba

   return true;
 }
 
+/* Implement TARGET_ASAN_SHADOW_OFFSET.  */

+

+static unsigned HOST_WIDE_INT

+riscv_asan_shadow_offset (void)

+{

+  return HOST_WIDE_INT_UC (0x10000000);

+}

+

 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -5428,6 +5436,9 @@  riscv_gpr_save_operation_p (rtx op)

 #undef TARGET_NEW_ADDRESS_PROFITABLE_P
 #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
 
+#undef TARGET_ASAN_SHADOW_OFFSET

+#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset

+

 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/libsanitizer/sanitizer_common/sanitizer_common.h b/libsanitizer/sanitizer_common/sanitizer_common.h

index ac16e0e..ea7dff7 100644

--- a/libsanitizer/sanitizer_common/sanitizer_common.h

+++ b/libsanitizer/sanitizer_common/sanitizer_common.h

@@ -649,7 +649,8 @@  enum ModuleArch {

   kModuleArchARMV7,
   kModuleArchARMV7S,
   kModuleArchARMV7K,
-  kModuleArchARM64

+  kModuleArchARM64,

+  kModuleArchRISCV

 };
 
 // Opens the file 'file_name" and reads up to 'max_len' bytes.
@@ -693,6 +694,8 @@  inline const char *ModuleArchToString(ModuleArch arch) {

       return "armv7k";
     case kModuleArchARM64:
       return "arm64";
+    case kModuleArchRISCV:

+      return "riscv";

   }
   CHECK(0 && "Invalid module arch");
   return "";
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform.h b/libsanitizer/sanitizer_common/sanitizer_platform.h

index c68bfa2..bf52490 100644

--- a/libsanitizer/sanitizer_common/sanitizer_platform.h

+++ b/libsanitizer/sanitizer_common/sanitizer_platform.h

@@ -126,6 +126,12 @@ 

 # define FIRST_32_SECOND_64(a, b) (a)
 #endif
 
+#if defined(__riscv__)

+# define SANITIZER_RISCV 1

+#else

+# define SANITIZER_RISCV 0

+#endif

+

 #if defined(__x86_64__) && !defined(_LP64)
 # define SANITIZER_X32 1
 #else
diff --git a/libsanitizer/sanitizer_common/sanitizer_symbolizer_libcdep.cpp b/libsanitizer/sanitizer_common/sanitizer_symbolizer_libcdep.cpp

index 490c6fe..408f57d 100644

--- a/libsanitizer/sanitizer_common/sanitizer_symbolizer_libcdep.cpp

+++ b/libsanitizer/sanitizer_common/sanitizer_symbolizer_libcdep.cpp

@@ -270,6 +270,8 @@  class LLVMSymbolizerProcess : public SymbolizerProcess {

     const char* const kSymbolizerArch = "--default-arch=s390x";
 #elif defined(__s390__)
     const char* const kSymbolizerArch = "--default-arch=s390";
+#elif defined(__riscv__)

+    const char* const kSymbolizerArch = "--default-arch=riscv";

 #else
     const char* const kSymbolizerArch = "--default-arch=unknown";
 #endif