[1/2,nvptx,libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin

Message ID 6fd1bfa8-621b-206f-1a84-c57589ce8708@mentor.com
State New

Commit Message

Tom de Vries Jan. 24, 2018, 9:58 a.m.
Hi,

The nvptx target PR83589 - "[nvptx] mode-transitions.c and 
private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0" is a 
JIT bug.

I've written a workaround for this JIT bug (the second patch in this 
series).

I've only managed to reproduce the JIT bug at JIT optimization level 
-O0. But given that the JIT is a black box, I have no way of knowing 
whether it can only occur at -O0, so I have to assume it can also occur 
at the current JIT optimization level used in libgomp (where we don't 
set it explicitly, so we use the default, which supposedly maps onto 
-O4). So, I think we need this workaround in trunk.
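
To illustrate what "not setting it explicitly" means, here is a minimal 
sketch of passing the optimization level only when one was requested, so 
that otherwise the JIT keeps its own default. It does not use the CUDA 
headers: jit_option and build_jit_options are hypothetical stand-ins, 
the sentinel -1 means "unset", and the numeric enum values are the ones 
listed in libgomp's plugin/cuda/cuda.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two CUjit_option values this sketch needs.  */
typedef enum {
  JIT_OPTIMIZATION_LEVEL = 7,
  JIT_LOG_VERBOSE = 12
} jit_option;

/* Hypothetical helper: fill OPTS/OPTVALS and return how many entries
   are in use.  LEVEL == -1 means no level was requested, in which case
   no optimization-level entry is emitted at all.  */
static unsigned
build_jit_options (intptr_t level, jit_option opts[2], void *optvals[2])
{
  unsigned n = 0;

  /* An option that is always passed.  */
  opts[n] = JIT_LOG_VERBOSE;
  optvals[n] = (void *) (intptr_t) 1;
  n++;

  /* Append the optimization level only if it was set explicitly.  */
  if (level != -1)
    {
      opts[n] = JIT_OPTIMIZATION_LEVEL;
      optvals[n] = (void *) level;
      n++;
    }

  return n;
}
```

The returned count is what would be handed to the link-creation call in 
place of a fixed option count.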

But, in order to test the workaround I need a means to run the libgomp 
tests with JIT optimization level -O0.

This patch is a pruned-down and standalone version of "Handle 
GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=<n>} in libgomp nvptx plugin" 
( https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00172.html ), modified to 
contain just the support for changing the optimization level of the JIT.
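
For readers who want to try the accepted syntax outside libgomp, the 
following is a rough standalone sketch of the parsing loop. It takes the 
string as an argument instead of reading the environment and returns a 
status instead of calling GOMP_PLUGIN_error; the name 
parse_jit_opt_level is hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: scan S for space-separated "-O<n>" tokens with
   n in 0..4 and store the last level seen in *LEVEL.  A NULL or empty
   string leaves *LEVEL untouched.  Returns false on a parse error.  */
static bool
parse_jit_opt_level (const char *s, int *level)
{
  if (s == NULL)
    return true;

  const char *c = s;
  while (*c != '\0')
    {
      /* Skip separators before the next token.  */
      while (*c == ' ')
        c++;

      if (c[0] == '-' && c[1] == 'O'
          && '0' <= c[2] && c[2] <= '4'
          && (c[3] == '\0' || c[3] == ' '))
        {
          *level = c[2] - '0';
          c += 3;
          continue;
        }

      /* Anything else, including end of string right after spaces,
         is treated as an error.  */
      return false;
    }

  return true;
}
```

Note that, written this way, a value with trailing spaces such as "-O0 " 
is reported as a parse error, because the loop skips the spaces and then 
finds no token.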

Bootstrapped and reg-tested on x86_64.
Built and reg-tested on x86_64 with nvptx accelerator.

I realize this is not really a stage4 patch, so: OK for stage1?

[ I will commit the workaround in stage4. Without this patch, the 
workaround works fine; it's just that the test-case does not function 
as a regression test, given that it will not trigger the JIT bug if you 
disable the workaround. ]

Thanks,
- Tom

Comments

Jakub Jelinek Jan. 24, 2018, 10:03 a.m. | #1
On Wed, Jan 24, 2018 at 10:58:57AM +0100, Tom de Vries wrote:
> I realize this is not really a stage4 patch, so: OK for stage1?


Ok.

	Jakub

Patch

[nvptx, libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin

2018-01-24  Tom de Vries  <tom@codesourcery.com>

	* plugin/cuda/cuda.h (CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL.
	* plugin/plugin-nvptx.c (_GNU_SOURCE): Define.
	(process_GOMP_NVPTX_JIT): New function.
	(link_ptx): Use process_GOMP_NVPTX_JIT.

---
 libgomp/plugin/cuda/cuda.h    |  1 +
 libgomp/plugin/plugin-nvptx.c | 56 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index edad4c6..4799825 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -88,6 +88,7 @@  typedef enum {
   CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4,
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
+  CU_JIT_OPTIMIZATION_LEVEL = 7,
   CU_JIT_LOG_VERBOSE = 12
 } CUjit_option;
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 9ae6095..2b875ae 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -31,6 +31,7 @@ 
    is not clear as to what that state might be.  Or how one might
    propagate it from one thread to another.  */
 
+#define _GNU_SOURCE
 #include "openacc.h"
 #include "config.h"
 #include "libgomp-plugin.h"
@@ -138,6 +139,8 @@  init_cuda_lib (void)
 # define init_cuda_lib() true
 #endif
 
+#include "secure_getenv.h"
+
 /* Convenience macros for the frequently used CUDA library call and
    error handling sequence as well as CUDA library calls that
    do the error checking themselves or don't do it at all.  */
@@ -876,12 +879,42 @@  notify_var (const char *var_name, const char *env_var)
     GOMP_PLUGIN_debug (0, "%s: '%s'\n", var_name, env_var);
 }
 
+static void
+process_GOMP_NVPTX_JIT (intptr_t *gomp_nvptx_o)
+{
+  const char *var_name = "GOMP_NVPTX_JIT";
+  const char *env_var = secure_getenv (var_name);
+  notify_var (var_name, env_var);
+
+  if (env_var == NULL)
+    return;
+
+  const char *c = env_var;
+  while (*c != '\0')
+    {
+      while (*c == ' ')
+	c++;
+
+      if (c[0] == '-' && c[1] == 'O'
+	  && '0' <= c[2] && c[2] <= '4'
+	  && (c[3] == '\0' || c[3] == ' '))
+	{
+	  *gomp_nvptx_o = c[2] - '0';
+	  c += 3;
+	  continue;
+	}
+
+      GOMP_PLUGIN_error ("Error parsing %s", var_name);
+      break;
+    }
+}
+
 static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 	  unsigned num_objs)
 {
-  CUjit_option opts[6];
-  void *optvals[6];
+  CUjit_option opts[7];
+  void *optvals[7];
   float elapsed = 0.0;
   char elog[1024];
   char ilog[16384];
@@ -908,7 +941,24 @@  link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
   opts[5] = CU_JIT_LOG_VERBOSE;
   optvals[5] = (void *) 1;
 
-  CUDA_CALL (cuLinkCreate, 6, opts, optvals, &linkstate);
+  static intptr_t gomp_nvptx_o = -1;
+
+  static bool init_done = false;
+  if (!init_done)
+    {
+      process_GOMP_NVPTX_JIT (&gomp_nvptx_o);
+      init_done = true;
+    }
+
+  int nopts = 6;
+  if (gomp_nvptx_o != -1)
+    {
+      opts[nopts] = CU_JIT_OPTIMIZATION_LEVEL;
+      optvals[nopts] = (void *) gomp_nvptx_o;
+      nopts++;
+    }
+
+  CUDA_CALL (cuLinkCreate, nopts, opts, optvals, &linkstate);
 
   for (; num_objs--; ptx_objs++)
     {