[nvptx] Don't emit barriers for empty loops -- fix

Message ID 6ee3fb54-9204-bf67-e643-7daf6052629f@suse.de
State New
Headers show
Series
  • [nvptx] Don't emit barriers for empty loops -- fix
Related show

Commit Message

Tom de Vries Jan. 7, 2019, 9:01 a.m.
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0022-nvptx-openacc-Don-t-emit-barriers-for-empty-loops.patch


Committed without test-case.

Thanks,
- Tom

Patch

[nvptx] Don't emit barriers for empty loops -- fix

When compiling an empty loop:
...
  long long v1;
  #pragma acc parallel num_gangs (640) num_workers(1) vector_length (128)
  #pragma acc loop
    for (v1 = 0; v1 < 20; v1 += 2)
        ;
...
the compiler emits two subsequent bar.syncs.  This triggers some bug on my
quadro m1200 (I'm assuming in the ptxas/JIT compiler) that hangs the testcase.

This patch works around the bug by doing an optimization: we detect that this is
an empty loop (a forked immediately followed by a joining), and don't emit the
barriers.

The patch does not include the test-case yet, since vector_length (128) is not
yet supported at this point.

2018-12-17  Tom de Vries  <tdevries@suse.de>

	PR target/85381
	* config/nvptx/nvptx.c (nvptx_process_pars): Don't emit barriers for
	empty loops.

---
 gcc/config/nvptx/nvptx.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2166f37b182..26c80716603 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4636,9 +4636,12 @@  nvptx_process_pars (parallel *par)
     {
       nvptx_shared_propagate (false, is_call, par->forked_block,
 			      par->forked_insn, !worker);
-      bool empty = nvptx_shared_propagate (true, is_call,
-					   par->forked_block, par->fork_insn,
-					   !worker);
+      bool no_prop_p
+	= nvptx_shared_propagate (true, is_call, par->forked_block,
+				  par->fork_insn, !worker);
+      bool empty_loop_p
+	= !is_call && (NEXT_INSN (par->forked_insn)
+		       && NEXT_INSN (par->forked_insn) == par->joining_insn);
       rtx barrier = GEN_INT (0);
       int threads = 0;
 
@@ -4648,7 +4651,11 @@  nvptx_process_pars (parallel *par)
 	  threads = nvptx_mach_vector_length ();
 	}
 
-      if (!empty || !is_call)
+      if (no_prop_p && empty_loop_p)
+	;
+      else if (no_prop_p && is_call)
+	;
+      else
 	{
 	  /* Insert begin and end synchronizations.  */
 	  emit_insn_before (nvptx_cta_sync (barrier, threads),