[06/11,nvptx] only use one bar.sync barrier in OpenACC offloaded code

Message ID aac2d55e04d89efc4dc8e516aa691c63ece0c7e2.1532464999.git.cesar@codesourcery.com
State New
Series Initial vector length changes

Commit Message

Cesar Philippidis July 24, 2018, 8:47 p.m.
From: Cesar Philippidis <cesar@codesourcery.com>

This patch teaches nvptx_single and nvptx_process_pars to always use
barrier '0' for CTA synchronization. This started off as a cosmetic
change, but later on each large vector (i.e. one that is larger than a
PTX warp) will need to use its own unique thread barrier to avoid
thread divergence. Consequently, this patch begins the process of
teaching the nvptx state propagator how to use a common thread barrier
for each propagation level.
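
To illustrate the effect, here is a minimal standalone sketch, not the
GCC implementation. It assumes that the boolean passed to
nvptx_cta_sync selects the PTX barrier number (false for barrier 0,
true for barrier 1), which is what the diff and the description above
suggest. The helpers format_cta_sync and format_region_syncs are
hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for nvptx_cta_sync.  ASSUMPTION: the flag maps
   directly to the barrier number, false -> 0 and true -> 1.  Writes the
   PTX text into BUF and returns the snprintf result.  */
static int
format_cta_sync (bool after, char *buf, size_t len)
{
  return snprintf (buf, len, "bar.sync %d;", after ? 1 : 0);
}

/* Write the begin/end barrier pair that brackets a worker-partitioned
   region.  PATCHED selects the behaviour with this patch applied, where
   the end barrier reuses barrier 0 instead of barrier 1.  */
static int
format_region_syncs (bool patched, char *buf, size_t len)
{
  char begin_sync[16];
  char end_sync[16];

  format_cta_sync (false, begin_sync, sizeof begin_sync);
  format_cta_sync (!patched, end_sync, sizeof end_sync);
  return snprintf (buf, len, "%s ... %s", begin_sync, end_sync);
}
```

Under these assumptions, format_region_syncs (false, ...) models the
old pairing of barriers 0 and 1, while format_region_syncs (true, ...)
models the new behaviour, in which both synchronization points use
barrier 0 and the other barrier numbers stay free for later use.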

2018-XX-YY  Cesar Philippidis  <cesar@codesourcery.com>

	* config/nvptx/nvptx.c (nvptx_single): Always pass false to
	(nvptx_process_pars): Likewise.

(cherry picked from openacc-gcc-7-branch commit



diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 4d46d89..1f954a6 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4374,7 +4374,7 @@  nvptx_single (unsigned mask, basic_block from, basic_block to)
 	  /* This barrier is needed to avoid worker zero clobbering
 	     the broadcast buffer before all the other workers have
 	     had a chance to read this instance of it.  */
-	  emit_insn_before (nvptx_cta_sync (true), tail);
+	  emit_insn_before (nvptx_cta_sync (false), tail);
       extract_insn (tail);
@@ -4501,7 +4501,7 @@  nvptx_process_pars (parallel *par)
 	  /* Insert begin and end synchronizations.  */
 	  emit_insn_before (nvptx_cta_sync (false), par->forked_insn);
-	  emit_insn_before (nvptx_cta_sync (true), par->join_insn);
+	  emit_insn_before (nvptx_cta_sync (false), par->join_insn);
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))