+ [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||

Message ID da832c1c-fd40-c73e-b0b7-a5f003ad1e52@codesourcery.com
State New
Headers show
Series
  • + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
Related show

Commit Message

Tobias Burnus May 6, 2021, 10:17 a.m.
The complex/float && and || reduction patch missed a target testcase
(→ attached) which revealed that also a SIMT needed some special
handling, but just runs on non-SIMT systems.

The omp-low.c patch is rather simple - and I think it semantically
okay.
[Note to the change: It looks more completed than it is:
- moving 'zero' decl out of the 'if' block
- moving that if block before the 'if (sctx.is_simt)' block
-  'if (is_fp_and_or)' to the 'if (sctx.is_simt)' block.]

I think at least the testcase should be added, possibly also
the omp-low.c change – albeit I get a later ICE (see below),
which needs either an XFAIL or a fix.

  * * *

ICE with NVPTX:

When the device lto1 starts, it fails when expanding the
intrinsic XCHG_BFLY function.

We have 'ivar' = complex float, which at rtx level is
converted to a concatenation (via gen_reg_rtx()).
In omp-low.c:
   IFN_GOMP_SIMT_XCHG_BFLY (TREE_TYPE(ivar), ivar, simt_lane)

Later in expand_GOMP_SIMT_XCHG_BFLY, we call:
371       expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
which fails by running into unreachable of 'expand_insn'
7844      if (!maybe_expand_insn (icode, nops, ops))
7845        gcc_unreachable ();

icode = CODE_FOR_omp_simt_xchg_bfly
nops = 3

(gdb) p ops[0]->type
$3 = EXPAND_OUTPUT

(gdb) p debug(ops[0]->value)
(concat:SC (reg:SF 85)
     (reg:SF 86))

(gdb) p ops[1]->type
$5 = EXPAND_INPUT

(gdb) p debug(ops[1]->value)
(concat:SC (reg:SF 26 [ orfc ])
     (reg:SF 27 [ orfc+4 ]))

(gdb) p ops[2]->type
$7 = EXPAND_INPUT

(gdb) p debug(ops[2]->value)
(reg:SI 52 [ _74 ])

The mentioned concat happens in


How to fix this? Or does this fall into the same category as
PR100321 (fixed by: r12-395, Disable SIMT for user-defined reduction) with its
follow-up PR 100408?

Small testcase is:

_Complex float rcf[1024];
int
reduction_or ()
{
   _Complex float orfc = 0;
   for (int i=0; i < 1024; ++i)
     orfc = orfc || rcf[i];
   return __real__ orfc;
}

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

Comments

Jeff Law via Gcc-patches May 6, 2021, 10:30 a.m. | #1
On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:
> OpenMP: Fix SIMT for complex/float reduction with && and ||

> 

> gcc/ChangeLog:

> 

> 	* omp-low.c (lower_rec_input_clauses): Also handle SIMT part

> 	for complex/float recution with && and ||.

> 

> libgomp/ChangeLog:

> 

> 	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing

> 	complex/floating-point || + && recduction with 'omp target'.


As the float/complex ||/&& reductions are IMHO just conformance issues, not
something anyone would actually use in meaningful code - floats or complex
aren't the most obvious or efficient holders of boolean values - I think
punting SIMT on those isn't a workaround, but the right solution.

	Jakub
Tom de Vries May 6, 2021, 1:12 p.m. | #2
On 5/6/21 12:30 PM, Jakub Jelinek wrote:
> On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:

>> OpenMP: Fix SIMT for complex/float reduction with && and ||

>>

>> gcc/ChangeLog:

>>

>> 	* omp-low.c (lower_rec_input_clauses): Also handle SIMT part

>> 	for complex/float recution with && and ||.

>>

>> libgomp/ChangeLog:

>>

>> 	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing

>> 	complex/floating-point || + && recduction with 'omp target'.

> 

> As the float/complex ||/&& reductions are IMHO just conformance issues, not

> something anyone would actually use in meaningful code - floats or complex

> aren't the most obvious or efficient holders of boolean values - I think

> punting SIMT on those isn't a workaround, but the right solution.

> 


Ack.

WIP patch below tries that approach and fixes the ICE, but this simple
example still doesn't work:
...
int
main ()
{
  float andf = 1;

  #pragma omp target parallel reduction(&&: andf)
  for (int i=0; i < 1024; ++i)
    andf = andf && 0.0;

  if ((int)andf != 0)
    __builtin_abort ();

  return 0;
}
...

Thanks,
- Tom
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf74b2d..d8f2487054f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 	{
 	  for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
 	       c = OMP_CLAUSE_CHAIN (c))
-	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
-		&& OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
-	      {
-		/* UDR reductions are not supported yet for SIMT, disable
-		   SIMT.  */
-		sctx->max_vf = 1;
-		break;
-	      }
+	    {
+	      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+		continue;
+
+	      if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+		{
+		  /* UDR reductions are not supported yet for SIMT, disable
+		     SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
+		}
+
+	      if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
+		  && TREE_CODE (TREE_TYPE (new_var)) != BOOLEAN_TYPE)
+		{
+		  /* Doing boolean operations on non-boolean types is
+		     for conformance only, it's not worth supporting this
+		     for SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
+		}
+	    }
 	}
       if (maybe_gt (sctx->max_vf, 1U))
 	{
Jeff Law via Gcc-patches May 6, 2021, 1:22 p.m. | #3
On Thu, May 06, 2021 at 03:12:59PM +0200, Tom de Vries wrote:
> +	      if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))

> +		  && TREE_CODE (TREE_TYPE (new_var)) != BOOLEAN_TYPE)


I would use && !INTEGRAL_TYPE_P (TREE_TYPE (new_var))
Especially in C code using || or && with int or other non-_Bool types
will pretty frequent.
Of course, if that doesn't work for SIMT either, it needs further work
and punting on those could be a temporary workaround.  But it would be
a preexisting issue, not something introduced with accepting &&/|| for
floating/complex types - we've accepted &&/|| for integral types forever.

	Jakub
Tom de Vries May 6, 2021, 2:05 p.m. | #4
On 5/6/21 3:12 PM, Tom de Vries wrote:
> On 5/6/21 12:30 PM, Jakub Jelinek wrote:

>> On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:

>>> OpenMP: Fix SIMT for complex/float reduction with && and ||

>>>

>>> gcc/ChangeLog:

>>>

>>> 	* omp-low.c (lower_rec_input_clauses): Also handle SIMT part

>>> 	for complex/float recution with && and ||.

>>>

>>> libgomp/ChangeLog:

>>>

>>> 	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing

>>> 	complex/floating-point || + && recduction with 'omp target'.

>>

>> As the float/complex ||/&& reductions are IMHO just conformance issues, not

>> something anyone would actually use in meaningful code - floats or complex

>> aren't the most obvious or efficient holders of boolean values - I think

>> punting SIMT on those isn't a workaround, but the right solution.

>>

> 

> Ack.

> 

> WIP patch below tries that approach and fixes the ICE, but this simple

> example still doesn't work:

> ...

> int

> main ()

> {

>   float andf = 1;

> 

>   #pragma omp target parallel reduction(&&: andf)

>   for (int i=0; i < 1024; ++i)

>     andf = andf && 0.0;

> 

>   if ((int)andf != 0)

>     __builtin_abort ();

> 

>   return 0;

> }

> ...


Hm, after rewriting things like this:
...
  #pragma omp target map (tofrom: andf)
  #pragma omp parallel reduction(&&: andf)
  for (int i=0; i < 1024; ++i)
    andf = andf && 0.0;
...
it does work.

My limited openmp knowledge is not enough to decide whether the fail of
the first variant is a test-case issue, or a gcc issue.

Thanks,
- Tom
Tobias Burnus May 6, 2021, 2:21 p.m. | #5
On 06.05.21 15:12, Tom de Vries wrote:

> WIP patch below tries that approach and fixes the ICE,

Thanks!
> but this simple example still doesn't work:

> ...

>    #pragma omp target parallel reduction(&&: andf)


Try: map(andf). [Cf. PR99928 with pending patch at
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567838.html ]

I have now added your WIP patch to my patch, honoring the comment by Jakub.
I also copied the _Complex int example to -6.c to have also a target
version for this.

Lightly tested for now w/ and w/o offloading, will run the testsuite now.

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06  Tobias Burnus <tobias@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

gcc/ChangeLog:

	* omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
	a truth_value_p reduction variable is nonintegral.
	(lower_rec_input_clauses): Also handle SIMT part
	for complex/float recution with && and ||.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
	complex/floating-point || + && reduction with 'omp target'.
	* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.

 gcc/omp-low.c                                      |  58 ++++--
 .../testsuite/libgomp.c-c++-common/reduction-5.c   | 192 ++++++++++++++++++++
 .../testsuite/libgomp.c-c++-common/reduction-6.c   | 195 +++++++++++++++++++++
 3 files changed, 426 insertions(+), 19 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf74b2d..c3c72241486 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 	{
 	  for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
 	       c = OMP_CLAUSE_CHAIN (c))
-	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
-		&& OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
-	      {
-		/* UDR reductions are not supported yet for SIMT, disable
-		   SIMT.  */
-		sctx->max_vf = 1;
-		break;
+	    {
+	      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+		continue;
+
+	      if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+		{
+		  /* UDR reductions are not supported yet for SIMT, disable
+		     SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
+		}
+
+	      if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
+		  && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)))
+		{
+		  /* Doing boolean operations on non-boolean types is
+		     for conformance only, it's not worth supporting this
+		     for SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
 	      }
+	    }
 	}
       if (maybe_gt (sctx->max_vf, 1U))
 	{
@@ -6432,28 +6446,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 
 		      gimplify_assign (unshare_expr (ivar), x, &llist[0]);
 
-		      if (sctx.is_simt)
-			{
-			  if (!simt_lane)
-			    simt_lane = create_tmp_var (unsigned_type_node);
-			  x = build_call_expr_internal_loc
-			    (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
-			     TREE_TYPE (ivar), 2, ivar, simt_lane);
-			  x = build2 (code, TREE_TYPE (ivar), ivar, x);
-			  gimplify_assign (ivar, x, &llist[2]);
-			}
 		      tree ivar2 = ivar;
 		      tree ref2 = ref;
+		      tree zero = NULL_TREE;
 		      if (is_fp_and_or)
 			{
-			  tree zero = build_zero_cst (TREE_TYPE (ivar));
+			  zero = build_zero_cst (TREE_TYPE (ivar));
 			  ivar2 = fold_build2_loc (clause_loc, NE_EXPR,
 						   integer_type_node, ivar,
 						   zero);
 			  ref2 = fold_build2_loc (clause_loc, NE_EXPR,
 						  integer_type_node, ref, zero);
 			}
-		      x = build2 (code, TREE_TYPE (ref), ref2, ivar2);
+		      if (sctx.is_simt)
+			{
+			  if (!simt_lane)
+			    simt_lane = create_tmp_var (unsigned_type_node);
+			  x = build_call_expr_internal_loc
+			    (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
+			     TREE_TYPE (ivar), 2, ivar, simt_lane);
+			  if (is_fp_and_or)
+			    x = fold_build2_loc (clause_loc, NE_EXPR,
+						 integer_type_node, x, zero);
+			  x = build2 (code, TREE_TYPE (ivar2), ivar2, x);
+			  if (is_fp_and_or)
+			    x = fold_convert (TREE_TYPE (ivar), x);
+			  gimplify_assign (ivar, x, &llist[2]);
+			}
+		      x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);
 		      if (is_fp_and_or)
 			x = fold_convert (TREE_TYPE (ref), x);
 		      ref = build_outer_var_ref (var, ctx);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
new file mode 100644
index 00000000000..8ac9930b241
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -0,0 +1,192 @@
+/* C / C++'s logical AND and OR operators take any scalar argument
+   which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+   In this testcase, the int result is again converted to a floating-poing
+   or complex type.
+
+   While having a floating-point/complex array element with || and && can make
+   sense, having a non-integer/non-bool reduction variable is odd but valid.
+
+   Test: FP reduction variable + FP array - as reduction-1.c but with target  */
+
+#define N 1024
+_Complex float rcf[N];
+_Complex double rcd[N];
+float rf[N];
+double rd[N];
+
+int
+reduction_or ()
+{
+  float orf = 0;
+  double ord = 0;
+  _Complex float orfc = 0;
+  _Complex double ordc = 0;
+
+  #pragma omp target parallel reduction(||: orf) map(orf)
+  for (int i=0; i < N; ++i)
+    orf = orf || rf[i];
+
+  #pragma omp target parallel for reduction(||: ord) map(ord)
+  for (int i=0; i < N; ++i)
+    ord = ord || rcd[i];
+
+  #pragma omp target parallel for simd reduction(||: orfc) map(orfc)
+  for (int i=0; i < N; ++i)
+    orfc = orfc || rcf[i];
+
+  #pragma omp target parallel loop reduction(||: ordc) map(ordc)
+  for (int i=0; i < N; ++i)
+    ordc = ordc || rcd[i];
+
+  return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_or_teams ()
+{
+  float orf = 0;
+  double ord = 0;
+  _Complex float orfc = 0;
+  _Complex double ordc = 0;
+
+  #pragma omp target teams distribute parallel for reduction(||: orf) map(orf)
+  for (int i=0; i < N; ++i)
+    orf = orf || rf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: ord) map(ord)
+  for (int i=0; i < N; ++i)
+    ord = ord || rcd[i];
+
+  #pragma omp target teams distribute parallel for reduction(||: orfc) map(orfc)
+  for (int i=0; i < N; ++i)
+    orfc = orfc || rcf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: ordc) map(ordc)
+  for (int i=0; i < N; ++i)
+    ordc = ordc || rcd[i];
+
+  return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_and ()
+{
+  float andf = 1;
+  double andd = 1;
+  _Complex float andfc = 1;
+  _Complex double anddc = 1;
+
+  #pragma omp target parallel reduction(&&: andf) map(andf)
+  for (int i=0; i < N; ++i)
+    andf = andf && rf[i];
+
+  #pragma omp target parallel for reduction(&&: andd) map(andd)
+  for (int i=0; i < N; ++i)
+    andd = andd && rcd[i];
+
+  #pragma omp target parallel for simd reduction(&&: andfc) map(andfc)
+  for (int i=0; i < N; ++i)
+    andfc = andfc && rcf[i];
+
+  #pragma omp target parallel loop reduction(&&: anddc) map(anddc)
+  for (int i=0; i < N; ++i)
+    anddc = anddc && rcd[i];
+
+  return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+reduction_and_teams ()
+{
+  float andf = 1;
+  double andd = 1;
+  _Complex float andfc = 1;
+  _Complex double anddc = 1;
+
+  #pragma omp target teams distribute parallel for reduction(&&: andf) map(andf)
+  for (int i=0; i < N; ++i)
+    andf = andf && rf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: andd) map(andd)
+  for (int i=0; i < N; ++i)
+    andd = andd && rcd[i];
+
+  #pragma omp target teams distribute parallel for reduction(&&: andfc) map(andfc)
+  for (int i=0; i < N; ++i)
+    andfc = andfc && rcf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: anddc) map(anddc)
+  for (int i=0; i < N; ++i)
+    anddc = anddc && rcd[i];
+
+  return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+main ()
+{
+  for (int i = 0; i < N; ++i)
+    {
+      rf[i] = 0;
+      rd[i] = 0;
+      rcf[i] = 0;
+      rcd[i] = 0;
+    }
+
+  if (reduction_or () != 0)
+    __builtin_abort ();
+  if (reduction_or_teams () != 0)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  rf[10] = 1.0;
+  rd[15] = 1.0;
+  rcf[10] = 1.0;
+  rcd[15] = 1.0i;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    {
+      rf[i] = 1;
+      rd[i] = 1;
+      rcf[i] = 1;
+      rcd[i] = 1;
+    }
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 4)
+    __builtin_abort ();
+  if (reduction_and_teams () != 4)
+    __builtin_abort ();
+
+  rf[10] = 0.0;
+  rd[15] = 0.0;
+  rcf[10] = 0.0;
+  rcd[15] = 0.0;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
new file mode 100644
index 00000000000..a223d296183
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
@@ -0,0 +1,195 @@
+/* C / C++'s logical AND and OR operators take any scalar argument
+   which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+   In this testcase, the int result is again converted to an integer complex
+   type.
+
+   While having a floating-point/complex array element with || and && can make
+   sense, having a complex reduction variable is odd but valid.
+
+   Test: int complex reduction variable + int complex array.
+         as reduction-4.c but with target.  */
+
+#define N 1024
+_Complex char rcc[N];
+_Complex short rcs[N];
+_Complex int rci[N];
+_Complex long long rcl[N];
+
+int
+reduction_or ()
+{
+  _Complex char orc = 0;
+  _Complex short ors = 0;
+  _Complex int ori = 0;
+  _Complex long orl = 0;
+
+  #pragma omp target parallel reduction(||: orc) map(orc)
+  for (int i=0; i < N; ++i)
+    orc = orc || rcl[i];
+
+  #pragma omp target parallel for reduction(||: ors) map(ors)
+  for (int i=0; i < N; ++i)
+    ors = ors || rci[i];
+
+  #pragma omp target parallel for simd reduction(||: ori) map(ori)
+  for (int i=0; i < N; ++i)
+    ori = ori || rcs[i];
+
+  #pragma omp target parallel loop reduction(||: orl) map(orl)
+  for (int i=0; i < N; ++i)
+    orl = orl || rcc[i];
+
+  return __real__ (orc + ors + ori + orl) + __imag__ (orc + ors + ori + orl);
+}
+
+int
+reduction_or_teams ()
+{
+  _Complex char orc = 0;
+  _Complex short ors = 0;
+  _Complex int ori = 0;
+  _Complex long orl = 0;
+
+  #pragma omp target teams distribute parallel for reduction(||: orc) map(orc)
+  for (int i=0; i < N; ++i)
+    orc = orc || rcc[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: ors) map(ors)
+  for (int i=0; i < N; ++i)
+    ors = ors || rcs[i];
+
+  #pragma omp target teams distribute parallel for reduction(||: ori) map(ori)
+  for (int i=0; i < N; ++i)
+    ori = ori || rci[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: orl) map(orl)
+  for (int i=0; i < N; ++i)
+    orl = orl || rcl[i];
+
+  return __real__ (orc + ors + ori + orl) + __imag__ (orc + ors + ori + orl);
+}
+
+int
+reduction_and ()
+{
+  _Complex char andc = 1;
+  _Complex short ands = 1;
+  _Complex int andi = 1;
+  _Complex long andl = 1;
+
+  #pragma omp target parallel reduction(&&: andc) map(andc)
+  for (int i=0; i < N; ++i)
+    andc = andc && rcc[i];
+
+  #pragma omp target parallel for reduction(&&: ands) map(ands)
+  for (int i=0; i < N; ++i)
+    ands = ands && rcs[i];
+
+  #pragma omp target parallel for simd reduction(&&: andi) map(andi)
+  for (int i=0; i < N; ++i)
+    andi = andi && rci[i];
+
+  #pragma omp target parallel loop reduction(&&: andl) map(andl)
+  for (int i=0; i < N; ++i)
+    andl = andl && rcl[i];
+
+  return __real__ (andc + ands + andi + andl)
+	 + __imag__ (andc + ands + andi + andl);
+}
+
+int
+reduction_and_teams ()
+{
+  _Complex char andc = 1;
+  _Complex short ands = 1;
+  _Complex int andi = 1;
+  _Complex long andl = 1;
+
+  #pragma omp target teams distribute parallel for reduction(&&: andc) map(andc)
+  for (int i=0; i < N; ++i)
+    andc = andc && rcl[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: ands) map(ands)
+  for (int i=0; i < N; ++i)
+    ands = ands && rci[i];
+
+  #pragma omp target teams distribute parallel for reduction(&&: andi) map(andi)
+  for (int i=0; i < N; ++i)
+    andi = andi && rcs[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: andl) map(andl)
+  for (int i=0; i < N; ++i)
+    andl = andl && rcc[i];
+
+  return __real__ (andc + ands + andi + andl)
+	 + __imag__ (andc + ands + andi + andl);
+}
+
+int
+main ()
+{
+  for (int i = 0; i < N; ++i)
+    {
+      rcc[i] = 0;
+      rcs[i] = 0;
+      rci[i] = 0;
+      rcl[i] = 0;
+    }
+
+  if (reduction_or () != 0)
+    __builtin_abort ();
+  if (reduction_or_teams () != 0)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  rcc[10] = 1.0;
+  rcs[15] = 1.0i;
+  rci[10] = 1.0;
+  rcl[15] = 1.0i;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    {
+      rcc[i] = 1;
+      rcs[i] = 1i;
+      rci[i] = 1;
+      rcl[i] = 1 + 1i;
+    }
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 4)
+    __builtin_abort ();
+  if (reduction_and_teams () != 4)
+    __builtin_abort ();
+
+  rcc[10] = 0.0;
+  rcs[15] = 0.0;
+  rci[10] = 0.0;
+  rcl[15] = 0.0;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  return 0;
+}
Jeff Law via Gcc-patches May 6, 2021, 2:32 p.m. | #6
On Thu, May 06, 2021 at 04:21:40PM +0200, Tobias Burnus wrote:
> 	* omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if

> 	a truth_value_p reduction variable is nonintegral.

> 	(lower_rec_input_clauses): Also handle SIMT part

> 	for complex/float recution with && and ||.


s/recution/reduction/

> --- a/gcc/omp-low.c

> +++ b/gcc/omp-low.c

> @@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,

>  	{

>  	  for (tree c = gimple_omp_for_clauses (ctx->stmt); c;

>  	       c = OMP_CLAUSE_CHAIN (c))

> -	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION

> -		&& OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))

> -	      {

> -		/* UDR reductions are not supported yet for SIMT, disable

> -		   SIMT.  */

> -		sctx->max_vf = 1;

> -		break;

> +	    {

> +	      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)

> +		continue;

> +

> +	      if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))

> +		{

> +		  /* UDR reductions are not supported yet for SIMT, disable

> +		     SIMT.  */

> +		  sctx->max_vf = 1;

> +		  break;

> +		}

> +

> +	      if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))

> +		  && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)))

> +		{

> +		  /* Doing boolean operations on non-boolean types is

> +		     for conformance only, it's not worth supporting this

> +		     for SIMT.  */


This comment needs to be adjusted to talk about non-integral types.

> +		  sctx->max_vf = 1;

> +		  break;

>  	      }

> +	    }

>  	}

>        if (maybe_gt (sctx->max_vf, 1U))

>  	{

> @@ -6432,28 +6446,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,

>  

>  		      gimplify_assign (unshare_expr (ivar), x, &llist[0]);

>  

> -		      if (sctx.is_simt)

> -			{

> -			  if (!simt_lane)

> -			    simt_lane = create_tmp_var (unsigned_type_node);

> -			  x = build_call_expr_internal_loc

> -			    (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,

> -			     TREE_TYPE (ivar), 2, ivar, simt_lane);

> -			  x = build2 (code, TREE_TYPE (ivar), ivar, x);

> -			  gimplify_assign (ivar, x, &llist[2]);

> -			}

>  		      tree ivar2 = ivar;

>  		      tree ref2 = ref;

> +		      tree zero = NULL_TREE;

>  		      if (is_fp_and_or)

>  			{

> -			  tree zero = build_zero_cst (TREE_TYPE (ivar));

> +			  zero = build_zero_cst (TREE_TYPE (ivar));

>  			  ivar2 = fold_build2_loc (clause_loc, NE_EXPR,

>  						   integer_type_node, ivar,

>  						   zero);

>  			  ref2 = fold_build2_loc (clause_loc, NE_EXPR,

>  						  integer_type_node, ref, zero);

>  			}

> -		      x = build2 (code, TREE_TYPE (ref), ref2, ivar2);

> +		      if (sctx.is_simt)

> +			{

> +			  if (!simt_lane)

> +			    simt_lane = create_tmp_var (unsigned_type_node);

> +			  x = build_call_expr_internal_loc

> +			    (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,

> +			     TREE_TYPE (ivar), 2, ivar, simt_lane);

> +			  if (is_fp_and_or)

> +			    x = fold_build2_loc (clause_loc, NE_EXPR,

> +						 integer_type_node, x, zero);

> +			  x = build2 (code, TREE_TYPE (ivar2), ivar2, x);

> +			  if (is_fp_and_or)

> +			    x = fold_convert (TREE_TYPE (ivar), x);

> +			  gimplify_assign (ivar, x, &llist[2]);

> +			}

> +		      x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);

>  		      if (is_fp_and_or)

>  			x = fold_convert (TREE_TYPE (ref), x);

>  		      ref = build_outer_var_ref (var, ctx);


Is this hunk still needed when the first hunk is in?
I mean, this is in code guarded with
is_simd && lower_rec_simd_input_clauses (...) and that function
will return false for if (known_eq (sctx->max_vf, 1U)) which the first hunk
ensures.
So sctx.is_simt && is_fp_and_or shouldn't be true in that code.

	Jakub
Tobias Burnus May 7, 2021, 10:05 a.m. | #7
On 06.05.21 16:32, Jakub Jelinek wrote:

> s/recution/reduction/

Fixed.
> This comment needs to be adjusted to talk about non-integral types.

Fixed.
> Is this hunk still needed when the first hunk is in?


No - and now removed.

Updated code attached.

Tobias


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-07  Tobias Burnus  <tobias@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

gcc/ChangeLog:

	* omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
	a truth_value_p reduction variable is nonintegral.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
	complex/floating-point || + && reduction with 'omp target'.
	* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.

 gcc/omp-low.c                                      |  28 ++-
 .../testsuite/libgomp.c-c++-common/reduction-5.c   | 193 ++++++++++++++++++++
 .../testsuite/libgomp.c-c++-common/reduction-6.c   | 196 +++++++++++++++++++++
 3 files changed, 410 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf74b2d..2325cfcfc34 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 	{
 	  for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
 	       c = OMP_CLAUSE_CHAIN (c))
-	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
-		&& OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
-	      {
-		/* UDR reductions are not supported yet for SIMT, disable
-		   SIMT.  */
-		sctx->max_vf = 1;
-		break;
+	    {
+	      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+		continue;
+
+	      if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+		{
+		  /* UDR reductions are not supported yet for SIMT, disable
+		     SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
+		}
+
+	      if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
+		  && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)))
+		{
+		  /* Doing boolean operations on non-integral types is
+		     for conformance only, it's not worth supporting this
+		     for SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
 	      }
+	    }
 	}
       if (maybe_gt (sctx->max_vf, 1U))
 	{
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
new file mode 100644
index 00000000000..21540512e23
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -0,0 +1,193 @@
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* C / C++'s logical AND and OR operators take any scalar argument
+   which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+   In this testcase, the int result is again converted to a floating-poing
+   or complex type.
+
+   While having a floating-point/complex array element with || and && can make
+   sense, having a non-integer/non-bool reduction variable is odd but valid.
+
+   Test: FP reduction variable + FP array - as reduction-1.c but with target  */
+
+#define N 1024
+_Complex float rcf[N];
+_Complex double rcd[N];
+float rf[N];
+double rd[N];
+
+int
+reduction_or ()
+{
+  float orf = 0;
+  double ord = 0;
+  _Complex float orfc = 0;
+  _Complex double ordc = 0;
+
+  #pragma omp target parallel reduction(||: orf) map(orf)
+  for (int i=0; i < N; ++i)
+    orf = orf || rf[i];
+
+  #pragma omp target parallel for reduction(||: ord) map(ord)
+  for (int i=0; i < N; ++i)
+    ord = ord || rcd[i];
+
+  #pragma omp target parallel for simd reduction(||: orfc) map(orfc)
+  for (int i=0; i < N; ++i)
+    orfc = orfc || rcf[i];
+
+  #pragma omp target parallel loop reduction(||: ordc) map(ordc)
+  for (int i=0; i < N; ++i)
+    ordc = ordc || rcd[i];
+
+  return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_or_teams ()
+{
+  float orf = 0;
+  double ord = 0;
+  _Complex float orfc = 0;
+  _Complex double ordc = 0;
+
+  #pragma omp target teams distribute parallel for reduction(||: orf) map(orf)
+  for (int i=0; i < N; ++i)
+    orf = orf || rf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: ord) map(ord)
+  for (int i=0; i < N; ++i)
+    ord = ord || rcd[i];
+
+  #pragma omp target teams distribute parallel for reduction(||: orfc) map(orfc)
+  for (int i=0; i < N; ++i)
+    orfc = orfc || rcf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: ordc) map(ordc)
+  for (int i=0; i < N; ++i)
+    ordc = ordc || rcd[i];
+
+  return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_and ()
+{
+  float andf = 1;
+  double andd = 1;
+  _Complex float andfc = 1;
+  _Complex double anddc = 1;
+
+  #pragma omp target parallel reduction(&&: andf) map(andf)
+  for (int i=0; i < N; ++i)
+    andf = andf && rf[i];
+
+  #pragma omp target parallel for reduction(&&: andd) map(andd)
+  for (int i=0; i < N; ++i)
+    andd = andd && rcd[i];
+
+  #pragma omp target parallel for simd reduction(&&: andfc) map(andfc)
+  for (int i=0; i < N; ++i)
+    andfc = andfc && rcf[i];
+
+  #pragma omp target parallel loop reduction(&&: anddc) map(anddc)
+  for (int i=0; i < N; ++i)
+    anddc = anddc && rcd[i];
+
+  return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+reduction_and_teams ()
+{
+  float andf = 1;
+  double andd = 1;
+  _Complex float andfc = 1;
+  _Complex double anddc = 1;
+
+  #pragma omp target teams distribute parallel for reduction(&&: andf) map(andf)
+  for (int i=0; i < N; ++i)
+    andf = andf && rf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: andd) map(andd)
+  for (int i=0; i < N; ++i)
+    andd = andd && rcd[i];
+
+  #pragma omp target teams distribute parallel for reduction(&&: andfc) map(andfc)
+  for (int i=0; i < N; ++i)
+    andfc = andfc && rcf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: anddc) map(anddc)
+  for (int i=0; i < N; ++i)
+    anddc = anddc && rcd[i];
+
+  return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+main ()
+{
+  for (int i = 0; i < N; ++i)
+    {
+      rf[i] = 0;
+      rd[i] = 0;
+      rcf[i] = 0;
+      rcd[i] = 0;
+    }
+
+  if (reduction_or () != 0)
+    __builtin_abort ();
+  if (reduction_or_teams () != 0)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  rf[10] = 1.0;
+  rd[15] = 1.0;
+  rcf[10] = 1.0;
+  rcd[15] = 1.0i;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    {
+      rf[i] = 1;
+      rd[i] = 1;
+      rcf[i] = 1;
+      rcd[i] = 1;
+    }
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 4)
+    __builtin_abort ();
+  if (reduction_and_teams () != 4)
+    __builtin_abort ();
+
+  rf[10] = 0.0;
+  rd[15] = 0.0;
+  rcf[10] = 0.0;
+  rcd[15] = 0.0;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
new file mode 100644
index 00000000000..27d9ef6b635
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
@@ -0,0 +1,196 @@
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* C / C++'s logical AND and OR operators take any scalar argument
+   which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+   In this testcase, the int result is again converted to an integer complex
+   type.
+
+   While having a floating-point/complex array element with || and && can make
+   sense, having a complex reduction variable is odd but valid.
+
+   Test: int complex reduction variable + int complex array.
+         as reduction-4.c but with target.  */
+
+#define N 1024
+_Complex char rcc[N];
+_Complex short rcs[N];
+_Complex int rci[N];
+_Complex long long rcl[N];
+
+int
+reduction_or ()
+{
+  _Complex char orc = 0;
+  _Complex short ors = 0;
+  _Complex int ori = 0;
+  _Complex long orl = 0;
+
+  #pragma omp target parallel reduction(||: orc) map(orc)
+  for (int i=0; i < N; ++i)
+    orc = orc || rcl[i];
+
+  #pragma omp target parallel for reduction(||: ors) map(ors)
+  for (int i=0; i < N; ++i)
+    ors = ors || rci[i];
+
+  #pragma omp target parallel for simd reduction(||: ori) map(ori)
+  for (int i=0; i < N; ++i)
+    ori = ori || rcs[i];
+
+  #pragma omp target parallel loop reduction(||: orl) map(orl)
+  for (int i=0; i < N; ++i)
+    orl = orl || rcc[i];
+
+  return __real__ (orc + ors + ori + orl) + __imag__ (orc + ors + ori + orl);
+}
+
+int
+reduction_or_teams ()
+{
+  _Complex char orc = 0;
+  _Complex short ors = 0;
+  _Complex int ori = 0;
+  _Complex long orl = 0;
+
+  #pragma omp target teams distribute parallel for reduction(||: orc) map(orc)
+  for (int i=0; i < N; ++i)
+    orc = orc || rcc[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: ors) map(ors)
+  for (int i=0; i < N; ++i)
+    ors = ors || rcs[i];
+
+  #pragma omp target teams distribute parallel for reduction(||: ori) map(ori)
+  for (int i=0; i < N; ++i)
+    ori = ori || rci[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: orl) map(orl)
+  for (int i=0; i < N; ++i)
+    orl = orl || rcl[i];
+
+  return __real__ (orc + ors + ori + orl) + __imag__ (orc + ors + ori + orl);
+}
+
+int
+reduction_and ()
+{
+  _Complex char andc = 1;
+  _Complex short ands = 1;
+  _Complex int andi = 1;
+  _Complex long andl = 1;
+
+  #pragma omp target parallel reduction(&&: andc) map(andc)
+  for (int i=0; i < N; ++i)
+    andc = andc && rcc[i];
+
+  #pragma omp target parallel for reduction(&&: ands) map(ands)
+  for (int i=0; i < N; ++i)
+    ands = ands && rcs[i];
+
+  #pragma omp target parallel for simd reduction(&&: andi) map(andi)
+  for (int i=0; i < N; ++i)
+    andi = andi && rci[i];
+
+  #pragma omp target parallel loop reduction(&&: andl) map(andl)
+  for (int i=0; i < N; ++i)
+    andl = andl && rcl[i];
+
+  return __real__ (andc + ands + andi + andl)
+	 + __imag__ (andc + ands + andi + andl);
+}
+
+int
+reduction_and_teams ()
+{
+  _Complex char andc = 1;
+  _Complex short ands = 1;
+  _Complex int andi = 1;
+  _Complex long andl = 1;
+
+  #pragma omp target teams distribute parallel for reduction(&&: andc) map(andc)
+  for (int i=0; i < N; ++i)
+    andc = andc && rcl[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: ands) map(ands)
+  for (int i=0; i < N; ++i)
+    ands = ands && rci[i];
+
+  #pragma omp target teams distribute parallel for reduction(&&: andi) map(andi)
+  for (int i=0; i < N; ++i)
+    andi = andi && rcs[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: andl) map(andl)
+  for (int i=0; i < N; ++i)
+    andl = andl && rcc[i];
+
+  return __real__ (andc + ands + andi + andl)
+	 + __imag__ (andc + ands + andi + andl);
+}
+
+int
+main ()
+{
+  for (int i = 0; i < N; ++i)
+    {
+      rcc[i] = 0;
+      rcs[i] = 0;
+      rci[i] = 0;
+      rcl[i] = 0;
+    }
+
+  if (reduction_or () != 0)
+    __builtin_abort ();
+  if (reduction_or_teams () != 0)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  rcc[10] = 1.0;
+  rcs[15] = 1.0i;
+  rci[10] = 1.0;
+  rcl[15] = 1.0i;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    {
+      rcc[i] = 1;
+      rcs[i] = 1i;
+      rci[i] = 1;
+      rcl[i] = 1 + 1i;
+    }
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 4)
+    __builtin_abort ();
+  if (reduction_and_teams () != 4)
+    __builtin_abort ();
+
+  rcc[10] = 0.0;
+  rcs[15] = 0.0;
+  rci[10] = 0.0;
+  rcl[15] = 0.0;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  return 0;
+}
Jeff Law via Gcc-patches May 7, 2021, 10:06 a.m. | #8
On Fri, May 07, 2021 at 12:05:11PM +0200, Tobias Burnus wrote:
> 2021-05-07  Tobias Burnus  <tobias@codesourcery.com>

> 	    Tom de Vries  <tdevries@suse.de>

> 

> gcc/ChangeLog:

> 

> 	* omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if

> 	a truth_value_p reduction variable is nonintegral.

> 

> libgomp/ChangeLog:

> 

> 	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing

> 	complex/floating-point || + && reduction with 'omp target'.

> 	* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.

> 

>  gcc/omp-low.c                                      |  28 ++-

>  .../testsuite/libgomp.c-c++-common/reduction-5.c   | 193 ++++++++++++++++++++

>  .../testsuite/libgomp.c-c++-common/reduction-6.c   | 196 +++++++++++++++++++++

>  3 files changed, 410 insertions(+), 7 deletions(-)


Ok, thanks.

	Jakub
Tom de Vries May 7, 2021, 10:08 a.m. | #9
On 5/7/21 12:05 PM, Tobias Burnus wrote:
> On 06.05.21 16:32, Jakub Jelinek wrote:

> 

>> s/recution/reduction/

> Fixed.

>> This comment needs to be adjusted to talk about non-integral types.

> Fixed.

>> Is this hunk still needed when the first hunk is in?

> 

> No - and now removed.

> 

> Updated code attached.

> 



> libgomp/ChangeLog:

> 

> 	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing

> 	complex/floating-point || + && reduction with 'omp target'.

> 	* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.


5 -> 6.

Otherwise, LGTM.

Thanks,
- Tom
Thomas Schwinge May 18, 2021, 11:07 a.m. | #10
Hi!

On 2021-05-07T12:05:11+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> --- /dev/null

> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c

> @@ -0,0 +1,193 @@

> +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */


> --- /dev/null

> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c

> @@ -0,0 +1,196 @@

> +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */


Causes issues if more than nvptx offloading compilation is enabled.  Thus
pushed "'libgomp.c-c++-common/reduction-{5,6}.c': Restrict '-latomic' to
nvptx offloading compilation" to master branch in commit
937fa5fb7840c19c96b1fdf1ce678699649a6c5e, see attached.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
From 937fa5fb7840c19c96b1fdf1ce678699649a6c5e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Mon, 17 May 2021 08:05:40 +0200
Subject: [PATCH] 'libgomp.c-c++-common/reduction-{5,6}.c': Restrict '-latomic'
 to nvptx offloading compilation

Fix-up for recent commit 33b647956caa977d1ae489f9baed9cef70b4f382
"OpenMP: Fix SIMT for complex/float reduction with && and ||"; see
commit d42088e453042f4f8ba9190a7e29efd937ea2181 "Avoid -latomic for amdgcn
offloading".

	libgomp/
	* testsuite/libgomp.c-c++-common/reduction-5.c: Restrict
	'-latomic' to nvptx offloading compilation.
	* testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
---
 libgomp/testsuite/libgomp.c-c++-common/reduction-5.c | 2 +-
 libgomp/testsuite/libgomp.c-c++-common/reduction-6.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
index 21540512e23..31fa2670312 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=nvptx-none=-latomic" { target { offload_target_nvptx } } } */
 /* C / C++'s logical AND and OR operators take any scalar argument
    which compares (un)equal to 0 - the result 1 or 0 and of type int.
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
index 27d9ef6b635..727e11e4edf 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=nvptx-none=-latomic" { target { offload_target_nvptx } } } */
 /* C / C++'s logical AND and OR operators take any scalar argument
    which compares (un)equal to 0 - the result 1 or 0 and of type int.
 
-- 
2.30.2

Patch

OpenMP: Fix SIMT for complex/float reduction with && and ||

gcc/ChangeLog:

	* omp-low.c (lower_rec_input_clauses): Also handle SIMT part
	for complex/float recution with && and ||.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
	complex/floating-point || + && recduction with 'omp target'.

 gcc/omp-low.c                                      |  30 ++--
 .../testsuite/libgomp.c-c++-common/reduction-5.c   | 192 +++++++++++++++++++++
 2 files changed, 210 insertions(+), 12 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf7..46220c5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6432,28 +6432,34 @@  lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 
 		      gimplify_assign (unshare_expr (ivar), x, &llist[0]);
 
-		      if (sctx.is_simt)
-			{
-			  if (!simt_lane)
-			    simt_lane = create_tmp_var (unsigned_type_node);
-			  x = build_call_expr_internal_loc
-			    (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
-			     TREE_TYPE (ivar), 2, ivar, simt_lane);
-			  x = build2 (code, TREE_TYPE (ivar), ivar, x);
-			  gimplify_assign (ivar, x, &llist[2]);
-			}
 		      tree ivar2 = ivar;
 		      tree ref2 = ref;
+		      tree zero = NULL_TREE;
 		      if (is_fp_and_or)
 			{
-			  tree zero = build_zero_cst (TREE_TYPE (ivar));
+			  zero = build_zero_cst (TREE_TYPE (ivar));
 			  ivar2 = fold_build2_loc (clause_loc, NE_EXPR,
 						   integer_type_node, ivar,
 						   zero);
 			  ref2 = fold_build2_loc (clause_loc, NE_EXPR,
 						  integer_type_node, ref, zero);
 			}
-		      x = build2 (code, TREE_TYPE (ref), ref2, ivar2);
+		      if (sctx.is_simt)
+			{
+			  if (!simt_lane)
+			    simt_lane = create_tmp_var (unsigned_type_node);
+			  x = build_call_expr_internal_loc
+			    (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
+			     TREE_TYPE (ivar), 2, ivar, simt_lane);
+			  if (is_fp_and_or)
+			    x = fold_build2_loc (clause_loc, NE_EXPR,
+						 integer_type_node, x, zero);
+			  x = build2 (code, TREE_TYPE (ivar2), ivar2, x);
+			  if (is_fp_and_or)
+			    x = fold_convert (TREE_TYPE (ivar), x);
+			  gimplify_assign (ivar, x, &llist[2]);
+			}
+		      x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);
 		      if (is_fp_and_or)
 			x = fold_convert (TREE_TYPE (ref), x);
 		      ref = build_outer_var_ref (var, ctx);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
new file mode 100644
index 0000000..346c882
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -0,0 +1,192 @@ 
+/* C / C++'s logical AND and OR operators take any scalar argument
+   which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+   In this testcase, the int result is again converted to a floating-poing
+   or complex type.
+
+   While having a floating-point/complex array element with || and && can make
+   sense, having a non-integer/non-bool reduction variable is odd but valid.
+
+   Test: FP reduction variable + FP array.  */
+
+#define N 1024
+_Complex float rcf[N];
+_Complex double rcd[N];
+float rf[N];
+double rd[N];
+
+int
+reduction_or ()
+{
+  float orf = 0;
+  double ord = 0;
+  _Complex float orfc = 0;
+  _Complex double ordc = 0;
+
+  #pragma omp target parallel reduction(||: orf)
+  for (int i=0; i < N; ++i)
+    orf = orf || rf[i];
+
+  #pragma omp target parallel for reduction(||: ord)
+  for (int i=0; i < N; ++i)
+    ord = ord || rcd[i];
+
+  #pragma omp target parallel for simd reduction(||: orfc)
+  for (int i=0; i < N; ++i)
+    orfc = orfc || rcf[i];
+
+  #pragma omp target parallel loop reduction(||: ordc)
+  for (int i=0; i < N; ++i)
+    ordc = ordc || rcd[i];
+
+  return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_or_teams ()
+{
+  float orf = 0;
+  double ord = 0;
+  _Complex float orfc = 0;
+  _Complex double ordc = 0;
+
+  #pragma omp target teams distribute parallel for reduction(||: orf)
+  for (int i=0; i < N; ++i)
+    orf = orf || rf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: ord)
+  for (int i=0; i < N; ++i)
+    ord = ord || rcd[i];
+
+  #pragma omp target teams distribute parallel for reduction(||: orfc)
+  for (int i=0; i < N; ++i)
+    orfc = orfc || rcf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(||: ordc)
+  for (int i=0; i < N; ++i)
+    ordc = ordc || rcd[i];
+
+  return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_and ()
+{
+  float andf = 1;
+  double andd = 1;
+  _Complex float andfc = 1;
+  _Complex double anddc = 1;
+
+  #pragma omp target parallel reduction(&&: andf)
+  for (int i=0; i < N; ++i)
+    andf = andf && rf[i];
+
+  #pragma omp target parallel for reduction(&&: andd)
+  for (int i=0; i < N; ++i)
+    andd = andd && rcd[i];
+
+  #pragma omp target parallel for simd reduction(&&: andfc)
+  for (int i=0; i < N; ++i)
+    andfc = andfc && rcf[i];
+
+  #pragma omp target parallel loop reduction(&&: anddc)
+  for (int i=0; i < N; ++i)
+    anddc = anddc && rcd[i];
+
+  return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+reduction_and_teams ()
+{
+  float andf = 1;
+  double andd = 1;
+  _Complex float andfc = 1;
+  _Complex double anddc = 1;
+
+  #pragma omp target teams distribute parallel for reduction(&&: andf)
+  for (int i=0; i < N; ++i)
+    andf = andf && rf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: andd)
+  for (int i=0; i < N; ++i)
+    andd = andd && rcd[i];
+
+  #pragma omp target teams distribute parallel for reduction(&&: andfc)
+  for (int i=0; i < N; ++i)
+    andfc = andfc && rcf[i];
+
+  #pragma omp target teams distribute parallel for simd reduction(&&: anddc)
+  for (int i=0; i < N; ++i)
+    anddc = anddc && rcd[i];
+
+  return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+main ()
+{
+  for (int i = 0; i < N; ++i)
+    {
+      rf[i] = 0;
+      rd[i] = 0;
+      rcf[i] = 0;
+      rcd[i] = 0;
+    }
+
+  if (reduction_or () != 0)
+    __builtin_abort ();
+  if (reduction_or_teams () != 0)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  rf[10] = 1.0;
+  rd[15] = 1.0;
+  rcf[10] = 1.0;
+  rcd[15] = 1.0i;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    {
+      rf[i] = 1;
+      rd[i] = 1;
+      rcf[i] = 1;
+      rcd[i] = 1;
+    }
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 4)
+    __builtin_abort ();
+  if (reduction_and_teams () != 4)
+    __builtin_abort ();
+
+  rf[10] = 0.0;
+  rd[15] = 0.0;
+  rcf[10] = 0.0;
+  rcd[15] = 0.0;
+
+  if (reduction_or () != 4)
+    __builtin_abort ();
+  if (reduction_or_teams () != 4)
+    __builtin_abort ();
+  if (reduction_and () != 0)
+    __builtin_abort ();
+  if (reduction_and_teams () != 0)
+    __builtin_abort ();
+
+  return 0;
+}