[PR90743] Fortran 'allocatable' with OpenACC data/OpenMP 'target' 'map' clauses

Message ID 87y32ggz9o.fsf@euler.schwinge.homeip.net
State New
Headers show
Series
  • [PR90743] Fortran 'allocatable' with OpenACC data/OpenMP 'target' 'map' clauses
Related show

Commit Message

Thomas Schwinge June 5, 2019, 9:25 a.m.
Hi!

After having learned from PR90741 "Unreachable second '__builtin_malloc'
for scalar 'allocatable'", I then in context of PR90743 "Device-side
'malloc' for Fortran 'allocatable' scalar" had a look at what OpenMP 5.0
is saying about Fortran 'allocatable' in 'map' clauses, and suggest to
document that with the following test case.

I'm also making OpenACC do the same by fixing one thing.  (I'm aware we
should really be using 'gomp_map_val', but that's a different change;
likely in context of PR90596 "'GOACC_parallel_keyed' should use
'GOMP_MAP_VARS_TARGET'".)

Is the attached patch OK?  If approving this patch, please respond with
"Reviewed-by: NAME <EMAIL>" so that your effort will be recorded in the
commit log, see <https://gcc.gnu.org/wiki/Reviewed-by>.


Grüße
 Thomas

Comments

Jakub Jelinek June 5, 2019, 10 a.m. | #1
On Wed, Jun 05, 2019 at 11:25:07AM +0200, Thomas Schwinge wrote:
> 	libgomp/

> 	PR fortran/90743

> 	* oacc-parallel.c (GOACC_parallel_keyed): Handle NULL case.

> 	* testsuite/libgomp.fortran/target-allocatable-1.f90: New file.

> 	* testsuite/libgomp.oacc-fortran/allocatable-1.f90: New file.

> ---

>  libgomp/oacc-parallel.c                       |  9 ++-

>  .../libgomp.fortran/target-allocatable-1.f90  |  8 +++

>  .../libgomp.oacc-fortran/allocatable-1.f90    | 70 +++++++++++++++++++

>  3 files changed, 84 insertions(+), 3 deletions(-)

>  create mode 100644 libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90

>  create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90

> 

> diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c

> index e56330f6226b..0c2cfa05a438 100644

> --- a/libgomp/oacc-parallel.c

> +++ b/libgomp/oacc-parallel.c

> @@ -325,9 +325,12 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),

>    

>    devaddrs = gomp_alloca (sizeof (void *) * mapnum);

>    for (i = 0; i < mapnum; i++)

> -    devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start

> -			    + tgt->list[i].key->tgt_offset

> -			    + tgt->list[i].offset);

> +    if (tgt->list[i].key != NULL)

> +      devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start

> +			      + tgt->list[i].key->tgt_offset

> +			      + tgt->list[i].offset);

> +    else

> +      devaddrs[i] = NULL;


I don't know what does OpenACC require for allocatables, so can't comment on
this (and it falls under OpenACC maintainance).

> +

> +  !$omp target map(to: a) map(tofrom: b, c, d) map(from: e)

> +  !$acc parallel copyin(a) copy(b, c, d) copyout(e)


Is mixing OpenMP and OpenACC construct this way defined at all?
I see we reject OpenMP constructs inside of OpenACC contexts, and
using OpenACC constructs inside host OpenMP constructs should be generally
fine too, but mixing OpenMP offloading constructs with OpenACC constructs
sounds wrong.

	Jakub
Thomas Schwinge June 5, 2019, 10:26 a.m. | #2
Hi Jakub!

On Wed, 5 Jun 2019 12:00:25 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jun 05, 2019 at 11:25:07AM +0200, Thomas Schwinge wrote:

> > +  !$omp target map(to: a) map(tofrom: b, c, d) map(from: e)

> > +  !$acc parallel copyin(a) copy(b, c, d) copyout(e)

> 

> Is mixing OpenMP and OpenACC construct this way defined at all?


It's not.  I'm using this just to avoid duplicating the test case file,
that is, '-fopenacc' and '-fopenmp' aren't enabled at the same time.

> I see we reject OpenMP constructs inside of OpenACC contexts


ACK.

> and

> using OpenACC constructs inside host OpenMP constructs should be generally

> fine too


ACK.  (Should, but currently isn't.)

> but mixing OpenMP offloading constructs with OpenACC constructs

> sounds wrong.


ACK.  (Unless that gets defined by the two standards.)


Grüße
 Thomas
Jakub Jelinek June 5, 2019, 10:32 a.m. | #3
On Wed, Jun 05, 2019 at 12:26:32PM +0200, Thomas Schwinge wrote:
> Hi Jakub!

> 

> On Wed, 5 Jun 2019 12:00:25 +0200, Jakub Jelinek <jakub@redhat.com> wrote:

> > On Wed, Jun 05, 2019 at 11:25:07AM +0200, Thomas Schwinge wrote:

> > > +  !$omp target map(to: a) map(tofrom: b, c, d) map(from: e)

> > > +  !$acc parallel copyin(a) copy(b, c, d) copyout(e)

> > 

> > Is mixing OpenMP and OpenACC construct this way defined at all?

> 

> It's not.  I'm using this just to avoid duplicating the test case file,

> that is, '-fopenacc' and '-fopenmp' aren't enabled at the same time.


I think it is better to duplicate the test, it avoids confusion.

	Jakub
Thomas Schwinge June 21, 2019, 11:46 a.m. | #4
Hi!

On Wed, 05 Jun 2019 11:25:07 +0200, I wrote:
> I [...] had a look at what OpenMP 5.0

> is saying about Fortran 'allocatable' in 'map' clauses [...]


Will you (Jakub, and/or Fortran guys), please have a look at the attached
test case.

If I disable the '!$omp declare target (ar)', then this works with
offloading enabled, otherwise it fails, and here is my understanding what
happens.

With '!$omp declare target (ar)' active, the array descriptor globally
gets allocated on the device, via the 'offload_vars' handling in
'gcc/omp-offload.c' etc.

Then, in 'gcc/omp-low.c:scan_sharing_clauses', the 'GOMP_MAP_TO_PSET(ar)'
(which would update the array descriptor on the device after the
host-side 'allocate') gets removed, because 'Global variables with "omp
declare target" attribute don't need to be copied, the receiver side will
use them directly'.  So, on the device, we'll still have the dummy
(unallocated) array descriptor, and will thus fail.

But even with that behavior disabled, we'll still fail, because in
'libgomp/target.c:gomp_map_vars_internal', when processing the
'GOMP_MAP_TO_PSET(ar)', we find that 'ar' has already been mapped
(globally, as mentioned above), so for 'n && n->refcount !=
REFCOUNT_LINK', we'll call 'gomp_map_vars_existing', and that one again
won't update device-side the array descriptor, because it's not
'GOMP_MAP_ALWAYS_TO_P'.

Indeed, if I use '!$omp declare target link(ar)' ('link' added), then
things seem to work as expected.

Unless I got something wrong, at which level do you suggest this should
be fixed?


Grüße
 Thomas
! { dg-do run }

module mod
  implicit none
  integer, parameter :: n = 40
  integer, allocatable :: ar(:,:,:)
  !$omp declare target (ar)
end module mod

program main
  use mod
  implicit none

  integer :: i
  integer, parameter :: ar_rank = rank(ar)
  integer :: ar_rank_v
  integer :: ar_size_v
  integer, dimension(ar_rank) :: ar_extent_v
  integer, dimension(ar_rank) :: ar_shape_v

  allocate (ar(n,0:n+1,-1:n+2))
  !$omp target enter data map(alloc:ar)

  !$omp target map(from:ar_rank_v, ar_size_v, ar_extent_v, ar_shape_v)
  ar_rank_v = rank(ar)
  ar_size_v = size(ar)
  do i = 1, ar_rank
     ar_extent_v(i) = size(ar, i)
  end do
  ar_shape_v = shape(ar)
  !$omp end target

  print *, ar_rank_v
  if (ar_rank_v /= rank(ar)) error stop
  print *, ar_size_v
  if (ar_size_v /= size(ar)) error stop
  print *, ar_extent_v
  if (any (ar_extent_v /= shape(ar))) error stop
  print *, ar_shape_v
  if (any (ar_shape_v /= shape(ar))) error stop
end program main
Thomas Schwinge June 21, 2019, 2:11 p.m. | #5
Hi!

On Fri, 21 Jun 2019 13:46:09 +0200, I wrote:
> On Wed, 05 Jun 2019 11:25:07 +0200, I wrote:

> > I [...] had a look at what OpenMP 5.0

> > is saying about Fortran 'allocatable' in 'map' clauses [...]

> 

> Will you (Jakub, and/or Fortran guys), please have a look at the attached

> test case.

> 

> If I disable the '!$omp declare target (ar)', then this works with

> offloading enabled, otherwise it fails, and here is my understanding what

> happens.

> 

> With '!$omp declare target (ar)' active, the array descriptor globally

> gets allocated on the device, via the 'offload_vars' handling in

> 'gcc/omp-offload.c' etc.

> 

> Then, in 'gcc/omp-low.c:scan_sharing_clauses', the 'GOMP_MAP_TO_PSET(ar)'

> (which would update the array descriptor on the device after the

> host-side 'allocate') gets removed, because 'Global variables with "omp

> declare target" attribute don't need to be copied, the receiver side will

> use them directly'.  So, on the device, we'll still have the dummy

> (unallocated) array descriptor, and will thus fail.

> 

> But even with that behavior disabled, we'll still fail, because in

> 'libgomp/target.c:gomp_map_vars_internal', when processing the

> 'GOMP_MAP_TO_PSET(ar)', we find that 'ar' has already been mapped

> (globally, as mentioned above), so for 'n && n->refcount !=

> REFCOUNT_LINK', we'll call 'gomp_map_vars_existing', and that one again

> won't update device-side the array descriptor, because it's not

> 'GOMP_MAP_ALWAYS_TO_P'.

> 

> Indeed, if I use '!$omp declare target link(ar)' ('link' added), then

> things seem to work as expected.

> 

> Unless I got something wrong, at which level do you suggest this should

> be fixed?


So, handling 'declare'd 'allocatable's via the 'link' case does seem to
make some sense to me -- but not yet fully thought through, for example,
how that'd relate to the (default) 'to' property of '!$omp declare target
[to](ar)'?  Is there even something in OpenMP that we'd indeed copy data
to the device (as opposed to just allocate)?  Because, in OpenACC, we
actually do have '!$acc declare copyin', etc.  (But what would that mean
with an 'allocatable'?)

Anyway, I tried a WIP patch, and I'm also attaching the previous test
case rewritten to add a '!$omp declare target' routine 'ar_properties' to
read the device-side array properties.  However, compiling that for
offloading results in:

    [...]/target-array-properties-allocatable-declare-1-2.f90:6: error: variable 'ar' has been referenced in offloaded code but hasn't been marked to be included in the offloaded code
        6 |   integer, allocatable :: ar(:,:,:)
          | 
    lto1: fatal error: errors during merging of translation units

Same, if I use an explicit '!$omp declare target link(ar)'.  (I thought
that was the very point of the 'link' clause, that you could then
reference such a variable in offloaded code; I shall re-read the
standards again.  Or, is that just another bug?)

(There doesn't seem to be any Fortran libgomp test case exercising the
'!$omp declare target link'?)

For the record, for the corresponding OpenACC code, we get:

    [...]/array-properties-allocatable-declare-1-2.f90:6:0:
    
        6 |   integer, allocatable :: ar(:,:,:)
          | 
    Error: 'ar' with 'link' clause used in 'routine' function
    [...]/array-properties-allocatable-declare-1-2.f90:6:0: Error: 'ar' with 'link' clause used in 'routine' function
    [...]/array-properties-allocatable-declare-1-2.f90:6:0: Error: 'ar' with 'link' clause used in 'routine' function

..., which is different -- but not better.

WIP patch:

diff --git gcc/fortran/trans-common.c gcc/fortran/trans-common.c
index debdbd98ac0..5c2cd9b539e 100644
--- gcc/fortran/trans-common.c
+++ gcc/fortran/trans-common.c
@@ -452,16 +452,18 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init)
       DECL_USER_ALIGN (decl) = 0;
       GFC_DECL_COMMON_OR_EQUIV (decl) = 1;
 
       gfc_set_decl_location (decl, &com->where);
 
       if (com->threadprivate)
 	set_decl_tls_model (decl, decl_default_tls_model (decl));
 
+      /* Not possible to get an 'allocatable' here, no special handling
+	 required.  */
       if (com->omp_declare_target_link)
 	DECL_ATTRIBUTES (decl)
 	  = tree_cons (get_identifier ("omp declare target link"),
 		       NULL_TREE, DECL_ATTRIBUTES (decl));
       else if (com->omp_declare_target)
 	DECL_ATTRIBUTES (decl)
 	  = tree_cons (get_identifier ("omp declare target"),
 		       NULL_TREE, DECL_ATTRIBUTES (decl));
diff --git gcc/fortran/trans-decl.c gcc/fortran/trans-decl.c
index 8803525f6ae..029962f65c6 100644
--- gcc/fortran/trans-decl.c
+++ gcc/fortran/trans-decl.c
@@ -1429,18 +1429,26 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
       tree c = build_omp_clause (UNKNOWN_LOCATION, code);
       OMP_CLAUSE_CHAIN (c) = clauses;
       clauses = c;
 
       tree dims = oacc_build_routine_dims (clauses);
       list = oacc_replace_fn_attrib_attr (list, dims);
     }
 
+  /* For an 'allocatable', it's the user's responsibility to 'allocate' it, and
+     create a device copy, so here, we handle it like the 'link' case.  */
   if (sym_attr.omp_declare_target_link
-      || sym_attr.oacc_declare_link)
+      || sym_attr.oacc_declare_link
+      || ((sym_attr.omp_declare_target
+	  || sym_attr.oacc_declare_create
+	  || sym_attr.oacc_declare_copyin
+	  || sym_attr.oacc_declare_deviceptr
+	  || sym_attr.oacc_declare_device_resident)
+	  && sym_attr.allocatable))
     list = tree_cons (get_identifier ("omp declare target link"),
 		      NULL_TREE, list);
   else if (sym_attr.omp_declare_target
 	   || sym_attr.oacc_declare_create
 	   || sym_attr.oacc_declare_copyin
 	   || sym_attr.oacc_declare_deviceptr
 	   || sym_attr.oacc_declare_device_resident)
     list = tree_cons (get_identifier ("omp declare target"),
@@ -6473,17 +6481,22 @@ find_module_oacc_declare_clauses (gfc_symbol *sym)
 	map_op = OMP_MAP_DEVICE_RESIDENT;
 
       if (sym->attr.oacc_declare_create
 	  || sym->attr.oacc_declare_copyin
 	  || sym->attr.oacc_declare_deviceptr
 	  || sym->attr.oacc_declare_device_resident)
 	{
 	  sym->attr.referenced = 1;
-	  add_clause (sym, map_op);
+	  /* For an 'allocatable', it's the user's responsibility to 'allocate'
+	     it, and create a device copy, so here, we handle it like the
+	     'link' case, and don't add it to the outer OpenACC 'data'
+	     construct.  */
+	  if (!sym->attr.allocatable)
+	    add_clause (sym, map_op);
 	}
     }
 }
 
 
 void
 finish_oacc_declare (gfc_namespace *ns, gfc_symbol *sym, bool block)
 {


Grüße
 Thomas
! { dg-do run }

module mod
  implicit none
  integer, parameter :: n = 40
  integer, allocatable :: ar(:,:,:)
  integer, parameter :: ar_rank = rank(ar)
  integer :: ar_rank_v
  integer :: ar_size_v
  integer, dimension(ar_rank) :: ar_extent_v
  integer, dimension(ar_rank) :: ar_shape_v
  !$omp declare target (ar, ar_rank_v, ar_size_v, ar_extent_v, ar_shape_v)
end module mod

subroutine ar_properties
  use mod
  implicit none
  !$omp declare target

  integer :: i

  ar_rank_v = rank(ar)
  ar_size_v = size(ar)
  do i = 1, ar_rank
     ar_extent_v(i) = size(ar, i)
  end do
  ar_shape_v = shape(ar)
end subroutine ar_properties

program main
  use mod
  implicit none

  allocate (ar(n,0:n+1,-1:n+2))
  !$omp target enter data map(alloc:ar)

  !$omp target map(from:ar_rank_v, ar_size_v, ar_extent_v, ar_shape_v)
  call ar_properties
  !$omp end target

  !$omp target update from(ar_rank_v, ar_size_v, ar_extent_v, ar_shape_v)

  print *, ar_rank_v
  if (ar_rank_v /= rank(ar)) error stop
  print *, ar_size_v
  if (ar_size_v /= size(ar)) error stop
  print *, ar_extent_v
  if (any (ar_extent_v /= shape(ar))) error stop
  print *, ar_shape_v
  if (any (ar_shape_v /= shape(ar))) error stop
end program main

Patch

From a55b64d3dbc7e6c9c649f7f3d23e72a0a8712ee0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 4 Jun 2019 20:25:41 +0200
Subject: [PATCH] [PR90743] Fortran 'allocatable' with OpenACC data/OpenMP
 'target' 'map' clauses

Test what OpenMP 5.0 has to say on this topic.  And make OpenACC do the same.

	libgomp/
	PR fortran/90743
	* oacc-parallel.c (GOACC_parallel_keyed): Handle NULL case.
	* testsuite/libgomp.fortran/target-allocatable-1.f90: New file.
	* testsuite/libgomp.oacc-fortran/allocatable-1.f90: New file.
---
 libgomp/oacc-parallel.c                       |  9 ++-
 .../libgomp.fortran/target-allocatable-1.f90  |  8 +++
 .../libgomp.oacc-fortran/allocatable-1.f90    | 70 +++++++++++++++++++
 3 files changed, 84 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90

diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index e56330f6226b..0c2cfa05a438 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -325,9 +325,12 @@  GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
   
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-    devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
-			    + tgt->list[i].key->tgt_offset
-			    + tgt->list[i].offset);
+    if (tgt->list[i].key != NULL)
+      devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
+			      + tgt->list[i].key->tgt_offset
+			      + tgt->list[i].offset);
+    else
+      devaddrs[i] = NULL;
   if (aq == NULL)
     acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, dims,
 				tgt);
diff --git a/libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90 b/libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90
new file mode 100644
index 000000000000..9f0b7f0c3b53
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90
@@ -0,0 +1,8 @@ 
+! Test 'allocatable' with OpenMP 'target' 'map' clauses.
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-DACC_MEM_SHARED=0" { target offload_device_nonshared_as } }
+! { dg-additional-options "-DACC_MEM_SHARED=1" { target offload_device_shared_as } }
+
+#include "../libgomp.oacc-fortran/allocatable-1.f90"
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90
new file mode 100644
index 000000000000..0941f71f8c30
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90
@@ -0,0 +1,70 @@ 
+! Test 'allocatable' with OpenACC data clauses.
+
+! This is also 'include'd from '../libgomp.fortran/target-allocatable-1.f90'.
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+
+program main
+  implicit none
+  integer, allocatable :: a, b, c, d, e
+
+  allocate (a)
+  a = 11
+
+  b = 25 ! Implicit allocation.
+
+  c = 52 ! Implicit allocation.
+
+  !No 'allocate (d)' here.
+
+  !No 'allocate (e)' here.
+
+  !$omp target map(to: a) map(tofrom: b, c, d) map(from: e)
+  !$acc parallel copyin(a) copy(b, c, d) copyout(e)
+
+  if (.not. allocated (a)) stop 1
+  if (a .ne. 11) stop 2
+  a = 33
+ 
+  if (.not. allocated (b)) stop 3
+  if (b .ne. 25) stop 4
+ 
+  if (.not. allocated (c)) stop 5
+  if (c .ne. 52) stop 6
+  c = 10
+ 
+  if (allocated (d)) stop 7
+  d = 42 ! Implicit allocation, but on device only.
+  if (.not. allocated (d)) stop 8
+  deallocate (d) ! OpenMP requires must be "unallocated upon exit from the region".
+
+  if (allocated (e)) stop 9
+  e = 24 ! Implicit allocation, but on device only.
+  if (.not. allocated (e)) stop 10
+  deallocate (e) ! OpenMP requires must be "unallocated upon exit from the region".
+
+  !$acc end parallel
+  !$omp end target
+
+  if (.not. allocated (a)) stop 20
+#if ACC_MEM_SHARED
+  if (a .ne. 33) stop 21
+#else
+  if (a .ne. 11) stop 22
+#endif
+  deallocate (a)
+
+  if (.not. allocated (b)) stop 23
+  if (b .ne. 25) stop 24
+  deallocate (b)
+
+  if (.not. allocated (c)) stop 25
+  if (c .ne. 10) stop 26
+  deallocate (c)
+
+  if (allocated (d)) stop 27
+
+  if (allocated (e)) stop 28
+
+end program main
-- 
2.17.1