[07/13] OpenACC 2.6 deep copy: libgomp parts

Message ID 65540b92dff74db1f15af930f87f7096d03e7efe.1576648001.git.julian@codesourcery.com
State New
Headers show
Series
  • OpenACC 2.6 manual deep copy support
Related show

Commit Message

Julian Brown Dec. 18, 2019, 6:03 a.m.
This patch has been broken out of the "OpenACC 2.6 manual deep copy
support" patch, last posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02376.html

This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and
GOMP_MAP_DETACH mapping kinds (etc.), as introduced by the front-end
patches following in this series.

Tested alongside other patches in this series with offloading to
NVPTX. OK?

Thanks,

Julian

ChangeLog

	include/
	* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_4, GOMP_MAP_DEEP_COPY):
	Define.
	(gomp_map_kind): Add GOMP_MAP_ATTACH, GOMP_MAP_DETACH,
	GOMP_MAP_FORCE_DETACH.

	libgomp/
	* libgomp.h (struct target_var_desc): Add do_detach flag.
	* oacc-init.c (acc_shutdown_1): Free aux block if present.
	* oacc-mem.c (find_group_last): Add SIZES parameter. Support
	struct components.  Tidy up and add some new checks.
	(goacc_enter_data_internal): Update call to find_group_last.
	(goacc_exit_data_internal): Support detach operations and
	GOMP_MAP_STRUCT.
	(GOACC_enter_exit_data): Handle initial GOMP_MAP_STRUCT or
	GOMP_MAP_FORCE_PRESENT in finalization detection code.  Handle
	attach/detach in enter/exit data detection code.
	* target.c (gomp_map_vars_existing): Initialise do_detach field of
	tgt_var_desc.
	(gomp_map_vars_internal): Support attach.
	(gomp_unmap_vars_internal): Support detach.
---
 include/gomp-constants.h |  10 ++++
 libgomp/libgomp.h        |   2 +
 libgomp/oacc-mem.c       | 121 +++++++++++++++++++++++++++++++++------
 libgomp/target.c         |  51 ++++++++++++++++-
 4 files changed, 166 insertions(+), 18 deletions(-)

-- 
2.23.0

Comments

Thomas Schwinge Dec. 21, 2019, 11:01 p.m. | #1
Hi!

On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:
> This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and

> GOMP_MAP_DETACH mapping kinds (etc.), as introduced by the front-end

> patches following in this series.


This (r279625) regressed the same OpenMP 'omp declare target link'
functionality/test case that I previously discussed in
<http://mid.mail-archive.com/87pniuuhkj.fsf@euler.schwinge.homeip.net>,
and/or
<http://mid.mail-archive.com/87k18vu1zr.fsf@euler.schwinge.homeip.net>:

    PASS: libgomp.c/target-link-1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/target-link-1.c execution test

(With nvptx offloading configured, as mentioned before, this test case
doesn't even build -- <https://gcc.gnu.org/PR81689> -- so, yes, this is
clearly insufficient test coverage of the 'omp declare target link'
functionality, but still we shouldn't regress it.)

What's causing the regression is:

> --- a/libgomp/target.c

> +++ b/libgomp/target.c

> @@ -1247,10 +1281,12 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,


|  		k->aux = NULL;
|  		if (n && n->refcount == REFCOUNT_LINK)
|  		  {
|  		    /* Replace target address of the pointer with target address
|  		       of mapped object in the splay tree.  */
|  		    splay_tree_remove (mem_map, n);
|  		    k->aux
|  		      = gomp_malloc_cleared (sizeof (struct splay_tree_aux));
|  		    k->aux->link_key = n;
|  		  }
|  		size_t align = (size_t) 1 << (kind >> rshift);
|  		tgt->list[i].key = k;
|  		k->tgt = tgt;
|  		if (field_tgt_clear != FIELD_TGT_EMPTY)
|  		  {
|  		    k->tgt_offset = k->host_start - field_tgt_base
|  				    + field_tgt_offset;
|  		    if (i == field_tgt_clear)
|  		      field_tgt_clear = FIELD_TGT_EMPTY;
|  		  }
|  		else
|  		  {
|  		    tgt_size = (tgt_size + align - 1) & ~(align - 1);
|  		    k->tgt_offset = tgt_size;
|  		    tgt_size += k->host_end - k->host_start;
|  		  }
>  		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);

>  		tgt->list[i].always_copy_from

>  		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);

> +		tgt->list[i].do_detach = false;

>  		tgt->list[i].offset = 0;

>  		tgt->list[i].length = k->host_end - k->host_start;

>  		k->refcount = 1;

>  		k->virtual_refcount = 0;

> +		k->aux = NULL;

>  		tgt->refcount++;

>  		array->left = NULL;

>  		array->right = NULL;


... that latter 'k->aux = NULL' assignment, which invalidates what the
'REFCOUNT_LINK' handling earlier set up.

I had intentionally left out this assignment in my "In
'libgomp/target.c', 'struct splay_tree_key_s', use 'struct
splay_tree_aux' for infrequently-used or API-specific data" patch,
<http://mid.mail-archive.com/87k16uykb7.fsf@euler.schwinge.homeip.net>,
and you also don't have that assignment in your r279620 "Use aux struct
in libgomp for infrequently-used/API-specific data" commit,
<http://mid.mail-archive.com/80e0dba326a4414fd2dbe8401dbd8d8f08445129.1576648001.git.julian@codesourcery.com>,
so curious why it now appears here -- hopefully just an oversight.

See attached "[OMP] Restore 'omp declare target link' handling";
committed to trunk in r279701.


Grüße
 Thomas
From dc669e8f45a5b1e61e746b6d2c6a23480fdd904f Mon Sep 17 00:00:00 2001
From: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>

Date: Sat, 21 Dec 2019 22:58:43 +0000
Subject: [PATCH] [OMP] Restore 'omp declare target link' handling

    PASS: libgomp.c/target-link-1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/target-link-1.c execution test

We need to revert one line of code change from r279625.

	libgomp/
	* target.c (gomp_map_vars_internal): Restore 'omp declare target
	link' handling.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279701 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog | 5 +++++
 libgomp/target.c  | 1 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 81b9d6788a1..7bc7d41da42 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2019-12-21  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* target.c (gomp_map_vars_internal): Restore 'omp declare target
+	link' handling.
+
 2019-12-19  Julian Brown  <julian@codesourcery.com>
 
 	* testsuite/libgomp.oacc-fortran/class-ptr-param.f95: New test.
diff --git a/libgomp/target.c b/libgomp/target.c
index 50a9c2b1df3..bf30716cd85 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1129,7 +1129,6 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		tgt->list[i].length = k->host_end - k->host_start;
 		k->refcount = 1;
 		k->virtual_refcount = 0;
-		k->aux = NULL;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-- 
2.17.1
Julian Brown Jan. 3, 2020, 12:24 p.m. | #2
Hi,

On Sun, 22 Dec 2019 00:01:10 +0100
Thomas Schwinge <thomas@codesourcery.com> wrote:

> I had intentionally left out this assignment in my "In

> 'libgomp/target.c', 'struct splay_tree_key_s', use 'struct

> splay_tree_aux' for infrequently-used or API-specific data" patch,

> <http://mid.mail-archive.com/87k16uykb7.fsf@euler.schwinge.homeip.net>,

> and you also don't have that assignment in your r279620 "Use aux

> struct in libgomp for infrequently-used/API-specific data" commit,

> <http://mid.mail-archive.com/80e0dba326a4414fd2dbe8401dbd8d8f08445129.1576648001.git.julian@codesourcery.com>,

> so curious why it now appears here -- hopefully just an oversight.


This was just an oversight (or a mismerge, perhaps). Thanks for fixing!

Cheers,

Julian
Thomas Schwinge May 20, 2020, 9:37 a.m. | #3
Hi Julian!

Moving this over, from the "Fix component mappings with derived types for
OpenACC" thread,
<http://mid.mail-archive.com/20200110014945.5643ace5@squid.athome>, where
you propose to change this 'GOMP_MAP_STRUCT' handling code:

On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:
> --- a/libgomp/oacc-mem.c

> +++ b/libgomp/oacc-mem.c


> @@ -1075,6 +1119,39 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,

>             gomp_remove_var_async (acc_dev, n, aq);

>         }

>         break;

> +

> +     case GOMP_MAP_STRUCT:

> +       {

> +         int elems = sizes[i];

> +         for (int j = 1; j <= elems; j++)

> +           {

> +             struct splay_tree_key_s k;

> +             k.host_start = (uintptr_t) hostaddrs[i + j];

> +             k.host_end = k.host_start + sizes[i + j];

> +             splay_tree_key str;

> +             str = splay_tree_lookup (&acc_dev->mem_map, &k);

> +             if (str)

> +               {

> +                 if (finalize)

> +                   {

> +                     str->refcount -= str->virtual_refcount;

> +                     str->virtual_refcount = 0;

> +                   }

> +                 if (str->virtual_refcount > 0)

> +                   {

> +                     str->refcount--;

> +                     str->virtual_refcount--;

> +                   }

> +                 else if (str->refcount > 0)

> +                   str->refcount--;

> +                 if (str->refcount == 0)

> +                   gomp_remove_var_async (acc_dev, str, aq);

> +               }

> +           }

> +         i += elems;

> +       }

> +       break;

> +

>       default:

>         gomp_fatal (">>>> goacc_exit_data_internal UNHANDLED kind 0x%.2x",

>                         kind);


... into an "empty 'case GOMP_MAP_STRUCT:' as a no-op, so that we don't
run into 'default:' case 'goacc_exit_data_internal UNHANDLED kind'" (my
words/interpretation).

Further citing myself,
<http://mid.mail-archive.com/87ftbw9kqh.fsf@euler.schwinge.homeip.net>:

| Is my understanding correct that "fixed" GCC won't generate such
| 'GOMP_MAP_STRUCT' anymore (I have't studied in detail), and this empty
| 'case GOMP_MAP_STRUCT:' only remains in here for backwards compatibility?
| In this case, please add a comment to the code, stating this.

My guess was wrong: running a quick test, I do see that we still generate
'GOMP_MAP_STRUCT' for OpenACC unmap:

    --- libgomp/oacc-mem.c
    +++ libgomp/oacc-mem.c
    @@ -1163,6 +1165,7 @@ goacc_exit_data_internal
          break;

        case GOMP_MAP_STRUCT:
    +     assert (!"GOMP_MAP_STRUCT");
          break;

        default:
          gomp_fatal (">>>> goacc_exit_data_internal UNHANDLED kind 0x%.2x",
                          kind);

... regresses:

  - 'libgomp.oacc-c-c++-common/deep-copy-7.c'
  - 'libgomp.oacc-c-c++-common/deep-copy-8.c'
  - 'libgomp.oacc-fortran/classtypes-2.f95'
  - 'libgomp.oacc-fortran/deep-copy-4.f90'
  - 'libgomp.oacc-fortran/deep-copy-5.f90'
  - 'libgomp.oacc-fortran/deep-copy-6.f90'
  - 'libgomp.oacc-fortran/derivedtype-2.f95'

| Otherwise,
| please add a comment why "do nothing" is appropriate for
| 'GOMP_MAP_STRUCT'.

I suppose we still need to unmap the "'GOMP_MAP_STRUCT' components", but
can do that individually, outside of the 'GOMP_MAP_STRUCT' context.
That'd then also explain...

| In particular, for both scenarios, why we don't need
| to skip the following 'sizes[i]' mappings?

... this question.

But one step back.  Why generate 'GOMP_MAP_STRUCT' for unmap, if we then
just skip it in libgomp handling?  Cross checking, OpenMP
'libgomp/target.c:gomp_exit_data' also doesn't expect to see any
'GOMP_MAP_STRUCT'.

For example, 'libgomp.oacc-c-c++-common/deep-copy-7.c':

    #pragma acc exit data copyout(v.b[:n]) finalize
    #pragma acc exit data delete(v.a)

'deep-copy-7.c.004t.original':

    #pragma acc exit data finalize map(from:*v.b [len: (sizetype) n * 4]) map(attach_detach:v.b [bias: 0]);
    #pragma acc exit data map(release:v.a);

'deep-copy-7.c.005t.gimple':

    #pragma omp target oacc_enter_exit_data finalize map(struct:v [len: 1]) map(delete:v.b [len: 8]) map(force_from:*_15 [len: _14]) map(force_detach:v.b [bias: 0])
    #pragma omp target oacc_enter_exit_data map(struct:v [len: 1]) map(release:v.a [len: 4])

I haven't studied 'GOMP_MAP_STRUCT' in detail (so it may be easy to prove
me wrong), but for OpenACC 'exit data' etc., these 'map(struct:[...])'
seem "pointless" to me, given that in libgomp we (intend to) just skip
them?

Well, and quickly I find that's exactly what OpenMP 'target exit data' is
doing, and doing the same for OpenACC 'exit data':

    --- gcc/gimplify.c
    +++ gcc/gimplify.c
    @@ -10406,7 +10406,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
                }
            }
          else if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT
    -              && code == OMP_TARGET_EXIT_DATA)
    +              && (code == OMP_TARGET_EXIT_DATA
    +                  || code == OACC_EXIT_DATA))
            remove = true;
          else if (DECL_SIZE (decl)
                   && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST

..., 'deep-copy-7.c.005t.gimple' gets simplified as expected:

    -        #pragma omp target oacc_enter_exit_data finalize map(struct:v [len: 1]) map(delete:v.b [len: 8]) map(force_from:*_15 [len: _14]) map(force_detach:v.b [bias: 0])
    +        #pragma omp target oacc_enter_exit_data finalize map(delete:v.b [len: 8]) map(force_from:*_15 [len: _14]) map(force_detach:v.b [bias: 0])
    -        #pragma omp target oacc_enter_exit_data map(struct:v [len: 1]) map(release:v.a [len: 4])
    +        #pragma omp target oacc_enter_exit_data map(release:v.a [len: 4])

..., and all the "'assert' regressions" mentioned above again disappear,
and so 'GOMP_MAP_STRUCT' handling could be removed from
'libgomp/oacc-mem.c:goacc_exit_data_internal' (and 'default:'
'gomp_fatal' would then catch any such cases).

But of course, given that GCC 10.1 now does generate these
'GOMP_MAP_STRUCT's, we do have to support them in one way or another...


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Thomas Schwinge May 20, 2020, 2:52 p.m. | #4
Hi!

On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:
> --- a/libgomp/oacc-mem.c

> +++ b/libgomp/oacc-mem.c


>  static int

> -find_group_last (int pos, size_t mapnum, unsigned short *kinds)

> +find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned short *kinds)

>  {

>    unsigned char kind0 = kinds[pos] & 0xff;

> -  int first_pos = pos, last_pos = pos;

> +  int first_pos = pos;

>

> -  if (kind0 == GOMP_MAP_TO_PSET)

> +  switch (kind0)

>      {

> +    case GOMP_MAP_TO_PSET:

>        while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)

> -     last_pos = ++pos;

> +     pos++;

>        /* We expect at least one GOMP_MAP_POINTER after a GOMP_MAP_TO_PSET.  */

> -      assert (last_pos > first_pos);

> -    }

> -  else

> -    {

> +      assert (pos > first_pos);

> +      break;

> +

> +    case GOMP_MAP_STRUCT:

> +      pos += sizes[pos];

> +      break;

> +

> +    case GOMP_MAP_POINTER:

> +    case GOMP_MAP_ALWAYS_POINTER:

> +      /* These mappings are only expected after some other mapping.  If we

> +      see one by itself, something has gone wrong.  */

> +      gomp_fatal ("unexpected mapping");

> +      break;

> +

> +    default:

>        /* GOMP_MAP_ALWAYS_POINTER can only appear directly after some other

>        mapping.  */

> -      if (pos + 1 < mapnum

> -       && (kinds[pos + 1] & 0xff) == GOMP_MAP_ALWAYS_POINTER)

> -     return pos + 1;

> +      if (pos + 1 < mapnum)

> +     {

> +       unsigned char kind1 = kinds[pos + 1] & 0xff;

> +       if (kind1 == GOMP_MAP_ALWAYS_POINTER)

> +         return pos + 1;

> +     }

>

> -      /* We can have one or several GOMP_MAP_POINTER mappings after a to/from

> +      /* We can have zero or more GOMP_MAP_POINTER mappings after a to/from

>        (etc.) mapping.  */

>        while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)

> -     last_pos = ++pos;

> +     pos++;

>      }

>

> -  return last_pos;

> +  return pos;

>  }


So this now causes grouped (!) mapping of all of 'GOMP_MAP_STRUCT', that
is, all its "members" at once.

This, I suppose, mandated the removal of (some of) the 'is_tgt_unmapped'
checking (unfortunately committed not here, but as part of r279621
"OpenACC reference count overhaul"), where we had unmapping code
(conceptually) similar to:

    bool is_tgt_unmapped = gomp_remove_var (acc_dev, n);
    assert (is_tgt_unmapped);

I'd introduced this a little bit earlier, finding this a simple yet
effective run-time, low-overhead consistency checking of (certain aspects
of) reference counting -- so just noting here that it's somewhat bad that
we can't have this anymore "just" because of 'GOMP_MAP_STRUCT'.  (Maybe
there is a way to get it back; that's for later?)

Anyway, the code changes were incomplete, consider:

    #include <assert.h>
    #include <openacc.h>

    struct s
    {
      char a;
      char b;
    };

    int main ()
    {
      struct s s;

    #pragma acc enter data create(s.a, s.b)
      assert (acc_is_present (&s.a, sizeof s.a));
      assert (acc_is_present (&s.b, sizeof s.b));

    #if 0
      // works
    # pragma acc exit data delete(s.a)
    # pragma acc exit data delete(s.b)
    #else
      acc_delete (&s.a, sizeof s.a); // fails
      acc_delete (&s.b, sizeof s.b);
    #endif
      assert (!acc_is_present (&s.a, sizeof s.a));
      assert (!acc_is_present (&s.b, sizeof s.b));

      return 0;
    }

The 'acc_delete' variant exercises a code path that still contains the
'is_tgt_unmapped' checking, and that triggers for 'acc_delete ([s.a])'
then, as 's.a' has been mapped in one group together with 's.b', which
remains mapped until 'acc_delete ([s.b])', and thus
"[...]/libgomp/oacc-mem.c:726: goacc_exit_datum: Assertion
`is_tgt_unmapped' failed".  Unless anybody has a better plan for this
whole topic, I'll commit the obvious (removal), later (next week).

    --- libgomp/oacc-mem.c
    +++ libgomp/oacc-mem.c
    @@ -724,7 +724,11 @@ goacc_exit_datum (void *h, size_t s, unsigned short kind, int async)
           else
        {
          bool is_tgt_unmapped = gomp_remove_var (acc_dev, n);
    +#if 0
          assert (is_tgt_unmapped);
    +#else
    +     (void) is_tgt_unmapped;
    +#endif
        }
         }


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Julian Brown May 20, 2020, 7:11 p.m. | #5
On Wed, 20 May 2020 16:52:02 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi!

> 

> On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com>

> wrote:

> > --- a/libgomp/oacc-mem.c

> > +++ b/libgomp/oacc-mem.c  

> 

> >  static int

> > -find_group_last (int pos, size_t mapnum, unsigned short *kinds)

> > +find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned

> > short *kinds) {

> >    unsigned char kind0 = kinds[pos] & 0xff;

> > -  int first_pos = pos, last_pos = pos;

> > +  int first_pos = pos;

> >  

> > -  if (kind0 == GOMP_MAP_TO_PSET)

> > +  switch (kind0)

> >      {

> > +    case GOMP_MAP_TO_PSET:

> >        while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) ==

> > GOMP_MAP_POINTER)

> > -	last_pos = ++pos;

> > +	pos++;

> >        /* We expect at least one GOMP_MAP_POINTER after a

> > GOMP_MAP_TO_PSET.  */

> > -      assert (last_pos > first_pos);

> > -    }

> > -  else

> > -    {

> > +      assert (pos > first_pos);

> > +      break;

> > +

> > +    case GOMP_MAP_STRUCT:

> > +      pos += sizes[pos];

> > +      break;

> > +

> > +    case GOMP_MAP_POINTER:

> > +    case GOMP_MAP_ALWAYS_POINTER:

> > +      /* These mappings are only expected after some other

> > mapping.  If we

> > +	 see one by itself, something has gone wrong.  */

> > +      gomp_fatal ("unexpected mapping");

> > +      break;

> > +

> > +    default:

> >        /* GOMP_MAP_ALWAYS_POINTER can only appear directly after

> > some other mapping.  */

> > -      if (pos + 1 < mapnum

> > -	  && (kinds[pos + 1] & 0xff) == GOMP_MAP_ALWAYS_POINTER)

> > -	return pos + 1;

> > +      if (pos + 1 < mapnum)

> > +	{

> > +	  unsigned char kind1 = kinds[pos + 1] & 0xff;

> > +	  if (kind1 == GOMP_MAP_ALWAYS_POINTER)

> > +	    return pos + 1;

> > +	}

> >  

> > -      /* We can have one or several GOMP_MAP_POINTER mappings

> > after a to/from

> > +      /* We can have zero or more GOMP_MAP_POINTER mappings after

> > a to/from (etc.) mapping.  */

> >        while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) ==

> > GOMP_MAP_POINTER)

> > -	last_pos = ++pos;

> > +	pos++;

> >      }

> >  

> > -  return last_pos;

> > +  return pos;

> >  }  

> 

> So this now causes grouped (!) mapping of all of 'GOMP_MAP_STRUCT',

> that is, all its "members" at once.

> 

> This, I suppose, mandated the removal of (some of) the

> 'is_tgt_unmapped' checking (unfortunately committed not here, but as

> part of r279621 "OpenACC reference count overhaul"), where we had

> unmapping code (conceptually) similar to:

> 

>     bool is_tgt_unmapped = gomp_remove_var (acc_dev, n);

>     assert (is_tgt_unmapped);

> 

> I'd introduced this a little bit earlier, finding this a simple yet

> effective run-time, low-overhead consistency checking of (certain

> aspects of) reference counting -- so just noting here that it's

> somewhat bad that we can't have this anymore "just" because of

> 'GOMP_MAP_STRUCT'.  (Maybe there is a way to get it back; that's for

> later?)


I'm actually looking at this now as part of revisiting the refcounting
work. I'm seeing what I can come up with in terms of being able to keep
the runtime test (and fixing the other part you mentioned).

Thanks,

Julian
Thomas Schwinge June 4, 2020, 6:53 p.m. | #6
Hi!

On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:
> This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and

> GOMP_MAP_DETACH mapping kinds (etc.)


> --- a/libgomp/oacc-mem.c

> +++ b/libgomp/oacc-mem.c


> @@ -1075,6 +1119,39 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,


> +     case GOMP_MAP_STRUCT:

> +       {

> +         int elems = sizes[i];

> +         for (int j = 1; j <= elems; j++)

> +           {

> +             struct splay_tree_key_s k;

> +             k.host_start = (uintptr_t) hostaddrs[i + j];

> +             k.host_end = k.host_start + sizes[i + j];

> +             splay_tree_key str;

> +             str = splay_tree_lookup (&acc_dev->mem_map, &k);

> +             if (str)

> +               {

> +                 if (finalize)

> +                   {

> +                     str->refcount -= str->virtual_refcount;

> +                     str->virtual_refcount = 0;

> +                   }

> +                 if (str->virtual_refcount > 0)

> +                   {

> +                     str->refcount--;

> +                     str->virtual_refcount--;

> +                   }

> +                 else if (str->refcount > 0)

> +                   str->refcount--;

> +                 if (str->refcount == 0)

> +                   gomp_remove_var_async (acc_dev, str, aq);

> +               }

> +           }

> +         i += elems;

> +       }

> +       break;


I'm aware that this 'GOMP_MAP_STRUCT' special handling shouldn't have
been there to begin with, and is now scheduled to go away (yay!), but
while testing a few things while reviewing (reverse-engineering the
intentions of) these fix-up patches, I quickly ran into cases where
OpenACC code that I understand to be valid failed, exactly here.  I've
pushed "[OpenACC 'exit data'] Evaluate 'finalize' individually for
'GOMP_MAP_STRUCT' entries" to master branch in commit
a02f1adbfe619ab19cf142438e0a02950d3594da, and releases/gcc-10 branch in
commit 5a1b479aedd83d0362f870f480a24a011e703de4, and then "[OpenACC 'exit
data'] Evaluate 'copyfrom' individually for 'GOMP_MAP_STRUCT' entries" to
master branch in commit 2c838a3e4ea06c69c856d074ae5b0400e08ae3c2, and
releases/gcc-10 branch in commit
4664ca1bc40318dbe60591cfe6d31c3d36d439c3, see attached.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
From a02f1adbfe619ab19cf142438e0a02950d3594da Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Thu, 4 Jun 2020 16:01:07 +0200
Subject: [PATCH] [OpenACC 'exit data'] Evaluate 'finalize' individually for
 'GOMP_MAP_STRUCT' entries

Currently, we don't at all evaluate 'finalize' for 'GOMP_MAP_STRUCT' entries.
Fix this by copying/adapting the corresponding non-'GOMP_MAP_STRUCT' code.

	libgomp/
	* oacc-mem.c (goacc_exit_data_internal) <GOMP_MAP_STRUCT>:
	Evaluate 'finalize' individually for each entry.
	* testsuite/libgomp.oacc-c-c++-common/struct-1.c: New file.
	* testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c: Remove
	file.
---
 libgomp/oacc-mem.c                            |  10 ++
 .../libgomp.oacc-c-c++-common/struct-1.c      | 146 ++++++++++++++++++
 .../struct-refcount-1.c                       |  47 ------
 3 files changed, 156 insertions(+), 47 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index b7c85cf5976f..a34f4cf0e918 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1184,6 +1184,16 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	    int elems = sizes[i];
 	    for (int j = 1; j <= elems; j++)
 	      {
+		assert (i + j < mapnum);
+
+		kind = kinds[i + j] & 0xff;
+
+		finalize = false;
+		if (kind == GOMP_MAP_FORCE_FROM
+		    || kind == GOMP_MAP_DELETE
+		    || kind == GOMP_MAP_FORCE_DETACH)
+		  finalize = true;
+
 		struct splay_tree_key_s k;
 		k.host_start = (uintptr_t) hostaddrs[i + j];
 		k.host_end = k.host_start + sizes[i + j];
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
new file mode 100644
index 000000000000..285be84f244b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
@@ -0,0 +1,146 @@
+/* Test dynamic refcount of separate structure members.  */
+
+#include <assert.h>
+#include <stdbool.h>
+#include <openacc.h>
+
+struct s
+{
+  signed char a;
+  float b;
+};
+
+static void test(unsigned variant)
+{
+  struct s s;
+
+#pragma acc enter data create(s.a, s.b)
+  assert(acc_is_present(&s.a, sizeof s.a));
+  assert(acc_is_present(&s.b, sizeof s.b));
+
+  if (variant & 4)
+    {
+      if (variant & 8)
+	{
+#pragma acc enter data create(s.b)
+	}
+      else
+	acc_create(&s.b, sizeof s.b);
+      assert(acc_is_present(&s.a, sizeof s.a));
+      assert(acc_is_present(&s.b, sizeof s.b));
+
+      if (variant & 16)
+	{
+#pragma acc enter data create(s.a)
+	}
+      else
+	acc_create(&s.a, sizeof s.a);
+      assert(acc_is_present(&s.a, sizeof s.a));
+      assert(acc_is_present(&s.b, sizeof s.b));
+
+      if (variant & 32)
+	{
+#pragma acc enter data create(s.a)
+	  acc_create(&s.b, sizeof s.b);
+#pragma acc enter data create(s.b)
+#pragma acc enter data create(s.b)
+	  acc_create(&s.a, sizeof s.a);
+	  acc_create(&s.a, sizeof s.a);
+	  acc_create(&s.a, sizeof s.a);
+	}
+      assert(acc_is_present(&s.a, sizeof s.a));
+      assert(acc_is_present(&s.b, sizeof s.b));
+    }
+
+#pragma acc parallel \
+  copy(s.a, s.b)
+  {
+  }
+
+  if (variant & 32)
+    {
+      if (variant & 1)
+	{
+#pragma acc exit data delete(s.a) finalize
+	}
+      else
+	acc_delete_finalize(&s.a, sizeof s.a);
+    }
+  else
+    {
+      if (variant & 1)
+	{
+#pragma acc exit data delete(s.a)
+	}
+      else
+	acc_delete(&s.a, sizeof s.a);
+      if (variant & 4)
+	{
+	  assert(acc_is_present(&s.a, sizeof s.a));
+	  assert(acc_is_present(&s.b, sizeof s.b));
+	  if (variant & 1)
+	    {
+#pragma acc exit data delete(s.a)
+	    }
+	  else
+	    acc_delete(&s.a, sizeof s.a);
+	}
+    }
+#if ACC_MEM_SHARED
+  assert(acc_is_present(&s.a, sizeof s.a));
+  assert(acc_is_present(&s.b, sizeof s.b));
+#else
+  assert(!acc_is_present(&s.a, sizeof s.a));
+  assert(acc_is_present(&s.b, sizeof s.b));
+#endif
+
+  if (variant & 32)
+    {
+      if (variant & 2)
+	{
+#pragma acc exit data delete(s.b) finalize
+	}
+      else
+	acc_delete_finalize(&s.b, sizeof s.b);
+    }
+  else
+    {
+      if (variant & 2)
+	{
+#pragma acc exit data delete(s.b)
+	}
+      else
+	acc_delete(&s.b, sizeof s.b);
+      if (variant & 4)
+	{
+#if ACC_MEM_SHARED
+	  assert(acc_is_present(&s.a, sizeof s.a));
+	  assert(acc_is_present(&s.b, sizeof s.b));
+#else
+	  assert(!acc_is_present(&s.a, sizeof s.a));
+	  assert(acc_is_present(&s.b, sizeof s.b));
+#endif
+	  if (variant & 2)
+	    {
+#pragma acc exit data delete(s.b)
+	    }
+	  else
+	    acc_delete(&s.b, sizeof s.b);
+	}
+    }
+#if ACC_MEM_SHARED
+  assert(acc_is_present(&s.a, sizeof s.a));
+  assert(acc_is_present(&s.b, sizeof s.b));
+#else
+  assert(!acc_is_present(&s.a, sizeof s.a));
+  assert(!acc_is_present(&s.b, sizeof s.b));
+#endif
+}
+
+int main()
+{
+  for (unsigned variant = 0; variant < 64; ++variant)
+    test(variant);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c
deleted file mode 100644
index bde5890d6676..000000000000
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c
+++ /dev/null
@@ -1,47 +0,0 @@
-/* Test dynamic unmapping of separate structure members.  */
-
-/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
-
-#include <assert.h>
-#include <stdbool.h>
-#include <openacc.h>
-
-struct s
-{
-  char a;
-  float b;
-};
-
-void test (bool use_directives)
-{
-  struct s s;
-
-#pragma acc enter data create(s.a, s.b)
-  assert (acc_is_present (&s.a, sizeof s.a));
-  assert (acc_is_present (&s.b, sizeof s.b));
-
-  if (use_directives)
-    {
-#pragma acc exit data delete(s.a)
-    }
-  else
-    acc_delete (&s.a, sizeof s.a);
-  assert (!acc_is_present (&s.a, sizeof s.a));
-  assert (acc_is_present (&s.b, sizeof s.b));
-  if (use_directives)
-    {
-#pragma acc exit data delete(s.b)
-    }
-  else
-    acc_delete (&s.b, sizeof s.b);
-  assert (!acc_is_present (&s.a, sizeof s.a));
-  assert (!acc_is_present (&s.b, sizeof s.b));
-}
-
-int main ()
-{
-  test (true);
-  test (false);
-
-  return 0;
-}
-- 
2.26.2
From 5a1b479aedd83d0362f870f480a24a011e703de4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Thu, 4 Jun 2020 16:01:07 +0200
Subject: [PATCH] [OpenACC 'exit data'] Evaluate 'finalize' individually for
 'GOMP_MAP_STRUCT' entries

Currently, we don't at all evaluate 'finalize' for 'GOMP_MAP_STRUCT' entries.
Fix this by copying/adapting the corresponding non-'GOMP_MAP_STRUCT' code.

	libgomp/
	* oacc-mem.c (goacc_exit_data_internal) <GOMP_MAP_STRUCT>:
	Evaluate 'finalize' individually for each entry.
	* testsuite/libgomp.oacc-c-c++-common/struct-1.c: New file.
	* testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c: Remove
	file.

(cherry picked from commit a02f1adbfe619ab19cf142438e0a02950d3594da)
---
 libgomp/oacc-mem.c                            |  10 ++
 .../libgomp.oacc-c-c++-common/struct-1.c      | 146 ++++++++++++++++++
 .../struct-refcount-1.c                       |  47 ------
 3 files changed, 156 insertions(+), 47 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index b7c85cf5976f..a34f4cf0e918 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1184,6 +1184,16 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	    int elems = sizes[i];
 	    for (int j = 1; j <= elems; j++)
 	      {
+		assert (i + j < mapnum);
+
+		kind = kinds[i + j] & 0xff;
+
+		finalize = false;
+		if (kind == GOMP_MAP_FORCE_FROM
+		    || kind == GOMP_MAP_DELETE
+		    || kind == GOMP_MAP_FORCE_DETACH)
+		  finalize = true;
+
 		struct splay_tree_key_s k;
 		k.host_start = (uintptr_t) hostaddrs[i + j];
 		k.host_end = k.host_start + sizes[i + j];
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
new file mode 100644
index 000000000000..285be84f244b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
@@ -0,0 +1,146 @@
+/* Test dynamic refcount of separate structure members.  */
+
+#include <assert.h>
+#include <stdbool.h>
+#include <openacc.h>
+
+struct s
+{
+  signed char a;
+  float b;
+};
+
+static void test(unsigned variant)
+{
+  struct s s;
+
+#pragma acc enter data create(s.a, s.b)
+  assert(acc_is_present(&s.a, sizeof s.a));
+  assert(acc_is_present(&s.b, sizeof s.b));
+
+  if (variant & 4)
+    {
+      if (variant & 8)
+	{
+#pragma acc enter data create(s.b)
+	}
+      else
+	acc_create(&s.b, sizeof s.b);
+      assert(acc_is_present(&s.a, sizeof s.a));
+      assert(acc_is_present(&s.b, sizeof s.b));
+
+      if (variant & 16)
+	{
+#pragma acc enter data create(s.a)
+	}
+      else
+	acc_create(&s.a, sizeof s.a);
+      assert(acc_is_present(&s.a, sizeof s.a));
+      assert(acc_is_present(&s.b, sizeof s.b));
+
+      if (variant & 32)
+	{
+#pragma acc enter data create(s.a)
+	  acc_create(&s.b, sizeof s.b);
+#pragma acc enter data create(s.b)
+#pragma acc enter data create(s.b)
+	  acc_create(&s.a, sizeof s.a);
+	  acc_create(&s.a, sizeof s.a);
+	  acc_create(&s.a, sizeof s.a);
+	}
+      assert(acc_is_present(&s.a, sizeof s.a));
+      assert(acc_is_present(&s.b, sizeof s.b));
+    }
+
+#pragma acc parallel \
+  copy(s.a, s.b)
+  {
+  }
+
+  if (variant & 32)
+    {
+      if (variant & 1)
+	{
+#pragma acc exit data delete(s.a) finalize
+	}
+      else
+	acc_delete_finalize(&s.a, sizeof s.a);
+    }
+  else
+    {
+      if (variant & 1)
+	{
+#pragma acc exit data delete(s.a)
+	}
+      else
+	acc_delete(&s.a, sizeof s.a);
+      if (variant & 4)
+	{
+	  assert(acc_is_present(&s.a, sizeof s.a));
+	  assert(acc_is_present(&s.b, sizeof s.b));
+	  if (variant & 1)
+	    {
+#pragma acc exit data delete(s.a)
+	    }
+	  else
+	    acc_delete(&s.a, sizeof s.a);
+	}
+    }
+#if ACC_MEM_SHARED
+  assert(acc_is_present(&s.a, sizeof s.a));
+  assert(acc_is_present(&s.b, sizeof s.b));
+#else
+  assert(!acc_is_present(&s.a, sizeof s.a));
+  assert(acc_is_present(&s.b, sizeof s.b));
+#endif
+
+  if (variant & 32)
+    {
+      if (variant & 2)
+	{
+#pragma acc exit data delete(s.b) finalize
+	}
+      else
+	acc_delete_finalize(&s.b, sizeof s.b);
+    }
+  else
+    {
+      if (variant & 2)
+	{
+#pragma acc exit data delete(s.b)
+	}
+      else
+	acc_delete(&s.b, sizeof s.b);
+      if (variant & 4)
+	{
+#if ACC_MEM_SHARED
+	  assert(acc_is_present(&s.a, sizeof s.a));
+	  assert(acc_is_present(&s.b, sizeof s.b));
+#else
+	  assert(!acc_is_present(&s.a, sizeof s.a));
+	  assert(acc_is_present(&s.b, sizeof s.b));
+#endif
+	  if (variant & 2)
+	    {
+#pragma acc exit data delete(s.b)
+	    }
+	  else
+	    acc_delete(&s.b, sizeof s.b);
+	}
+    }
+#if ACC_MEM_SHARED
+  assert(acc_is_present(&s.a, sizeof s.a));
+  assert(acc_is_present(&s.b, sizeof s.b));
+#else
+  assert(!acc_is_present(&s.a, sizeof s.a));
+  assert(!acc_is_present(&s.b, sizeof s.b));
+#endif
+}
+
+int main()
+{
+  for (unsigned variant = 0; variant < 64; ++variant)
+    test(variant);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c
deleted file mode 100644
index bde5890d6676..000000000000
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-refcount-1.c
+++ /dev/null
@@ -1,47 +0,0 @@
-/* Test dynamic unmapping of separate structure members.  */
-
-/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
-
-#include <assert.h>
-#include <stdbool.h>
-#include <openacc.h>
-
-struct s
-{
-  char a;
-  float b;
-};
-
-void test (bool use_directives)
-{
-  struct s s;
-
-#pragma acc enter data create(s.a, s.b)
-  assert (acc_is_present (&s.a, sizeof s.a));
-  assert (acc_is_present (&s.b, sizeof s.b));
-
-  if (use_directives)
-    {
-#pragma acc exit data delete(s.a)
-    }
-  else
-    acc_delete (&s.a, sizeof s.a);
-  assert (!acc_is_present (&s.a, sizeof s.a));
-  assert (acc_is_present (&s.b, sizeof s.b));
-  if (use_directives)
-    {
-#pragma acc exit data delete(s.b)
-    }
-  else
-    acc_delete (&s.b, sizeof s.b);
-  assert (!acc_is_present (&s.a, sizeof s.a));
-  assert (!acc_is_present (&s.b, sizeof s.b));
-}
-
-int main ()
-{
-  test (true);
-  test (false);
-
-  return 0;
-}
-- 
2.26.2
From 2c838a3e4ea06c69c856d074ae5b0400e08ae3c2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Thu, 4 Jun 2020 16:13:35 +0200
Subject: [PATCH] [OpenACC 'exit data'] Evaluate 'copyfrom' individually for
 'GOMP_MAP_STRUCT' entries

Currently, we don't at all evaluate 'copyfrom' for 'GOMP_MAP_STRUCT' entries.
Fix this by copying/adapting the corresponding non-'GOMP_MAP_STRUCT' code.

	libgomp/
	* oacc-mem.c (goacc_exit_data_internal) <GOMP_MAP_STRUCT>:
	Evaluate 'copyfrom' individually for each entry.
	* testsuite/libgomp.oacc-c-c++-common/struct-1.c: Update.
---
 libgomp/oacc-mem.c                            | 16 ++++
 .../libgomp.oacc-c-c++-common/struct-1.c      | 93 +++++++++++++------
 2 files changed, 83 insertions(+), 26 deletions(-)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index a34f4cf0e918..11419e692aa2 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1194,6 +1194,12 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 		    || kind == GOMP_MAP_FORCE_DETACH)
 		  finalize = true;
 
+		copyfrom = false;
+		if (kind == GOMP_MAP_FROM
+		    || kind == GOMP_MAP_FORCE_FROM
+		    || kind == GOMP_MAP_ALWAYS_FROM)
+		  copyfrom = true;
+
 		struct splay_tree_key_s k;
 		k.host_start = (uintptr_t) hostaddrs[i + j];
 		k.host_end = k.host_start + sizes[i + j];
@@ -1216,6 +1222,16 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 		    else if (str->refcount > 0
 			     && str->refcount != REFCOUNT_INFINITY)
 		      str->refcount--;
+
+		    if (copyfrom
+			&& (kind != GOMP_MAP_FROM || str->refcount == 0))
+		      gomp_copy_dev2host (acc_dev, aq, (void *) k.host_start,
+					  (void *) (str->tgt->tgt_start
+						    + str->tgt_offset
+						    + k.host_start
+						    - str->host_start),
+					  k.host_end - k.host_start);
+
 		    if (str->refcount == 0)
 		      {
 			if (aq)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
index 285be84f244b..543aaa153064 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
@@ -1,4 +1,4 @@
-/* Test dynamic refcount of separate structure members.  */
+/* Test dynamic refcount and copy behavior of separate structure members.  */
 
 #include <assert.h>
 #include <stdbool.h>
@@ -12,41 +12,45 @@ struct s
 
 static void test(unsigned variant)
 {
-  struct s s;
+  struct s s = { .a = 73, .b = -22 };
 
-#pragma acc enter data create(s.a, s.b)
+#pragma acc enter data copyin(s.a, s.b)
   assert(acc_is_present(&s.a, sizeof s.a));
   assert(acc_is_present(&s.b, sizeof s.b));
 
+  /* To verify that any following 'copyin' doesn't 'copyin' again.  */
+  s.a = -s.a;
+  s.b = -s.b;
+
   if (variant & 4)
     {
       if (variant & 8)
 	{
-#pragma acc enter data create(s.b)
+#pragma acc enter data copyin(s.b)
 	}
       else
-	acc_create(&s.b, sizeof s.b);
+	acc_copyin(&s.b, sizeof s.b);
       assert(acc_is_present(&s.a, sizeof s.a));
       assert(acc_is_present(&s.b, sizeof s.b));
 
       if (variant & 16)
 	{
-#pragma acc enter data create(s.a)
+#pragma acc enter data copyin(s.a)
 	}
       else
-	acc_create(&s.a, sizeof s.a);
+	acc_copyin(&s.a, sizeof s.a);
       assert(acc_is_present(&s.a, sizeof s.a));
       assert(acc_is_present(&s.b, sizeof s.b));
 
       if (variant & 32)
 	{
-#pragma acc enter data create(s.a)
-	  acc_create(&s.b, sizeof s.b);
-#pragma acc enter data create(s.b)
-#pragma acc enter data create(s.b)
-	  acc_create(&s.a, sizeof s.a);
-	  acc_create(&s.a, sizeof s.a);
-	  acc_create(&s.a, sizeof s.a);
+#pragma acc enter data copyin(s.a)
+	  acc_copyin(&s.b, sizeof s.b);
+#pragma acc enter data copyin(s.b)
+#pragma acc enter data copyin(s.b)
+	  acc_copyin(&s.a, sizeof s.a);
+	  acc_copyin(&s.a, sizeof s.a);
+	  acc_copyin(&s.a, sizeof s.a);
 	}
       assert(acc_is_present(&s.a, sizeof s.a));
       assert(acc_is_present(&s.b, sizeof s.b));
@@ -55,85 +59,122 @@ static void test(unsigned variant)
 #pragma acc parallel \
   copy(s.a, s.b)
   {
+#if ACC_MEM_SHARED
+    if (s.a++ != -73)
+      __builtin_abort();
+    if (s.b-- != 22)
+      __builtin_abort();
+#else
+    if (s.a++ != 73)
+      __builtin_abort();
+    if (s.b-- != -22)
+      __builtin_abort();
+#endif
   }
+#if ACC_MEM_SHARED
+  assert(s.a == -72);
+  assert(s.b == 21);
+#else
+  assert(s.a == -73);
+  assert(s.b == 22);
+#endif
 
   if (variant & 32)
     {
       if (variant & 1)
 	{
-#pragma acc exit data delete(s.a) finalize
+#pragma acc exit data copyout(s.a) finalize
 	}
       else
-	acc_delete_finalize(&s.a, sizeof s.a);
+	acc_copyout_finalize(&s.a, sizeof s.a);
     }
   else
     {
       if (variant & 1)
 	{
-#pragma acc exit data delete(s.a)
+#pragma acc exit data copyout(s.a)
 	}
       else
-	acc_delete(&s.a, sizeof s.a);
+	acc_copyout(&s.a, sizeof s.a);
       if (variant & 4)
 	{
 	  assert(acc_is_present(&s.a, sizeof s.a));
 	  assert(acc_is_present(&s.b, sizeof s.b));
+#if ACC_MEM_SHARED
+	  assert(s.a == -72);
+	  assert(s.b == 21);
+#else
+	  assert(s.a == -73);
+	  assert(s.b == 22);
+#endif
 	  if (variant & 1)
 	    {
-#pragma acc exit data delete(s.a)
+#pragma acc exit data copyout(s.a)
 	    }
 	  else
-	    acc_delete(&s.a, sizeof s.a);
+	    acc_copyout(&s.a, sizeof s.a);
 	}
     }
 #if ACC_MEM_SHARED
   assert(acc_is_present(&s.a, sizeof s.a));
   assert(acc_is_present(&s.b, sizeof s.b));
+  assert(s.a == -72);
+  assert(s.b == 21);
 #else
   assert(!acc_is_present(&s.a, sizeof s.a));
   assert(acc_is_present(&s.b, sizeof s.b));
+  assert(s.a == 74);
+  assert(s.b == 22);
 #endif
 
   if (variant & 32)
     {
       if (variant & 2)
 	{
-#pragma acc exit data delete(s.b) finalize
+#pragma acc exit data copyout(s.b) finalize
 	}
       else
-	acc_delete_finalize(&s.b, sizeof s.b);
+	acc_copyout_finalize(&s.b, sizeof s.b);
     }
   else
     {
       if (variant & 2)
 	{
-#pragma acc exit data delete(s.b)
+#pragma acc exit data copyout(s.b)
 	}
       else
-	acc_delete(&s.b, sizeof s.b);
+	acc_copyout(&s.b, sizeof s.b);
       if (variant & 4)
 	{
 #if ACC_MEM_SHARED
 	  assert(acc_is_present(&s.a, sizeof s.a));
 	  assert(acc_is_present(&s.b, sizeof s.b));
+	  assert(s.a == -72);
+	  assert(s.b == 21);
 #else
 	  assert(!acc_is_present(&s.a, sizeof s.a));
 	  assert(acc_is_present(&s.b, sizeof s.b));
+	  assert(s.a == 74);
+	  assert(s.b == 22);
 #endif
 	  if (variant & 2)
 	    {
-#pragma acc exit data delete(s.b)
+#pragma acc exit data copyout(s.b)
 	    }
 	  else
-	    acc_delete(&s.b, sizeof s.b);
+	    acc_copyout(&s.b, sizeof s.b);
 	}
     }
 #if ACC_MEM_SHARED
   assert(acc_is_present(&s.a, sizeof s.a));
   assert(acc_is_present(&s.b, sizeof s.b));
+  assert(s.a == -72);
+  assert(s.b == 21);
 #else
   assert(!acc_is_present(&s.a, sizeof s.a));
   assert(!acc_is_present(&s.b, sizeof s.b));
+  assert(s.a == 74);
+  assert(s.b == -23);
 #endif
 }
 
-- 
2.26.2
From 4664ca1bc40318dbe60591cfe6d31c3d36d439c3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Thu, 4 Jun 2020 16:13:35 +0200
Subject: [PATCH] [OpenACC 'exit data'] Evaluate 'copyfrom' individually for
 'GOMP_MAP_STRUCT' entries

Currently, we don't at all evaluate 'copyfrom' for 'GOMP_MAP_STRUCT' entries.
Fix this by copying/adapting the corresponding non-'GOMP_MAP_STRUCT' code.

	libgomp/
	* oacc-mem.c (goacc_exit_data_internal) <GOMP_MAP_STRUCT>:
	Evaluate 'copyfrom' individually for each entry.
	* testsuite/libgomp.oacc-c-c++-common/struct-1.c: Update.

(cherry picked from commit 2c838a3e4ea06c69c856d074ae5b0400e08ae3c2)
---
 libgomp/oacc-mem.c                            | 16 ++++
 .../libgomp.oacc-c-c++-common/struct-1.c      | 93 +++++++++++++------
 2 files changed, 83 insertions(+), 26 deletions(-)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index a34f4cf0e918..11419e692aa2 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1194,6 +1194,12 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 		    || kind == GOMP_MAP_FORCE_DETACH)
 		  finalize = true;
 
+		copyfrom = false;
+		if (kind == GOMP_MAP_FROM
+		    || kind == GOMP_MAP_FORCE_FROM
+		    || kind == GOMP_MAP_ALWAYS_FROM)
+		  copyfrom = true;
+
 		struct splay_tree_key_s k;
 		k.host_start = (uintptr_t) hostaddrs[i + j];
 		k.host_end = k.host_start + sizes[i + j];
@@ -1216,6 +1222,16 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 		    else if (str->refcount > 0
 			     && str->refcount != REFCOUNT_INFINITY)
 		      str->refcount--;
+
+		    if (copyfrom
+			&& (kind != GOMP_MAP_FROM || str->refcount == 0))
+		      gomp_copy_dev2host (acc_dev, aq, (void *) k.host_start,
+					  (void *) (str->tgt->tgt_start
+						    + str->tgt_offset
+						    + k.host_start
+						    - str->host_start),
+					  k.host_end - k.host_start);
+
 		    if (str->refcount == 0)
 		      {
 			if (aq)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
index 285be84f244b..543aaa153064 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-1.c
@@ -1,4 +1,4 @@
-/* Test dynamic refcount of separate structure members.  */
+/* Test dynamic refcount and copy behavior of separate structure members.  */
 
 #include <assert.h>
 #include <stdbool.h>
@@ -12,41 +12,45 @@ struct s
 
 static void test(unsigned variant)
 {
-  struct s s;
+  struct s s = { .a = 73, .b = -22 };
 
-#pragma acc enter data create(s.a, s.b)
+#pragma acc enter data copyin(s.a, s.b)
   assert(acc_is_present(&s.a, sizeof s.a));
   assert(acc_is_present(&s.b, sizeof s.b));
 
+  /* To verify that any following 'copyin' doesn't 'copyin' again.  */
+  s.a = -s.a;
+  s.b = -s.b;
+
   if (variant & 4)
     {
       if (variant & 8)
 	{
-#pragma acc enter data create(s.b)
+#pragma acc enter data copyin(s.b)
 	}
       else
-	acc_create(&s.b, sizeof s.b);
+	acc_copyin(&s.b, sizeof s.b);
       assert(acc_is_present(&s.a, sizeof s.a));
       assert(acc_is_present(&s.b, sizeof s.b));
 
       if (variant & 16)
 	{
-#pragma acc enter data create(s.a)
+#pragma acc enter data copyin(s.a)
 	}
       else
-	acc_create(&s.a, sizeof s.a);
+	acc_copyin(&s.a, sizeof s.a);
       assert(acc_is_present(&s.a, sizeof s.a));
       assert(acc_is_present(&s.b, sizeof s.b));
 
       if (variant & 32)
 	{
-#pragma acc enter data create(s.a)
-	  acc_create(&s.b, sizeof s.b);
-#pragma acc enter data create(s.b)
-#pragma acc enter data create(s.b)
-	  acc_create(&s.a, sizeof s.a);
-	  acc_create(&s.a, sizeof s.a);
-	  acc_create(&s.a, sizeof s.a);
+#pragma acc enter data copyin(s.a)
+	  acc_copyin(&s.b, sizeof s.b);
+#pragma acc enter data copyin(s.b)
+#pragma acc enter data copyin(s.b)
+	  acc_copyin(&s.a, sizeof s.a);
+	  acc_copyin(&s.a, sizeof s.a);
+	  acc_copyin(&s.a, sizeof s.a);
 	}
       assert(acc_is_present(&s.a, sizeof s.a));
       assert(acc_is_present(&s.b, sizeof s.b));
@@ -55,85 +59,122 @@ static void test(unsigned variant)
 #pragma acc parallel \
   copy(s.a, s.b)
   {
+#if ACC_MEM_SHARED
+    if (s.a++ != -73)
+      __builtin_abort();
+    if (s.b-- != 22)
+      __builtin_abort();
+#else
+    if (s.a++ != 73)
+      __builtin_abort();
+    if (s.b-- != -22)
+      __builtin_abort();
+#endif
   }
+#if ACC_MEM_SHARED
+  assert(s.a == -72);
+  assert(s.b == 21);
+#else
+  assert(s.a == -73);
+  assert(s.b == 22);
+#endif
 
   if (variant & 32)
     {
       if (variant & 1)
 	{
-#pragma acc exit data delete(s.a) finalize
+#pragma acc exit data copyout(s.a) finalize
 	}
       else
-	acc_delete_finalize(&s.a, sizeof s.a);
+	acc_copyout_finalize(&s.a, sizeof s.a);
     }
   else
     {
       if (variant & 1)
 	{
-#pragma acc exit data delete(s.a)
+#pragma acc exit data copyout(s.a)
 	}
       else
-	acc_delete(&s.a, sizeof s.a);
+	acc_copyout(&s.a, sizeof s.a);
       if (variant & 4)
 	{
 	  assert(acc_is_present(&s.a, sizeof s.a));
 	  assert(acc_is_present(&s.b, sizeof s.b));
+#if ACC_MEM_SHARED
+	  assert(s.a == -72);
+	  assert(s.b == 21);
+#else
+	  assert(s.a == -73);
+	  assert(s.b == 22);
+#endif
 	  if (variant & 1)
 	    {
-#pragma acc exit data delete(s.a)
+#pragma acc exit data copyout(s.a)
 	    }
 	  else
-	    acc_delete(&s.a, sizeof s.a);
+	    acc_copyout(&s.a, sizeof s.a);
 	}
     }
 #if ACC_MEM_SHARED
   assert(acc_is_present(&s.a, sizeof s.a));
   assert(acc_is_present(&s.b, sizeof s.b));
+  assert(s.a == -72);
+  assert(s.b == 21);
 #else
   assert(!acc_is_present(&s.a, sizeof s.a));
   assert(acc_is_present(&s.b, sizeof s.b));
+  assert(s.a == 74);
+  assert(s.b == 22);
 #endif
 
   if (variant & 32)
     {
       if (variant & 2)
 	{
-#pragma acc exit data delete(s.b) finalize
+#pragma acc exit data copyout(s.b) finalize
 	}
       else
-	acc_delete_finalize(&s.b, sizeof s.b);
+	acc_copyout_finalize(&s.b, sizeof s.b);
     }
   else
     {
       if (variant & 2)
 	{
-#pragma acc exit data delete(s.b)
+#pragma acc exit data copyout(s.b)
 	}
       else
-	acc_delete(&s.b, sizeof s.b);
+	acc_copyout(&s.b, sizeof s.b);
       if (variant & 4)
 	{
 #if ACC_MEM_SHARED
 	  assert(acc_is_present(&s.a, sizeof s.a));
 	  assert(acc_is_present(&s.b, sizeof s.b));
+	  assert(s.a == -72);
+	  assert(s.b == 21);
 #else
 	  assert(!acc_is_present(&s.a, sizeof s.a));
 	  assert(acc_is_present(&s.b, sizeof s.b));
+	  assert(s.a == 74);
+	  assert(s.b == 22);
 #endif
 	  if (variant & 2)
 	    {
-#pragma acc exit data delete(s.b)
+#pragma acc exit data copyout(s.b)
 	    }
 	  else
-	    acc_delete(&s.b, sizeof s.b);
+	    acc_copyout(&s.b, sizeof s.b);
 	}
     }
 #if ACC_MEM_SHARED
   assert(acc_is_present(&s.a, sizeof s.a));
   assert(acc_is_present(&s.b, sizeof s.b));
+  assert(s.a == -72);
+  assert(s.b == 21);
 #else
   assert(!acc_is_present(&s.a, sizeof s.a));
   assert(!acc_is_present(&s.b, sizeof s.b));
+  assert(s.a == 74);
+  assert(s.b == -23);
 #endif
 }
 
-- 
2.26.2
Thomas Schwinge June 5, 2020, 10:39 a.m. | #7
Hi Julian!

On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:
> This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and

> GOMP_MAP_DETACH mapping kinds


> --- a/libgomp/oacc-mem.c

> +++ b/libgomp/oacc-mem.c


> @@ -1018,6 +1033,33 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,

>  {

>    gomp_mutex_lock (&acc_dev->lock);

>

> +  /* Handle "detach" before copyback/deletion of mapped data.  */

> +  for (size_t i = 0; i < mapnum; ++i)

> +    {

> +      unsigned char kind = kinds[i] & 0xff;

> +      switch (kind)

> +     {

> +     case GOMP_MAP_DETACH:

> +     case GOMP_MAP_FORCE_DETACH:

> +       {

> +         struct splay_tree_key_s cur_node;

> +         uintptr_t hostaddr = (uintptr_t) hostaddrs[i];

> +         cur_node.host_start = hostaddr;

> +         cur_node.host_end = cur_node.host_start + sizeof (void *);

> +         splay_tree_key n

> +           = splay_tree_lookup (&acc_dev->mem_map, &cur_node);

> +

> +         if (n == NULL)

> +           gomp_fatal ("struct not mapped for detach operation");

> +

> +         gomp_detach_pointer (acc_dev, aq, n, hostaddr, finalize, NULL);

> +       }

> +       break;

> +     default:

> +       ;

> +     }

> +    }

> +

>    for (size_t i = 0; i < mapnum; ++i)

>      {

>        unsigned char kind = kinds[i] & 0xff;


What's the reason that we're not instead emitting any 'detach' mappings
in the expected order (that is, first), which would avoid this
double-traversal here?  Given that 'mapnum' typically won't exceed the
dozens, the code we now got doesn't have a big run-time cost, of course,
but it's still a bit ugly, conceptually, for no apparent reason, unless
I'm confused?

> --- a/libgomp/target.c

> +++ b/libgomp/target.c


> @@ -1534,6 +1571,18 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,

>      }

>

>    size_t i;

> +

> +  /* We must perform detachments before any copies back to the host.  */

> +  for (i = 0; i < tgt->list_count; i++)

> +    {

> +      splay_tree_key k = tgt->list[i].key;

> +

> +      if (k != NULL && tgt->list[i].do_detach)

> +     gomp_detach_pointer (devicep, aq, k, tgt->list[i].key->host_start

> +                                          + tgt->list[i].offset,

> +                          k->refcount == 1, NULL);

> +    }

> +

>    for (i = 0; i < tgt->list_count; i++)

>      {

>        splay_tree_key k = tgt->list[i].key;


Similarly, isn't it sufficient if any 'detach' for 'tgt->list[i].key' is
done (directly) before the copy-back/unmap of 'tgt->list[i].key', again
to avoid the double-traversal here?


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Thomas Schwinge June 5, 2020, 11:17 a.m. | #8
Hi Julian!

On 2019-12-17T21:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:
> This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and

> GOMP_MAP_DETACH mapping kinds


> --- a/libgomp/target.c

> +++ b/libgomp/target.c


> @@ -1203,6 +1211,32 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,


> +           case GOMP_MAP_ATTACH:

> +             {

> +               cur_node.host_start = (uintptr_t) hostaddrs[i];

> +               cur_node.host_end = cur_node.host_start + sizeof (void *);

> +               splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);

> +               if (n != NULL)

> +                 {

> +                   tgt->list[i].key = n;

> +                   tgt->list[i].offset = cur_node.host_start - n->host_start;

> +                   tgt->list[i].length = n->host_end - n->host_start;

> +                   tgt->list[i].copy_from = false;

> +                   tgt->list[i].always_copy_from = false;

> +                   tgt->list[i].do_detach

> +                     = (pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);

> +                   n->refcount++;

> +                 }

> +               else

> +                 {

> +                   gomp_mutex_unlock (&devicep->lock);

> +                   gomp_fatal ("outer struct not mapped for attach");

> +                 }

> +               gomp_attach_pointer (devicep, aq, mem_map, n,

> +                                    (uintptr_t) hostaddrs[i], sizes[i],

> +                                    cbufp);

> +               continue;

> +             }


For the OpenACC runtime API 'acc_attach' etc. routines they don't, so
what's the conceptual reason that for the corresponding OpenACC directive
variants, 'GOMP_MAP_ATTACH' etc. here participate in reference counting
('n->refcount++' above)?  I understand OpenACC 'attach'/'detach' clauses
to be simple "executable clauses", which just update some values
somewhere (say, like 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any
mapping state, thus wouldn't appear to need reference counting?


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Julian Brown June 5, 2020, 8:28 p.m. | #9
On Fri, 5 Jun 2020 12:39:46 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!

> 

> On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com>

> wrote:

> > This part contains the libgomp runtime support for the

> > GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds  

> 

> > --- a/libgomp/oacc-mem.c

> > +++ b/libgomp/oacc-mem.c  

> 

> > @@ -1018,6 +1033,33 @@ goacc_exit_data_internal (struct

> > gomp_device_descr *acc_dev, size_t mapnum, {

> >    gomp_mutex_lock (&acc_dev->lock);

> >  

> > +  /* Handle "detach" before copyback/deletion of mapped data.  */

> > +  for (size_t i = 0; i < mapnum; ++i)

> > +    {

> > +      unsigned char kind = kinds[i] & 0xff;

> > +      switch (kind)

> > +	{

> > +	case GOMP_MAP_DETACH:

> > +	case GOMP_MAP_FORCE_DETACH:

> > +	  {

> > +	    struct splay_tree_key_s cur_node;

> > +	    uintptr_t hostaddr = (uintptr_t) hostaddrs[i];

> > +	    cur_node.host_start = hostaddr;

> > +	    cur_node.host_end = cur_node.host_start + sizeof (void

> > *);

> > +	    splay_tree_key n

> > +	      = splay_tree_lookup (&acc_dev->mem_map, &cur_node);

> > +

> > +	    if (n == NULL)

> > +	      gomp_fatal ("struct not mapped for detach

> > operation"); +

> > +	    gomp_detach_pointer (acc_dev, aq, n, hostaddr,

> > finalize, NULL);

> > +	  }

> > +	  break;

> > +	default:

> > +	  ;

> > +	}

> > +    }

> > +

> >    for (size_t i = 0; i < mapnum; ++i)

> >      {

> >        unsigned char kind = kinds[i] & 0xff;  

> 

> What's the reason that we're not instead emitting any 'detach'

> mappings in the expected order (that is, first), which would avoid

> this double-traversal here?  Given that 'mapnum' typically won't

> exceed the dozens, the code we now got doesn't have a big run-time

> cost, of course, but it's still a bit ugly, conceptually, for no

> apparent reason, unless I'm confused?


This is a weakness in the implementation -- the existing code in
gimplify.c that reorders clauses for GOMP_MAP_STRUCT, etc. is
sufficiently fiddly that I didn't want to mess with it. (I suppose
there's no reason the reordering needs to be done entirely in one pass,
though.)

Unfortunately that means (I now realise) that we're locked in
to supporting unordered detach clauses in libgomp going forwards anyway,
even if we fix the ordering in gimplify.c now.

Julian
Julian Brown June 5, 2020, 8:31 p.m. | #10
On Fri, 5 Jun 2020 13:17:09 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!

> 

> On 2019-12-17T21:03:47-0800, Julian Brown <julian@codesourcery.com>

> wrote:

> > This part contains the libgomp runtime support for the

> > GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds  

> 

> > --- a/libgomp/target.c

> > +++ b/libgomp/target.c  

> 

> > @@ -1203,6 +1211,32 @@ gomp_map_vars_internal (struct

> > gomp_device_descr *devicep,  

> 

> > +	      case GOMP_MAP_ATTACH:

> > +		{

> > +		  cur_node.host_start = (uintptr_t) hostaddrs[i];

> > +		  cur_node.host_end = cur_node.host_start + sizeof

> > (void *);

> > +		  splay_tree_key n = splay_tree_lookup (mem_map,

> > &cur_node);

> > +		  if (n != NULL)

> > +		    {

> > +		      tgt->list[i].key = n;

> > +		      tgt->list[i].offset = cur_node.host_start -

> > n->host_start;

> > +		      tgt->list[i].length = n->host_end -

> > n->host_start;

> > +		      tgt->list[i].copy_from = false;

> > +		      tgt->list[i].always_copy_from = false;

> > +		      tgt->list[i].do_detach

> > +			= (pragma_kind !=

> > GOMP_MAP_VARS_OPENACC_ENTER_DATA);

> > +		      n->refcount++;

> > +		    }

> > +		  else

> > +		    {

> > +		      gomp_mutex_unlock (&devicep->lock);

> > +		      gomp_fatal ("outer struct not mapped for

> > attach");

> > +		    }

> > +		  gomp_attach_pointer (devicep, aq, mem_map, n,

> > +				       (uintptr_t) hostaddrs[i],

> > sizes[i],

> > +				       cbufp);

> > +		  continue;

> > +		}  

> 

> For the OpenACC runtime API 'acc_attach' etc. routines they don't, so

> what's the conceptual reason that for the corresponding OpenACC

> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in

> reference counting ('n->refcount++' above)?  I understand OpenACC

> 'attach'/'detach' clauses to be simple "executable clauses", which

> just update some values somewhere (say, like

> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,

> thus wouldn't appear to need reference counting?


IIUC, n->refcount is not directly the "structural reference count" as
seen at source level, but rather counts the number of target_var_descs
in the lists appended to each target_mem_desc -- and GOMP_MAP_ATTACH
have variable entries in those lists. That's not the case for the API
routines.

Julian
Thomas Schwinge June 26, 2020, 9:20 a.m. | #11
Hi Julian!

On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:
> This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and

> GOMP_MAP_DETACH mapping kinds (etc.), as introduced by the front-end

> patches following in this series.


> --- a/libgomp/target.c

> +++ b/libgomp/target.c


> @@ -1534,6 +1571,18 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,


This is the code path at the end of a structured OpenACC 'data'
construct.

> +  /* We must perform detachments before any copies back to the host.  */

> +  for (i = 0; i < tgt->list_count; i++)

> +    {

> +      splay_tree_key k = tgt->list[i].key;

> +

> +      if (k != NULL && tgt->list[i].do_detach)

> +     gomp_detach_pointer (devicep, aq, k, tgt->list[i].key->host_start

> +                                          + tgt->list[i].offset,

> +                          k->refcount == 1, NULL);

> +    }


Can you please explain (as a source code comment) the logic for here
using 'k->refcount == 1' for the 'bool finalize' parameter of
'gomp_detach_pointer'; this somehow feels "strange"?

Nonwithstanding the question whether that's a valid thing to do or not,
but doesn't the current code hide the "attach count underflow" error if
you reach the above code with 'attach_count == 0' (user already
explicitly 'detach'ed), but then given 'k->refcount == 1' (thus
'finalize' semantics), 'gomp_detach_pointer' will then re-initialize
'attach_count = 1', and then do another 'gomp_copy_host2dev', etc.
instead of emitting an error.

(I have not attempted to produce a libgomp test case.)

Shouldn't this just always be 'finalize = false' given that there is no
'finalize' semantics for 'detach' on a structured OpenACC 'data'
constructs -- at least that's what I remember right now?


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Thomas Schwinge June 30, 2020, 3:58 p.m. | #12
Hi Julian!

On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:
> This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and

> GOMP_MAP_DETACH mapping kinds (etc.), as introduced by the front-end

> patches following in this series.


> --- a/libgomp/target.c

> +++ b/libgomp/target.c

> @@ -540,6 +540,7 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep,

>    tgt_var->key = oldn;

>    tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);

>    tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);

> +  tgt_var->do_detach = kind == GOMP_MAP_ATTACH;

>    tgt_var->offset = newn->host_start - oldn->host_start;

>    tgt_var->length = newn->host_end - newn->host_start;

>


For 'kind == GOMP_MAP_ATTACH', this function 'gomp_map_vars_existing' is
actually unreachable.

> @@ -978,8 +979,15 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,

>         has_firstprivate = true;

>         continue;

>       }

> +      else if ((kind & typemask) == GOMP_MAP_ATTACH)

> +     {

> +       tgt->list[i].key = NULL;

> +       has_firstprivate = true;

> +       continue;

> +     }


Given this, the following condition also is always-false:

>        cur_node.host_start = (uintptr_t) hostaddrs[i];

> -      if (!GOMP_MAP_POINTER_P (kind & typemask))

> +      if (!GOMP_MAP_POINTER_P (kind & typemask)

> +       && (kind & typemask) != GOMP_MAP_ATTACH)

>       cur_node.host_end = cur_node.host_start + sizes[i];

>        else

>       cur_node.host_end = cur_node.host_start + sizeof (void *);


Thus pushed "Mark up unreachable OpenACC 'attach' code path" to master
branch in commit aff43ac0aed5185884724adbdfd4dbbabd87637c, and
releases/gcc-10 branch in commit
4b185ee144d0c53ea7f08d4edaa8b578739498be, see attached.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
From aff43ac0aed5185884724adbdfd4dbbabd87637c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Fri, 26 Jun 2020 10:19:14 +0200
Subject: [PATCH] Mark up unreachable OpenACC 'attach' code path

... introduced in commit 8e7e71ff247fb116dc381c5ef0c09acc0d2b374f (r279625)
"OpenACC 2.6 deep copy: libgomp parts".

	libgomp/
	* target.c (gomp_map_vars_existing): Assert 'kind !=
	GOMP_MAP_ATTACH'.
	(gomp_map_vars_internal): Clean up.
---
 libgomp/target.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index 36425477dcb0..d4a4a408b400 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -357,10 +357,12 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep,
 			splay_tree_key newn, struct target_var_desc *tgt_var,
 			unsigned char kind, struct gomp_coalesce_buf *cbuf)
 {
+  assert (kind != GOMP_MAP_ATTACH);
+
   tgt_var->key = oldn;
   tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);
   tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);
-  tgt_var->do_detach = kind == GOMP_MAP_ATTACH;
+  tgt_var->do_detach = false;
   tgt_var->offset = newn->host_start - oldn->host_start;
   tgt_var->length = newn->host_end - newn->host_start;
 
@@ -815,8 +817,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if (!GOMP_MAP_POINTER_P (kind & typemask)
-	  && (kind & typemask) != GOMP_MAP_ATTACH)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-- 
2.27.0
From 4b185ee144d0c53ea7f08d4edaa8b578739498be Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Fri, 26 Jun 2020 10:19:14 +0200
Subject: [PATCH] Mark up unreachable OpenACC 'attach' code path

... introduced in commit 8e7e71ff247fb116dc381c5ef0c09acc0d2b374f (r279625)
"OpenACC 2.6 deep copy: libgomp parts".

	libgomp/
	* target.c (gomp_map_vars_existing): Assert 'kind !=
	GOMP_MAP_ATTACH'.
	(gomp_map_vars_internal): Clean up.

(cherry picked from commit aff43ac0aed5185884724adbdfd4dbbabd87637c)
---
 libgomp/target.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index 36425477dcb0..d4a4a408b400 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -357,10 +357,12 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep,
 			splay_tree_key newn, struct target_var_desc *tgt_var,
 			unsigned char kind, struct gomp_coalesce_buf *cbuf)
 {
+  assert (kind != GOMP_MAP_ATTACH);
+
   tgt_var->key = oldn;
   tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);
   tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);
-  tgt_var->do_detach = kind == GOMP_MAP_ATTACH;
+  tgt_var->do_detach = false;
   tgt_var->offset = newn->host_start - oldn->host_start;
   tgt_var->length = newn->host_end - newn->host_start;
 
@@ -815,8 +817,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if (!GOMP_MAP_POINTER_P (kind & typemask)
-	  && (kind & typemask) != GOMP_MAP_ATTACH)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-- 
2.27.0
Thomas Schwinge July 16, 2020, 9:35 a.m. | #13
Hi Julian!

Ping.

On 2020-06-26T11:20:40+0200, I wrote:
> On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com> wrote:

>> This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and

>> GOMP_MAP_DETACH mapping kinds (etc.), as introduced by the front-end

>> patches following in this series.

>

>> --- a/libgomp/target.c

>> +++ b/libgomp/target.c

>

>> @@ -1534,6 +1571,18 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,

>

> This is the code path at the end of a structured OpenACC 'data'

> construct.

>

>> +  /* We must perform detachments before any copies back to the host.  */

>> +  for (i = 0; i < tgt->list_count; i++)

>> +    {

>> +      splay_tree_key k = tgt->list[i].key;

>> +

>> +      if (k != NULL && tgt->list[i].do_detach)

>> +    gomp_detach_pointer (devicep, aq, k, tgt->list[i].key->host_start

>> +                                         + tgt->list[i].offset,

>> +                         k->refcount == 1, NULL);

>> +    }

>

> Can you please explain (as a source code comment) the logic for here

> using 'k->refcount == 1' for the 'bool finalize' parameter of

> 'gomp_detach_pointer'; this somehow feels "strange"?

>

> Nonwithstanding the question whether that's a valid thing to do or not,

> but doesn't the current code hide the "attach count underflow" error if

> you reach the above code with 'attach_count == 0' (user already

> explicitly 'detach'ed), but then given 'k->refcount == 1' (thus

> 'finalize' semantics), 'gomp_detach_pointer' will then re-initialize

> 'attach_count = 1', and then do another 'gomp_copy_host2dev', etc.

> instead of emitting an error.

>

> (I have not attempted to produce a libgomp test case.)

>

> Shouldn't this just always be 'finalize = false' given that there is no

> 'finalize' semantics for 'detach' on a structured OpenACC 'data'

> constructs -- at least that's what I remember right now?



Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Julian Brown July 16, 2020, 9:21 p.m. | #14
On Thu, 16 Jul 2020 11:35:23 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!

> 

> Ping.

> 

> On 2020-06-26T11:20:40+0200, I wrote:

> > On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com>

> > wrote:  

> >> This part contains the libgomp runtime support for the

> >> GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds (etc.), as

> >> introduced by the front-end patches following in this series.  

> >  

> >> --- a/libgomp/target.c

> >> +++ b/libgomp/target.c  

> >  

> >> @@ -1534,6 +1571,18 @@ gomp_unmap_vars_internal (struct

> >> target_mem_desc *tgt, bool do_copyfrom,  

> >

> > This is the code path at the end of a structured OpenACC 'data'

> > construct.

> >  

> >> +  /* We must perform detachments before any copies back to the

> >> host.  */

> >> +  for (i = 0; i < tgt->list_count; i++)

> >> +    {

> >> +      splay_tree_key k = tgt->list[i].key;

> >> +

> >> +      if (k != NULL && tgt->list[i].do_detach)

> >> +	gomp_detach_pointer (devicep, aq, k,

> >> tgt->list[i].key->host_start

> >> +					     +

> >> tgt->list[i].offset,

> >> +			     k->refcount == 1, NULL);

> >> +    }  

> >

> > Can you please explain (as a source code comment) the logic for here

> > using 'k->refcount == 1' for the 'bool finalize' parameter of

> > 'gomp_detach_pointer'; this somehow feels "strange"?

[snip]
> > Shouldn't this just always be 'finalize = false' given that there

> > is no 'finalize' semantics for 'detach' on a structured OpenACC

> > 'data' constructs -- at least that's what I remember right now?  


As far as I can tell, forcing finalize there is unnecessary (and as you
point out, conceptually dubious), but I think in practice it's
harmless (at least I haven't figured out a breaking test case). Anyway,
this patch just passes "false" for the finalize argument. I've also
added a test case, though it passes before/after the patch.

OK? Tested with offloading to nvptx.

Julian
From 31618a79bb3aa0d088030904fff0ad386ddb0999 Mon Sep 17 00:00:00 2001
From: Julian Brown <julian@codesourcery.com>

Date: Thu, 2 Jul 2020 14:18:20 -0700
Subject: [PATCH] openacc: Remove unnecessary detach finalization

The call to gomp_detach_pointer in gomp_unmap_vars_internal does not
need to force finalization, and doing so may mask mismatched pointer
attachments/detachments. This patch removes the forcing.

2020-07-16  Julian Brown  <julian@codesourcery.com>

libgomp/
	* target.c (gomp_unmap_vars_internal): Remove unnecessary forcing of
	finalization for detach operation.
	* testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c:
	New test.
---
 libgomp/target.c                              |  2 +-
 .../structured-detach-underflow.c             | 24 +++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c

diff --git a/libgomp/target.c b/libgomp/target.c
index d6b3572c8d8..00c75fbd885 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1437,7 +1437,7 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
       if (k != NULL && tgt->list[i].do_detach)
 	gomp_detach_pointer (devicep, aq, k, tgt->list[i].key->host_start
 					     + tgt->list[i].offset,
-			     k->refcount == 1, NULL);
+			     false, NULL);
     }
 
   for (i = 0; i < tgt->list_count; i++)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c
new file mode 100644
index 00000000000..184410137d8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c
@@ -0,0 +1,24 @@
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+int main ()
+{
+  struct {
+    int *arr;
+  } mystr;
+  int localarr[16];
+  mystr.arr = localarr;
+
+  #pragma acc enter data copyin(mystr, localarr[0:16])
+
+  #pragma acc data copyin(mystr.arr[0:16])
+  {
+    #pragma acc exit data detach(mystr.arr)
+    /* { dg-output "libgomp: attach count underflow" } */
+  }
+
+  #pragma acc exit data copyout(mystr, localarr[0:16])
+
+  return 0;
+}
+
+/* { dg-shouldfail "" } */
-- 
2.23.0
Thomas Schwinge July 17, 2020, 9:12 a.m. | #15
Hi Julian!

On 2020-07-16T22:21:43+0100, Julian Brown <julian@codesourcery.com> wrote:
> On Thu, 16 Jul 2020 11:35:23 +0200

> Thomas Schwinge <thomas@codesourcery.com> wrote:

>> On 2020-06-26T11:20:40+0200, I wrote:

>> > On 2019-12-17T22:03:47-0800, Julian Brown <julian@codesourcery.com>

>> > wrote:

>> >> This part contains the libgomp runtime support for the

>> >> GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds (etc.), as

>> >> introduced by the front-end patches following in this series.

>> >

>> >> --- a/libgomp/target.c

>> >> +++ b/libgomp/target.c

>> >

>> >> @@ -1534,6 +1571,18 @@ gomp_unmap_vars_internal (struct

>> >> target_mem_desc *tgt, bool do_copyfrom,

>> >

>> > This is the code path at the end of a structured OpenACC 'data'

>> > construct.

>> >

>> >> +  /* We must perform detachments before any copies back to the host.  */

>> >> +  for (i = 0; i < tgt->list_count; i++)

>> >> +    {

>> >> +      splay_tree_key k = tgt->list[i].key;

>> >> +

>> >> +      if (k != NULL && tgt->list[i].do_detach)

>> >> + gomp_detach_pointer (devicep, aq, k, tgt->list[i].key->host_start

>> >> +                                      + tgt->list[i].offset,

>> >> +                      k->refcount == 1, NULL);

>> >> +    }

>> >

>> > Can you please explain (as a source code comment) the logic for here

>> > using 'k->refcount == 1' for the 'bool finalize' parameter of

>> > 'gomp_detach_pointer'; this somehow feels "strange"?

> [snip]

>> > Shouldn't this just always be 'finalize = false' given that there

>> > is no 'finalize' semantics for 'detach' on a structured OpenACC

>> > 'data' constructs -- at least that's what I remember right now?

>

> As far as I can tell, forcing finalize there is unnecessary (and as you

> point out, conceptually dubious)


Thanks for verifying.

> but I think in practice it's

> harmless (at least I haven't figured out a breaking test case).


I have.  ;-P

(That's in a tree with the pending "[OpenACC] Deep copy attach/detach
should not affect reference counts" included -- may or may not be
relevant.)

> Anyway,

> this patch just passes "false" for the finalize argument. I've also

> added a test case, though it passes before/after the patch.

>

> OK? Tested with offloading to nvptx.


With the attached incremental patch merged in, OK for master and
releases/gcc-10 (once un-frozen) branches.

> --- a/libgomp/target.c

> +++ b/libgomp/target.c

> @@ -1437,7 +1437,7 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,

>        if (k != NULL && tgt->list[i].do_detach)

>       gomp_detach_pointer (devicep, aq, k, tgt->list[i].key->host_start

>                                            + tgt->list[i].offset,

> -                          k->refcount == 1, NULL);

> +                          false, NULL);


ACK.

Regarding the 'k->refcount' that was used above...

> --- /dev/null

> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c

> @@ -0,0 +1,24 @@

> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */

> +

> +int main ()

> +{

> +  struct {

> +    int *arr;

> +  } mystr;

> +  int localarr[16];

> +  mystr.arr = localarr;

> +

> +  #pragma acc enter data copyin(mystr, localarr[0:16])


..., here we have 'k->refcount == 1', but...

> +

> +  #pragma acc data copyin(mystr.arr[0:16])

> +  {


..., here we now have 'k->refcount == 2' as the 'copyin' has incremented
it, so...

> +    #pragma acc exit data detach(mystr.arr)

> +    /* { dg-output "libgomp: attach count underflow" } */

> +  }


..., it won't trigger the erroneous behavior here.

Instead of 'copyin(mystr.arr[0:16])' on the OpenACC 'data' construct, we
have to do a 'attach(mystr.arr)', and can then reproduce the problem:
without the 'libgomp/target.c:gomp_unmap_vars_internal' change, it
unexpectedly doesn't catch 'libgomp: attach count underflow', and instead
'detach'es again, and probably (potentially?) writes wrong data into
'mystr.arr' (I haven't tested that aspect).

> +

> +  #pragma acc exit data copyout(mystr, localarr[0:16])

> +

> +  return 0;

> +}

> +

> +/* { dg-shouldfail "" } */



Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
From 6f7f6f0ac80cdc96bc02777542297b6fef538c0b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Fri, 17 Jul 2020 09:41:18 +0200
Subject: [PATCH] into "openacc: Remove unnecessary detach finalization"

---
 .../structured-detach-underflow.c                    | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c
index 184410137d8..fc1f59e2185 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c
@@ -1,5 +1,7 @@
 /* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
 
+#include <stdio.h>
+
 int main ()
 {
   struct {
@@ -10,15 +12,17 @@ int main ()
 
   #pragma acc enter data copyin(mystr, localarr[0:16])
 
-  #pragma acc data copyin(mystr.arr[0:16])
+  #pragma acc data attach(mystr.arr)
   {
     #pragma acc exit data detach(mystr.arr)
-    /* { dg-output "libgomp: attach count underflow" } */
+    fprintf (stderr, "CheCKpOInT1\n");
+    /* { dg-output ".*CheCKpOInT1(\n|\r\n|\r)" } */
   }
+  /* { dg-shouldfail "" }
+     { dg-output "(\n|\r\n|\r)libgomp: attach count underflow(\n|\r\n|\r)$" } */
+  fprintf (stderr, "CheCKpOInT2\n");
 
   #pragma acc exit data copyout(mystr, localarr[0:16])
 
   return 0;
 }
-
-/* { dg-shouldfail "" } */
-- 
2.17.1

Patch

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 9e356cdfeec..e8bd52e81bd 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -40,8 +40,11 @@ 
 #define GOMP_MAP_FLAG_SPECIAL_0		(1 << 2)
 #define GOMP_MAP_FLAG_SPECIAL_1		(1 << 3)
 #define GOMP_MAP_FLAG_SPECIAL_2		(1 << 4)
+#define GOMP_MAP_FLAG_SPECIAL_4		(1 << 6)
 #define GOMP_MAP_FLAG_SPECIAL		(GOMP_MAP_FLAG_SPECIAL_1 \
 					 | GOMP_MAP_FLAG_SPECIAL_0)
+#define GOMP_MAP_DEEP_COPY		(GOMP_MAP_FLAG_SPECIAL_4 \
+					 | GOMP_MAP_FLAG_SPECIAL_2)
 /* Flag to force a specific behavior (or else, trigger a run-time error).  */
 #define GOMP_MAP_FLAG_FORCE		(1 << 7)
 
@@ -127,6 +130,13 @@  enum gomp_map_kind
     /* Decrement usage count and deallocate if zero.  */
     GOMP_MAP_RELEASE =			(GOMP_MAP_FLAG_SPECIAL_2
 					 | GOMP_MAP_DELETE),
+    /* In OpenACC, attach a pointer to a mapped struct field.  */
+    GOMP_MAP_ATTACH =			(GOMP_MAP_DEEP_COPY | 0),
+    /* In OpenACC, detach a pointer to a mapped struct field.  */
+    GOMP_MAP_DETACH =			(GOMP_MAP_DEEP_COPY | 1),
+    /* In OpenACC, detach a pointer to a mapped struct field.  */
+    GOMP_MAP_FORCE_DETACH =		(GOMP_MAP_DEEP_COPY
+					 | GOMP_MAP_FLAG_FORCE | 1),
 
     /* Internal to GCC, not used in libgomp.  */
     /* Do not map, but pointer assign a pointer instead.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 2017991b59c..6141cc117bc 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -948,6 +948,8 @@  struct target_var_desc {
   bool copy_from;
   /* True if data always should be copied from device to host at the end.  */
   bool always_copy_from;
+  /* True if variable should be detached at end of region.  */
+  bool do_detach;
   /* Relative offset against key host_start.  */
   uintptr_t offset;
   /* Actual length.  */
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 08507791399..ce9f2759dfa 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -956,33 +956,48 @@  acc_detach_finalize_async (void **hostaddr, int async)
    mappings.  */
 
 static int
-find_group_last (int pos, size_t mapnum, unsigned short *kinds)
+find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned short *kinds)
 {
   unsigned char kind0 = kinds[pos] & 0xff;
-  int first_pos = pos, last_pos = pos;
+  int first_pos = pos;
 
-  if (kind0 == GOMP_MAP_TO_PSET)
+  switch (kind0)
     {
+    case GOMP_MAP_TO_PSET:
       while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)
-	last_pos = ++pos;
+	pos++;
       /* We expect at least one GOMP_MAP_POINTER after a GOMP_MAP_TO_PSET.  */
-      assert (last_pos > first_pos);
-    }
-  else
-    {
+      assert (pos > first_pos);
+      break;
+
+    case GOMP_MAP_STRUCT:
+      pos += sizes[pos];
+      break;
+
+    case GOMP_MAP_POINTER:
+    case GOMP_MAP_ALWAYS_POINTER:
+      /* These mappings are only expected after some other mapping.  If we
+	 see one by itself, something has gone wrong.  */
+      gomp_fatal ("unexpected mapping");
+      break;
+
+    default:
       /* GOMP_MAP_ALWAYS_POINTER can only appear directly after some other
 	 mapping.  */
-      if (pos + 1 < mapnum
-	  && (kinds[pos + 1] & 0xff) == GOMP_MAP_ALWAYS_POINTER)
-	return pos + 1;
+      if (pos + 1 < mapnum)
+	{
+	  unsigned char kind1 = kinds[pos + 1] & 0xff;
+	  if (kind1 == GOMP_MAP_ALWAYS_POINTER)
+	    return pos + 1;
+	}
 
-      /* We can have one or several GOMP_MAP_POINTER mappings after a to/from
+      /* We can have zero or more GOMP_MAP_POINTER mappings after a to/from
 	 (etc.) mapping.  */
       while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)
-	last_pos = ++pos;
+	pos++;
     }
 
-  return last_pos;
+  return pos;
 }
 
 /* Map variables for OpenACC "enter data".  We can't just call
@@ -996,7 +1011,7 @@  goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 {
   for (size_t i = 0; i < mapnum; i++)
     {
-      int group_last = find_group_last (i, mapnum, kinds);
+      int group_last = find_group_last (i, mapnum, sizes, kinds);
 
       gomp_map_vars_async (acc_dev, aq,
 			   (group_last - i) + 1,
@@ -1018,6 +1033,33 @@  goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 {
   gomp_mutex_lock (&acc_dev->lock);
 
+  /* Handle "detach" before copyback/deletion of mapped data.  */
+  for (size_t i = 0; i < mapnum; ++i)
+    {
+      unsigned char kind = kinds[i] & 0xff;
+      switch (kind)
+	{
+	case GOMP_MAP_DETACH:
+	case GOMP_MAP_FORCE_DETACH:
+	  {
+	    struct splay_tree_key_s cur_node;
+	    uintptr_t hostaddr = (uintptr_t) hostaddrs[i];
+	    cur_node.host_start = hostaddr;
+	    cur_node.host_end = cur_node.host_start + sizeof (void *);
+	    splay_tree_key n
+	      = splay_tree_lookup (&acc_dev->mem_map, &cur_node);
+
+	    if (n == NULL)
+	      gomp_fatal ("struct not mapped for detach operation");
+
+	    gomp_detach_pointer (acc_dev, aq, n, hostaddr, finalize, NULL);
+	  }
+	  break;
+	default:
+	  ;
+	}
+    }
+
   for (size_t i = 0; i < mapnum; ++i)
     {
       unsigned char kind = kinds[i] & 0xff;
@@ -1035,6 +1077,8 @@  goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	case GOMP_MAP_POINTER:
 	case GOMP_MAP_DELETE:
 	case GOMP_MAP_RELEASE:
+	case GOMP_MAP_DETACH:
+	case GOMP_MAP_FORCE_DETACH:
 	  {
 	    struct splay_tree_key_s cur_node;
 	    cur_node.host_start = (uintptr_t) hostaddrs[i];
@@ -1075,6 +1119,39 @@  goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	      gomp_remove_var_async (acc_dev, n, aq);
 	  }
 	  break;
+
+	case GOMP_MAP_STRUCT:
+	  {
+	    int elems = sizes[i];
+	    for (int j = 1; j <= elems; j++)
+	      {
+		struct splay_tree_key_s k;
+		k.host_start = (uintptr_t) hostaddrs[i + j];
+		k.host_end = k.host_start + sizes[i + j];
+		splay_tree_key str;
+		str = splay_tree_lookup (&acc_dev->mem_map, &k);
+		if (str)
+		  {
+		    if (finalize)
+		      {
+			str->refcount -= str->virtual_refcount;
+			str->virtual_refcount = 0;
+		      }
+		    if (str->virtual_refcount > 0)
+		      {
+			str->refcount--;
+			str->virtual_refcount--;
+		      }
+		    else if (str->refcount > 0)
+		      str->refcount--;
+		    if (str->refcount == 0)
+		      gomp_remove_var_async (acc_dev, str, aq);
+		  }
+	      }
+	    i += elems;
+	  }
+	  break;
+
 	default:
 	  gomp_fatal (">>>> goacc_exit_data_internal UNHANDLED kind 0x%.2x",
 			  kind);
@@ -1107,8 +1184,13 @@  GOACC_enter_exit_data (int flags_m, size_t mapnum, void **hostaddrs,
   if (mapnum > 0)
     {
       unsigned char kind = kinds[0] & 0xff;
+
+      if (kind == GOMP_MAP_STRUCT || kind == GOMP_MAP_FORCE_PRESENT)
+	kind = kinds[1] & 0xff;
+
       if (kind == GOMP_MAP_DELETE
-	  || kind == GOMP_MAP_FORCE_FROM)
+	  || kind == GOMP_MAP_FORCE_FROM
+	  || kind == GOMP_MAP_FORCE_DETACH)
 	finalize = true;
     }
 
@@ -1117,11 +1199,14 @@  GOACC_enter_exit_data (int flags_m, size_t mapnum, void **hostaddrs,
     {
       unsigned char kind = kinds[i] & 0xff;
 
-      if (kind == GOMP_MAP_POINTER || kind == GOMP_MAP_TO_PSET)
+      if (kind == GOMP_MAP_POINTER
+	  || kind == GOMP_MAP_TO_PSET
+	  || kind == GOMP_MAP_STRUCT)
 	continue;
 
       if (kind == GOMP_MAP_FORCE_ALLOC
 	  || kind == GOMP_MAP_FORCE_PRESENT
+	  || kind == GOMP_MAP_ATTACH
 	  || kind == GOMP_MAP_FORCE_TO
 	  || kind == GOMP_MAP_TO
 	  || kind == GOMP_MAP_ALLOC)
@@ -1132,6 +1217,8 @@  GOACC_enter_exit_data (int flags_m, size_t mapnum, void **hostaddrs,
 
       if (kind == GOMP_MAP_RELEASE
 	  || kind == GOMP_MAP_DELETE
+	  || kind == GOMP_MAP_DETACH
+	  || kind == GOMP_MAP_FORCE_DETACH
 	  || kind == GOMP_MAP_FROM
 	  || kind == GOMP_MAP_FORCE_FROM)
 	break;
diff --git a/libgomp/target.c b/libgomp/target.c
index 1f429900113..6fa94dec6ce 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -540,6 +540,7 @@  gomp_map_vars_existing (struct gomp_device_descr *devicep,
   tgt_var->key = oldn;
   tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);
   tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);
+  tgt_var->do_detach = kind == GOMP_MAP_ATTACH;
   tgt_var->offset = newn->host_start - oldn->host_start;
   tgt_var->length = newn->host_end - newn->host_start;
 
@@ -978,8 +979,15 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	  has_firstprivate = true;
 	  continue;
 	}
+      else if ((kind & typemask) == GOMP_MAP_ATTACH)
+	{
+	  tgt->list[i].key = NULL;
+	  has_firstprivate = true;
+	  continue;
+	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if (!GOMP_MAP_POINTER_P (kind & typemask))
+      if (!GOMP_MAP_POINTER_P (kind & typemask)
+	  && (kind & typemask) != GOMP_MAP_ATTACH)
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
@@ -1203,6 +1211,32 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		cur_node.tgt_offset = n->tgt->tgt_start + n->tgt_offset
 				      + cur_node.host_start - n->host_start;
 		continue;
+	      case GOMP_MAP_ATTACH:
+		{
+		  cur_node.host_start = (uintptr_t) hostaddrs[i];
+		  cur_node.host_end = cur_node.host_start + sizeof (void *);
+		  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+		  if (n != NULL)
+		    {
+		      tgt->list[i].key = n;
+		      tgt->list[i].offset = cur_node.host_start - n->host_start;
+		      tgt->list[i].length = n->host_end - n->host_start;
+		      tgt->list[i].copy_from = false;
+		      tgt->list[i].always_copy_from = false;
+		      tgt->list[i].do_detach
+			= (pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+		      n->refcount++;
+		    }
+		  else
+		    {
+		      gomp_mutex_unlock (&devicep->lock);
+		      gomp_fatal ("outer struct not mapped for attach");
+		    }
+		  gomp_attach_pointer (devicep, aq, mem_map, n,
+				       (uintptr_t) hostaddrs[i], sizes[i],
+				       cbufp);
+		  continue;
+		}
 	      default:
 		break;
 	      }
@@ -1247,10 +1281,12 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
 		tgt->list[i].always_copy_from
 		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
+		tgt->list[i].do_detach = false;
 		tgt->list[i].offset = 0;
 		tgt->list[i].length = k->host_end - k->host_start;
 		k->refcount = 1;
 		k->virtual_refcount = 0;
+		k->aux = NULL;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
@@ -1301,6 +1337,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 			  tgt->list[j].key = k;
 			  tgt->list[j].copy_from = false;
 			  tgt->list[j].always_copy_from = false;
+			  tgt->list[j].do_detach = false;
 			  if (k->refcount != REFCOUNT_INFINITY)
 			    k->refcount++;
 			  gomp_map_pointer (tgt, aq,
@@ -1534,6 +1571,18 @@  gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
     }
 
   size_t i;
+
+  /* We must perform detachments before any copies back to the host.  */
+  for (i = 0; i < tgt->list_count; i++)
+    {
+      splay_tree_key k = tgt->list[i].key;
+
+      if (k != NULL && tgt->list[i].do_detach)
+	gomp_detach_pointer (devicep, aq, k, tgt->list[i].key->host_start
+					     + tgt->list[i].offset,
+			     k->refcount == 1, NULL);
+    }
+
   for (i = 0; i < tgt->list_count; i++)
     {
       splay_tree_key k = tgt->list[i].key;