[2/2,OpenACC] Detect pointer updates for attach operations (PR95590)

Message ID c032357fce4fcd1720e1d6fd65d461718cf5d67c.1592826181.git.julian@codesourcery.com
State New
Headers show
Series
  • Attached deep-copy pointers, diagnostics & modifications
Related show

Commit Message

Julian Brown June 22, 2020, 12:14 p.m.
As mentioned in the parent email, this is a fix for PR95590 that detects
updates of attached pointers in blocks, and rewrites the attached pointer
and resets its attachment counter appropriately. I am however not entirely
sure this is desirable or required by the spec: points against are:

 - To avoid expensive copies from the device to the host and/or "wrong
   way" device-to-host splay tree lookups, it requires keeping an extra
   shadow copy of mapped blocks on the host in order to detect if a
   host pointer with attachments in the block has been changed between
   attach operations. We incur this overhead unconditionally if
   attach/detach are in use for what's not likely to be a common use case
   (it's slightly tricky to write a test case to exercise the behaviour,
   even -- Thomas's unmodified original for the PR raises an error after
   the previous patch in this series).

 - From a user perspective, I think it's going to be quite easy to get
   confused wrt. the hidden attachment counter state, with this kind of
   reset-on-host-pointer-modification behaviour.  Mind you, silently *not*
   doing the update is likewise going to be confusing (the stale device
   pointer would be updated at present).  Maybe this should be detected
   as an error instead?

 - The text in "2.6.8. Attachment Counter" *might* contribute to the
   argument that this kind of pointer-update detection is not required.

Anyway, thoughts, or OK for mainline?

Thanks,

Julian

ChangeLog

	PR libgomp/95590

	libgomp/
	* target.c (gomp_attach_pointer): Initialise shadow copy of block with
	attached pointers, and use to detect modifications of those pointers.
	* testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c: New test.
---
 libgomp/target.c                              | 29 +++++++-
 .../attach-ptr-change-1.c                     | 74 +++++++++++++++++++
 2 files changed, 100 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c

-- 
2.23.0

Comments

Thomas Schwinge July 24, 2020, 2:04 p.m. | #1
Hi Julian!

On 2020-06-22T05:14:44-0700, Julian Brown <julian@codesourcery.com> wrote:
> As mentioned in the parent email, this is a fix for PR95590 that detects

> updates of attached pointers in blocks, and rewrites the attached pointer

> and resets its attachment counter appropriately. I am however not entirely

> sure this is desirable or required by the spec: points against are:

>

>  - To avoid expensive copies from the device to the host and/or "wrong

>    way" device-to-host splay tree lookups, it requires keeping an extra

>    shadow copy of mapped blocks on the host in order to detect if a

>    host pointer with attachments in the block has been changed between

>    attach operations.


I haven't spent too much time trying, but I too have not yet seen a way
to avoid keeping this state ("shadow copy"), or looking it up on demand
("expensive copies from the device to the host").

I suppose we cannot get the necessary information/state from the
host-side pointer (value) alone, and/or other state kept in the
'splay_tree_key n' etc.?

>    We incur this overhead unconditionally if

>    attach/detach are in use for what's not likely to be a common use case


Is the overhead so bad, though?  As soon as there's an 'attach', we have
to 'malloc' anyway (can combine the two, as you've done), and the
checking overhead doesn't seem so bad either?

Should we reach out to other OpenACC compiler implementors, and ask for
their understanding/approach to this aspect?

>    (it's slightly tricky to write a test case to exercise the behaviour,

>    even -- Thomas's unmodified original for the PR raises an error after

>    the previous patch in this series).


Challange accepted!  ;-P (..., but not right now.)

>  - From a user perspective, I think it's going to be quite easy to get

>    confused wrt. the hidden attachment counter state


(Indeed that "hidden" aspect is a bit confusing.  I've even thought
whether we should add some 'gomp_get_attach_count' function just for our
own testing purposes.)

>    with this kind of

>    reset-on-host-pointer-modification behaviour.  Mind you, silently *not*

>    doing the update is likewise going to be confusing (the stale device

>    pointer would be updated at present).  Maybe this should be detected

>    as an error instead?


I don't understand that, I'm afraid, because as I have quoted in
<https://gcc.gnu.org/PR95590> "OpenACC 'attach' behavior if already
attached to different data", OpenACC explicitly mandates the
"reset-on-host-pointer-modification" behavior, so I don't see a way to
avoid implementing that?

>  - The text in "2.6.8. Attachment Counter" *might* contribute to the

>    argument that this kind of pointer-update detection is not required.


Do you think these texts are in conflict in some way (that's not obvious
to me)?


Conceptually we're talking about making this behave:

    int a[];
    #pragma acc enter data create(a)

    int b[];
    #pragma acc enter data create(b)

    int *p;

    #pragma acc data create(p) // create long-lived device copy of 'p'
      {
        p = a;
        #pragma acc enter data attach(p) // explicit, or implicit
        #pragma acc parallel present(p)
          { fill_array(p); } // writes to device copy of 'a'

        p = b;
        #pragma acc enter data attach(p) // explicit, or implicit
        #pragma acc parallel present(p)
          { fill_array(p); } // writes to device copy of 'b'
      }

    #pragma acc parallel present(a, b)
      { [use device copies of 'a', 'b'] }

..., or this:

    int a[];
    #pragma acc enter data create(a)

    int b[];
    #pragma acc enter data create(b)

    struct {
      int data;
      int *p;
    } s s;

    #pragma acc data create(s) // create long-lived device copy of 's'
      {
        s.data = [...];
        s.p = a;
        #pragma acc update device(s) // invokes ("expected") undefined behavior w.r.t. 's.p'
        #pragma acc parallel // implicit 'attach(s.p)'
          { fill_array(s.data, s.p); } // writes to device copy of 'a'

        s.data = [...];
        s.p = a;
        #pragma acc update device(s) // invokes ("expected") undefined behavior w.r.t. 's.p'
        #pragma acc parallel // implicit 'attach(s.p)'
          { fill_array(s.data, s.p); } // writes to device copy of 'b'
      }

    #pragma acc parallel
      { [use device copies of 'a', 'b'] }

Please verify these conceptually -- and, is there any other, different
scenario to consider here?

> OK for mainline?


I think the implementation is fine, but I'd like to think about all this
a bit more.


Grüße
 Thomas


>       PR libgomp/95590

>

>       libgomp/

>       * target.c (gomp_attach_pointer): Initialise shadow copy of block with

>       attached pointers, and use to detect modifications of those pointers.

>       * testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c: New test.

> ---

>  libgomp/target.c                              | 29 +++++++-

>  .../attach-ptr-change-1.c                     | 74 +++++++++++++++++++

>  2 files changed, 100 insertions(+), 3 deletions(-)

>  create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c

>

> diff --git a/libgomp/target.c b/libgomp/target.c

> index db6f56a8ff8..076cc2bbbcb 100644

> --- a/libgomp/target.c

> +++ b/libgomp/target.c

> @@ -691,6 +691,8 @@ gomp_attach_pointer (struct gomp_device_descr *devicep,

>  {

>    struct splay_tree_key_s s;

>    size_t size, idx;

> +  char *shadow_block;

> +  size_t shadow_size = n->host_end - n->host_start;

>

>    if (n == NULL)

>      {

> @@ -707,9 +709,31 @@ gomp_attach_pointer (struct gomp_device_descr *devicep,

>    if (!n->aux)

>      n->aux = gomp_malloc_cleared (sizeof (struct splay_tree_aux));

>

> +  bool first = false;

> +

>    if (!n->aux->attach_count)

> -    n->aux->attach_count

> -      = gomp_malloc_cleared (sizeof (*n->aux->attach_count) * size);

> +    {

> +      n->aux->attach_count

> +     = gomp_malloc_cleared (sizeof (*n->aux->attach_count) * size

> +                            + shadow_size);

> +      first = true;

> +    }

> +

> +  shadow_block = ((char *) n->aux->attach_count)

> +              + sizeof (*n->aux->attach_count) * size;

> +

> +  if (first)

> +    memcpy (shadow_block, (const void *) n->host_start, shadow_size);

> +

> +  uintptr_t target = (uintptr_t) *(void **) attach_to;

> +  uintptr_t shadow_target

> +    = (uintptr_t) *(void **) (shadow_block + attach_to - n->host_start);

> +  if (target != shadow_target)

> +    {

> +      n->aux->attach_count[idx] = 0;

> +      memcpy ((char *) shadow_block + attach_to - n->host_start,

> +           (const void *) target, sizeof (void *));

> +    }

>

>    if (n->aux->attach_count[idx] < UINTPTR_MAX)

>      n->aux->attach_count[idx]++;

> @@ -723,7 +747,6 @@ gomp_attach_pointer (struct gomp_device_descr *devicep,

>      {

>        uintptr_t devptr = n->tgt->tgt_start + n->tgt_offset + attach_to

>                        - n->host_start;

> -      uintptr_t target = (uintptr_t) *(void **) attach_to;

>        splay_tree_key tn;

>        uintptr_t data;

>

> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c

> new file mode 100644

> index 00000000000..d4d84fdb092

> --- /dev/null

> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c

> @@ -0,0 +1,74 @@

> +#include <assert.h>

> +#include <stdlib.h>

> +#include <openacc.h>

> +

> +struct str {

> +  unsigned char *c;

> +};

> +

> +int main()

> +{

> +  const int size_1 = sizeof (void *);

> +  unsigned char *data_1 = (unsigned char *) malloc(sizeof (void *));

> +  assert(data_1);

> +  void *data_1_d = acc_create(data_1, size_1);

> +  assert(data_1_d);

> +  assert(acc_is_present(data_1, size_1));

> +

> +  const int size_2 = sizeof (void *);

> +  unsigned char *data_2 = (unsigned char *) malloc(size_2);

> +  assert(data_2);

> +  void *data_2_d = acc_create(data_2, size_2);

> +  assert(data_2_d);

> +  assert(acc_is_present(data_2, size_2));

> +

> +  struct str data_work;

> +  data_work.c = data_1;

> +

> +  acc_copyin(&data_work, sizeof data_work);

> +  assert(acc_is_present(&data_work, sizeof data_work));

> +  assert(data_work.c == data_1);

> +

> +  /* No attach has taken place so far.  We can still do a self-update.  */

> +  acc_update_self(&data_work, sizeof data_work);

> +  assert(data_work.c == data_1);

> +

> +  data_1[0] = 'a';

> +  data_2[0] = 'b';

> +

> +  acc_update_device (data_1, size_1);

> +  acc_update_device (data_2, size_2);

> +

> +  acc_attach((void **) &data_work.c);

> +  #pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */

> +  {

> +    data_work.c[0] = 'c';

> +  }

> +

> +  acc_update_self (data_1, size_1);

> +  acc_update_self (data_2, size_2);

> +

> +  assert (data_1[0] == 'c');

> +  assert (data_2[0] == 'b');

> +

> +  data_1[0] = 'a';

> +  data_2[0] = 'b';

> +

> +  acc_update_device (data_1, size_1);

> +  acc_update_device (data_2, size_2);

> +

> +  data_work.c = data_2;

> +  acc_attach((void **) &data_work.c);

> +  #pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */

> +  {

> +    data_work.c[0] = 'd';

> +  }

> +

> +  acc_update_self (data_1, size_1);

> +  acc_update_self (data_2, size_2);

> +

> +  assert (data_1[0] == 'a');

> +  assert (data_2[0] == 'd');

> +

> +  return 0;

> +}

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Julian Brown July 24, 2020, 10:36 p.m. | #2
On Fri, 24 Jul 2020 16:04:02 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!

> 

> On 2020-06-22T05:14:44-0700, Julian Brown <julian@codesourcery.com>

> wrote:

> > As mentioned in the parent email, this is a fix for PR95590 that

> > detects updates of attached pointers in blocks, and rewrites the

> > attached pointer and resets its attachment counter appropriately. I

> > am however not entirely sure this is desirable or required by the

> > spec: points against are:

> >

> >  - To avoid expensive copies from the device to the host and/or

> > "wrong way" device-to-host splay tree lookups, it requires keeping

> > an extra shadow copy of mapped blocks on the host in order to

> > detect if a host pointer with attachments in the block has been

> > changed between attach operations.  

> 

> I haven't spent too much time trying, but I too have not yet seen a

> way to avoid keeping this state ("shadow copy"), or looking it up on

> demand ("expensive copies from the device to the host").

> 

> I suppose we cannot get the necessary information/state from the

> host-side pointer (value) alone, and/or other state kept in the

> 'splay_tree_key n' etc.?


I don't think so.

A different implementation might keep the attachment counters
associated with the target_mem_desc (on the "target side"), rather than
the splay tree key (the "host side"), in which case the
reset-on-host-pointer-modification might sort-of happen for free. But I
think that would be quite problematic for other reasons with our
current implementation. (Purely speculating, but maybe it "works"
somewhat accidentally for PGI because of the way its host-to-device
pointer mapping is implemented?)

> >    We incur this overhead unconditionally if

> >    attach/detach are in use for what's not likely to be a common

> > use case  

> 

> Is the overhead so bad, though?  As soon as there's an 'attach', we

> have to 'malloc' anyway (can combine the two, as you've done), and the

> checking overhead doesn't seem so bad either?

> 

> Should we reach out to other OpenACC compiler implementors, and ask

> for their understanding/approach to this aspect?


I haven't measured the performance impact (it's probably negligible). It
may be worth trying to get clarification from OpenACC upstream, though.

> >    (it's slightly tricky to write a test case to exercise the

> > behaviour, even -- Thomas's unmodified original for the PR raises

> > an error after the previous patch in this series).  

> 

> Challange accepted!  ;-P (..., but not right now.)


There was a test case attached to the parent email, too :-).

> >  - From a user perspective, I think it's going to be quite easy to

> > get confused wrt. the hidden attachment counter state  

> 

> (Indeed that "hidden" aspect is a bit confusing.  I've even thought

> whether we should add some 'gomp_get_attach_count' function just for

> our own testing purposes.)


Yeah, maybe.

> >    with this kind of

> >    reset-on-host-pointer-modification behaviour.  Mind you,

> > silently *not* doing the update is likewise going to be confusing

> > (the stale device pointer would be updated at present).  Maybe this

> > should be detected as an error instead?  

> 

> I don't understand that, I'm afraid, because as I have quoted in

> <https://gcc.gnu.org/PR95590> "OpenACC 'attach' behavior if already

> attached to different data", OpenACC explicitly mandates the

> "reset-on-host-pointer-modification" behavior, so I don't see a way to

> avoid implementing that?

> 

> >  - The text in "2.6.8. Attachment Counter" *might* contribute to the

> >    argument that this kind of pointer-update detection is not

> > required.  

> 

> Do you think these texts are in conflict in some way (that's not

> obvious to me)?


I'm still not sure that the intended meaning (in OpenACC 2.6, 2.7.2.
"Data Clause Actions", "Attach Action") is what you are reading into
it. See also "2.7.1. Data Specification in Data Clauses", under
Restrictions:

"* In C and C++, modifying pointers in pointer arrays during the data
lifetime, either on the host or on the device, may result in undefined
behavior."

That isn't explicitly about pointers within structs (as we're talking
about here), but is of a similar flavour, I think -- in that
recognizing host pointer modifications in arrays of pointers would
require similar housekeeping in the runtime, but OpenACC 2.6 makes such
modifications undefined behaviour instead.

The text in "2.6.7. Attachment Counter" (in OpenACC 2.6) is
specifically about update operations (acc_update API routines or
equivalent directives), but again, detecting pointer modifications
(on the host side) between successive "attach" operations seems like a
departure from *not* needing to do the same for update operations.

Should we also support modifications of attached pointers (e.g. in
mapped structs) in device-side code? Why or why not? (That wouldn't
be impossible, but the details of how it could work would be ugly
indeed...)

Here's a quick example of "weird" behaviour that would arise with the
pointer-modification detection patch:

#include <assert.h>
#include <stdlib.h>

struct mystr {
  int *ptr;
};

#define N 1024

int 
main (int argc, char *argv[])
{
  int *arr1 = malloc (sizeof (int) * N);
  int *arr2 = malloc (sizeof (int) * N);
  struct mystr s;
 
  for (int i = 0; i < N; i++) 
    { 
      arr1[i] = i;
      arr2[i] = i * 2;
    } 
 
  s.ptr = arr1;

  #pragma acc enter data copyin(s)
 
  #pragma acc data copy(s.ptr[0:N])
  {
    s.ptr = arr2;
    #pragma acc parallel loop copy(s.ptr[0:N])
    for (int i = 0; i < N; i++)
      s.ptr[i] = i * 3;
  }
 
  for (int i = 0; i < N; i++)
    {
      assert (arr1[i] == i);
      assert (arr2[i] == i * 3);
    }
 
  free (arr1);
  free (arr2);
}

With the patch, this gives:

libgomp: attach count underflow

Though of course it doesn't work properly without the
pointer-modification detection patch either.

This example could be made to work, but it would mean *not* resetting
the attachment counter to one on detecting a modified host pointer --
the pointer mapping would be modified but the attachment counter would
be incremented as usual (at the start of the "acc parallel"). That's
arguably the right thing to do perhaps, but it's clearly not what the
spec says, even with your reading.

HTH,

Julian

Patch

diff --git a/libgomp/target.c b/libgomp/target.c
index db6f56a8ff8..076cc2bbbcb 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -691,6 +691,8 @@  gomp_attach_pointer (struct gomp_device_descr *devicep,
 {
   struct splay_tree_key_s s;
   size_t size, idx;
+  char *shadow_block;
+  size_t shadow_size = n->host_end - n->host_start;
 
   if (n == NULL)
     {
@@ -707,9 +709,31 @@  gomp_attach_pointer (struct gomp_device_descr *devicep,
   if (!n->aux)
     n->aux = gomp_malloc_cleared (sizeof (struct splay_tree_aux));
 
+  bool first = false;
+
   if (!n->aux->attach_count)
-    n->aux->attach_count
-      = gomp_malloc_cleared (sizeof (*n->aux->attach_count) * size);
+    {
+      n->aux->attach_count
+	= gomp_malloc_cleared (sizeof (*n->aux->attach_count) * size
+			       + shadow_size);
+      first = true;
+    }
+
+  shadow_block = ((char *) n->aux->attach_count)
+		 + sizeof (*n->aux->attach_count) * size;
+
+  if (first)
+    memcpy (shadow_block, (const void *) n->host_start, shadow_size);
+
+  uintptr_t target = (uintptr_t) *(void **) attach_to;
+  uintptr_t shadow_target
+    = (uintptr_t) *(void **) (shadow_block + attach_to - n->host_start);
+  if (target != shadow_target)
+    {
+      n->aux->attach_count[idx] = 0;
+      memcpy ((char *) shadow_block + attach_to - n->host_start,
+	      (const void *) target, sizeof (void *));
+    }
 
   if (n->aux->attach_count[idx] < UINTPTR_MAX)
     n->aux->attach_count[idx]++;
@@ -723,7 +747,6 @@  gomp_attach_pointer (struct gomp_device_descr *devicep,
     {
       uintptr_t devptr = n->tgt->tgt_start + n->tgt_offset + attach_to
 			 - n->host_start;
-      uintptr_t target = (uintptr_t) *(void **) attach_to;
       splay_tree_key tn;
       uintptr_t data;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c
new file mode 100644
index 00000000000..d4d84fdb092
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-ptr-change-1.c
@@ -0,0 +1,74 @@ 
+#include <assert.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+struct str {
+  unsigned char *c;
+};
+
+int main()
+{
+  const int size_1 = sizeof (void *);
+  unsigned char *data_1 = (unsigned char *) malloc(sizeof (void *));
+  assert(data_1);
+  void *data_1_d = acc_create(data_1, size_1);
+  assert(data_1_d);
+  assert(acc_is_present(data_1, size_1));
+
+  const int size_2 = sizeof (void *);
+  unsigned char *data_2 = (unsigned char *) malloc(size_2);
+  assert(data_2);
+  void *data_2_d = acc_create(data_2, size_2);
+  assert(data_2_d);
+  assert(acc_is_present(data_2, size_2));
+
+  struct str data_work;
+  data_work.c = data_1;
+
+  acc_copyin(&data_work, sizeof data_work);
+  assert(acc_is_present(&data_work, sizeof data_work));
+  assert(data_work.c == data_1);
+
+  /* No attach has taken place so far.  We can still do a self-update.  */
+  acc_update_self(&data_work, sizeof data_work);
+  assert(data_work.c == data_1);
+
+  data_1[0] = 'a';
+  data_2[0] = 'b';
+
+  acc_update_device (data_1, size_1);
+  acc_update_device (data_2, size_2);
+
+  acc_attach((void **) &data_work.c);
+  #pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */
+  {
+    data_work.c[0] = 'c';
+  }
+
+  acc_update_self (data_1, size_1);
+  acc_update_self (data_2, size_2);
+
+  assert (data_1[0] == 'c');
+  assert (data_2[0] == 'b');
+
+  data_1[0] = 'a';
+  data_2[0] = 'b';
+
+  acc_update_device (data_1, size_1);
+  acc_update_device (data_2, size_2);
+
+  data_work.c = data_2;
+  acc_attach((void **) &data_work.c);
+  #pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */
+  {
+    data_work.c[0] = 'd';
+  }
+
+  acc_update_self (data_1, size_1);
+  acc_update_self (data_2, size_2);
+
+  assert (data_1[0] == 'a');
+  assert (data_2[0] == 'd');
+
+  return 0;
+}