[1/6,OpenACC,libgomp] Async re-work, interfaces

Message ID e7a4b16c-995a-8f8f-33f7-1310552b7e26@mentor.com
State New
Series
  • Async re-work

Commit Message

Chung-Lin Tang Sept. 25, 2018, 1:10 p.m.
This patch separates out the header interface changes. GOMP_VERSION has been bumped,
various changes have been made to the plugin interface, and a few libgomp internal
functions are declared. The libgomp linkmap has been updated as well.

Thanks,
Chung-Lin

	include/
	* gomp-constants.h (GOMP_ASYNC_DEFAULT): Define.
	(GOMP_VERSION): Increment for new plugin interface changes.

	libgomp/
	* libgomp-plugin.h (struct goacc_asyncqueue): Declare.
	(struct goacc_asyncqueue_list): Likewise.
	(goacc_aq): Likewise.
	(goacc_aq_list): Likewise.
	(GOMP_OFFLOAD_openacc_register_async_cleanup): Remove.
	(GOMP_OFFLOAD_openacc_async_test): Remove.
	(GOMP_OFFLOAD_openacc_async_test_all): Remove.
	(GOMP_OFFLOAD_openacc_async_wait): Remove.
	(GOMP_OFFLOAD_openacc_async_wait_async): Remove.
	(GOMP_OFFLOAD_openacc_async_wait_all): Remove.
	(GOMP_OFFLOAD_openacc_async_wait_all_async): Remove.
	(GOMP_OFFLOAD_openacc_async_set_async): Remove.
	(GOMP_OFFLOAD_openacc_exec): Adjust declaration.
	(GOMP_OFFLOAD_openacc_cuda_get_stream): Likewise.
	(GOMP_OFFLOAD_openacc_cuda_set_stream): Likewise.

	(GOMP_OFFLOAD_openacc_async_exec): Declare.
	(GOMP_OFFLOAD_openacc_async_construct): Declare.
	(GOMP_OFFLOAD_openacc_async_destruct): Declare.
	(GOMP_OFFLOAD_openacc_async_test): Declare.
	(GOMP_OFFLOAD_openacc_async_synchronize): Declare.
	(GOMP_OFFLOAD_openacc_async_serialize): Declare.
	(GOMP_OFFLOAD_openacc_async_queue_callback): Declare.
	(GOMP_OFFLOAD_openacc_async_host2dev): Declare.
	(GOMP_OFFLOAD_openacc_async_dev2host): Declare.

	* libgomp.h (struct acc_dispatch_t): Define 'async' sub-struct.
	(gomp_acc_insert_pointer): Adjust declaration.
	(gomp_copy_host2dev): New declaration.
	(gomp_copy_dev2host): Likewise.
	(gomp_map_vars_async): Likewise.
	(gomp_unmap_tgt): Likewise.
	(gomp_unmap_vars_async): Likewise.
	(gomp_fini_device): Likewise.

	* libgomp.map (OACC_2.5): Add acc_get_default_async,
	acc_get_default_async_h_, acc_set_default_async, and
	acc_set_default_async_h_.
	(GOMP_PLUGIN_1.0): Remove GOMP_PLUGIN_async_unmap_vars.

Comments

Thomas Schwinge Dec. 14, 2018, 5:52 p.m. | #1
Hi!

A few more -- final? ;-) -- comments:

On Tue, 25 Sep 2018 21:10:21 +0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> This patch separates out the header interface changes. GOMP_VERSION has been bumped,
> and various changes to the plugin interface, and a few libgomp internal functions
> declared. The libgomp linkmap updated as well.


> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h

> @@ -199,7 +200,7 @@ enum gomp_map_kind
>  /* Versions of libgomp and device-specific plugins.  GOMP_VERSION
>     should be incremented whenever an ABI-incompatible change is introduced
>     to the plugin interface defined in libgomp/libgomp.h.  */
> -#define GOMP_VERSION	1
> +#define GOMP_VERSION	2
>  #define GOMP_VERSION_NVIDIA_PTX 1
>  #define GOMP_VERSION_INTEL_MIC 0
>  #define GOMP_VERSION_HSA 0


OK, I think -- but I'm never quite sure whether we do need to increment
"GOMP_VERSION" when only doing libgomp-internal libgomp-plugin changes,
which don't affect the user/GCC side?

GCC encodes "GOMP_VERSION" in "GOMP_offload_register_ver" calls
synthesized by "mkoffload": "GOMP_VERSION_PACK (/* LIB */ GOMP_VERSION,
/* DEV */ GOMP_VERSION_NVIDIA_PTX)", and then at run time libgomp checks
in "GOMP_offload_register_ver", so that we don't try to load offloading
code with an _old_ libgomp that has been compiled with/for the _new_
version.  (Right?)

    void
    GOMP_offload_register_ver (unsigned version, const void *host_table,
                               int target_type, const void *target_data)
    { [...]
      if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
        gomp_fatal ("Library too old for offload (version %u < %u)",
                    GOMP_VERSION, GOMP_VERSION_LIB (version));

I don't have a problem with your change per se, but wouldn't we still be
able to load such code, given that we only changed the libgomp-internal
libgomp-plugin interface?

Am I confused?

Or is the above just an (unavoidable?) side effect, because we do need to
increment "GOMP_VERSION" for this check here:

      if (device->version_func () != GOMP_VERSION)
        {
          err = "plugin version mismatch";
          goto fail;
        }

..., which is making sure that the libgomp proper vs. libgomp-plugin
versions match.
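The difference between the two checks quoted above can be sketched in a few lines of C. This is only an illustration: the 16-bit split between library and device version and every `sketch_` name are assumptions made for the example, not the actual libgomp macros or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout for illustration: library version in the high 16 bits,
   device version in the low 16 bits of one unsigned value.  */
static unsigned
sketch_version_pack (unsigned lib, unsigned dev)
{
  assert (lib <= 0xffffu && dev <= 0xffffu);
  return (lib << 16) | dev;
}

/* Extract the library version from a packed value.  */
static unsigned
sketch_version_lib (unsigned packed)
{
  return (packed >> 16) & 0xffffu;
}

/* Check 1 (offload registration): the running libgomp rejects offload data
   only when that data was built for a NEWER library version.  Older data
   is accepted, so libgomp can be "too old" but not "too new".  */
static bool
sketch_offload_data_ok (unsigned packed, unsigned libgomp_version)
{
  return sketch_version_lib (packed) <= libgomp_version;
}

/* Check 2 (plugin load): the plugin must report exactly the version that
   libgomp was built with; any mismatch is refused.  */
static bool
sketch_plugin_ok (unsigned plugin_version, unsigned libgomp_version)
{
  return plugin_version == libgomp_version;
}
```

Under these assumptions, a binary stamped with version 1 still registers with a version-2 library, while a version-1 plugin is refused by it.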


> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -458,7 +462,6 @@ GOMP_PLUGIN_1.0 {
>  	GOMP_PLUGIN_debug;
>  	GOMP_PLUGIN_error;
>  	GOMP_PLUGIN_fatal;
> -	GOMP_PLUGIN_async_unmap_vars;
>  	GOMP_PLUGIN_acc_thread;
>  };


I think that's fine, but highlighting this again for Jakub, in case
there's an issue with removing a symbol from the libgomp-plugin
interface.


> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h

> +/* Opaque type to represent plugin-dependent implementation of an
> +   OpenACC asynchronous queue.  */
> +struct goacc_asyncqueue;
> +
> +/* Used to keep a list of active asynchronous queues.  */
> +struct goacc_asyncqueue_list
> +{
> +  struct goacc_asyncqueue *aq;
> +  struct goacc_asyncqueue_list *next;
> +};
> +
> +typedef struct goacc_asyncqueue *goacc_aq;
> +typedef struct goacc_asyncqueue_list *goacc_aq_list;


I'm not too fond of such "syntactic sugar" typedefs, but if that's fine
for Jakub to have in libgomp, then I won't object.

I'd be in favor then of "typedef struct N *N" or "typedef struct N *N_t"
variants however, instead of introducing yet another "goacc_aq" acronym
next to "goacc_asyncqueue", and "async queue" or "asynchronous queue" as
used in the descriptive texts (comments, etc.).  Maybe standardize all
these to "asyncqueue", also in the descriptive texts?

OpenACC, by the way, uses the term "device activity queue" (in most?
places...) to describe the underlying mechanism used to implement the
OpenACC "async" clause etc.

Should "struct goacc_asyncqueue_list" and its typedef still be defined
here in "libgomp/libgomp-plugin.h" (for proximity to the other stuff),
even though it's not actually used in the libgomp-plugin interface?
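To illustrate the split being discussed, here is a small sketch in which only the opaque `struct goacc_asyncqueue` pointer would cross the plugin boundary, while the list type is used purely on the libgomp side. The two struct names come from the patch; the `sketch_` helpers and the `sketch_stream_id` field are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Interface side: an incomplete type, so libgomp proper can hold pointers
   to a queue without knowing its layout.  */
struct goacc_asyncqueue;

/* libgomp side: list of active queues, as declared in the patch.  */
struct goacc_asyncqueue_list
{
  struct goacc_asyncqueue *aq;
  struct goacc_asyncqueue_list *next;
};

/* Hypothetical helper: prepend AQ to the list at *HEAD.
   Returns 0 on success, -1 if allocation fails.  */
static int
sketch_list_push (struct goacc_asyncqueue_list **head,
		  struct goacc_asyncqueue *aq)
{
  assert (head != NULL);
  struct goacc_asyncqueue_list *node = malloc (sizeof *node);
  if (node == NULL)
    return -1;
  node->aq = aq;
  node->next = *head;
  *head = node;
  return 0;
}

/* Hypothetical helper: count the entries in the list.  */
static size_t
sketch_list_length (const struct goacc_asyncqueue_list *head)
{
  size_t n = 0;
  for (; head != NULL; head = head->next)
    n++;
  return n;
}

/* Hypothetical helper: free the list nodes (not the queues themselves).  */
static void
sketch_list_free (struct goacc_asyncqueue_list *head)
{
  while (head != NULL)
    {
      struct goacc_asyncqueue_list *next = head->next;
      free (head);
      head = next;
    }
}

/* Plugin side: only here is the type completed.  The field is made up;
   a real plugin would store its own stream or queue handle.  */
struct goacc_asyncqueue
{
  int sketch_stream_id;
};
```

None of the list helpers need the completed type, which is why the list declaration could live in either header.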

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -888,19 +888,23 @@ typedef struct acc_dispatch_t
[...]
> +  struct {
> +    gomp_mutex_t lock;
> +    int nasyncqueue;
> +    struct goacc_asyncqueue **asyncqueue;
> +    struct goacc_asyncqueue_list *active;
[...]
> +  } async;


For "lock" see my comments elsewhere.

That data structure itself should be fine, no need for something more
complex, given that users typically only use a handful of such queues,
with low integer ID async-arguments.

I'd maybe name these members "queues_n", "queues", "queues_active".
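As a rough sketch of why a plain array is enough here, the following shows a lookup indexed directly by a small non-negative async-argument, growing the array and creating queues on demand. The member names `nasyncqueue` and `asyncqueue` follow the patch; everything prefixed `sketch_`, the stand-in queue layout, and the omission of locking and of the special negative async values are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in queue type; in libgomp the layout is defined by the plugin.  */
struct goacc_asyncqueue
{
  int id;
};

/* Reduced copy of the 'async' sub-struct: the lock and the active list
   are left out to keep the sketch short.  */
struct sketch_async_state
{
  int nasyncqueue;
  struct goacc_asyncqueue **asyncqueue;
};

/* Return the queue for ASYNC, creating it on first use.  Returns NULL if
   allocation fails.  A real implementation would hold the lock here.  */
static struct goacc_asyncqueue *
sketch_lookup_queue (struct sketch_async_state *s, int async)
{
  assert (s != NULL && async >= 0);

  if (async >= s->nasyncqueue)
    {
      /* Grow the array so that index ASYNC is valid; new slots start empty.  */
      int n = async + 1;
      struct goacc_asyncqueue **grown
	= realloc (s->asyncqueue, (size_t) n * sizeof *grown);
      if (grown == NULL)
	return NULL;
      for (int i = s->nasyncqueue; i < n; i++)
	grown[i] = NULL;
      s->asyncqueue = grown;
      s->nasyncqueue = n;
    }

  if (s->asyncqueue[async] == NULL)
    {
      struct goacc_asyncqueue *aq = malloc (sizeof *aq);
      if (aq == NULL)
	return NULL;
      aq->id = async;
      s->asyncqueue[async] = aq;
    }

  return s->asyncqueue[async];
}
```

With only a handful of low-numbered queues in use, the array stays tiny and lookups are a single index operation.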


As for the following changes, will you please make sure that there is one
common order for these, used in "libgomp/libgomp-plugin.h" function
prototypes, "libgomp/libgomp.h:acc_dispatch_t",
"libgomp/target.c:gomp_load_plugin_for_device", "libgomp/oacc-host.c"
function definitions as well as in "host_dispatch", and the
libgomp-plugin(s) themselves (that's all, I think?).

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -93,22 +107,31 @@ extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, size_t);
>  extern bool GOMP_OFFLOAD_can_run (void *);
>  extern void GOMP_OFFLOAD_run (int, void *, void *, void **);
>  extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);
> +
>  extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
> -				       void **, int, unsigned *, void *);
> -extern void GOMP_OFFLOAD_openacc_register_async_cleanup (void *, int);
> -extern int GOMP_OFFLOAD_openacc_async_test (int);
> -extern int GOMP_OFFLOAD_openacc_async_test_all (void);
> -extern void GOMP_OFFLOAD_openacc_async_wait (int);
> -extern void GOMP_OFFLOAD_openacc_async_wait_async (int, int);
> -extern void GOMP_OFFLOAD_openacc_async_wait_all (void);
> -extern void GOMP_OFFLOAD_openacc_async_wait_all_async (int);
> -extern void GOMP_OFFLOAD_openacc_async_set_async (int);
> +				       void **, unsigned *, void *);
> +extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void **,
> +					     void **, unsigned *, void *,
> +					     struct goacc_asyncqueue *);
> +extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);
> +extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);
> +extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);
> +extern void GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *);
> +extern void GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *,
> +						  struct goacc_asyncqueue *);
> +extern void GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *,
> +						       void (*)(void *), void *);
> +extern bool GOMP_OFFLOAD_openacc_async_host2dev (int, void *, const void *, size_t,
> +						 struct goacc_asyncqueue *);
> +extern bool GOMP_OFFLOAD_openacc_async_dev2host (int, void *, const void *, size_t,
> +						 struct goacc_asyncqueue *);
>  extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);
>  extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
>  extern void *GOMP_OFFLOAD_openacc_cuda_get_current_device (void);
>  extern void *GOMP_OFFLOAD_openacc_cuda_get_current_context (void);
> -extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (int);
> -extern int GOMP_OFFLOAD_openacc_cuda_set_stream (int, void *);
> +extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (struct goacc_asyncqueue *);
> +extern int GOMP_OFFLOAD_openacc_cuda_set_stream (struct goacc_asyncqueue *,
> +						 void *);
>  
>  #ifdef __cplusplus
>  }


> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -888,19 +888,23 @@ typedef struct acc_dispatch_t
>    /* Execute.  */
>    __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
>  
> -  /* Async cleanup callback registration.  */
> -  __typeof (GOMP_OFFLOAD_openacc_register_async_cleanup)
> -    *register_async_cleanup_func;
> -
> -  /* Asynchronous routines.  */
> -  __typeof (GOMP_OFFLOAD_openacc_async_test) *async_test_func;
> -  __typeof (GOMP_OFFLOAD_openacc_async_test_all) *async_test_all_func;
> -  __typeof (GOMP_OFFLOAD_openacc_async_wait) *async_wait_func;
> -  __typeof (GOMP_OFFLOAD_openacc_async_wait_async) *async_wait_async_func;
> -  __typeof (GOMP_OFFLOAD_openacc_async_wait_all) *async_wait_all_func;
> -  __typeof (GOMP_OFFLOAD_openacc_async_wait_all_async)
> -    *async_wait_all_async_func;
> -  __typeof (GOMP_OFFLOAD_openacc_async_set_async) *async_set_async_func;
> +  struct {
> +    gomp_mutex_t lock;
> +    int nasyncqueue;
> +    struct goacc_asyncqueue **asyncqueue;
> +    struct goacc_asyncqueue_list *active;
> +
> +    __typeof (GOMP_OFFLOAD_openacc_async_construct) *construct_func;
> +    __typeof (GOMP_OFFLOAD_openacc_async_destruct) *destruct_func;
> +    __typeof (GOMP_OFFLOAD_openacc_async_test) *test_func;
> +    __typeof (GOMP_OFFLOAD_openacc_async_synchronize) *synchronize_func;
> +    __typeof (GOMP_OFFLOAD_openacc_async_serialize) *serialize_func;
> +    __typeof (GOMP_OFFLOAD_openacc_async_queue_callback) *queue_callback_func;
> +
> +    __typeof (GOMP_OFFLOAD_openacc_async_exec) *exec_func;
> +    __typeof (GOMP_OFFLOAD_openacc_async_host2dev) *host2dev_func;
> +    __typeof (GOMP_OFFLOAD_openacc_async_dev2host) *dev2host_func;
> +  } async;
>  
>    /* Create/destroy TLS data.  */
>    __typeof (GOMP_OFFLOAD_openacc_create_thread_data) *create_thread_data_func;



Grüße
 Thomas
Chung-Lin Tang Dec. 17, 2018, 10:27 a.m. | #2
On 2018/12/15 1:52 AM, Thomas Schwinge wrote:
>> --- a/include/gomp-constants.h

>> +++ b/include/gomp-constants.h

> 

>> @@ -199,7 +200,7 @@ enum gomp_map_kind

>>   /* Versions of libgomp and device-specific plugins.  GOMP_VERSION

>>      should be incremented whenever an ABI-incompatible change is introduced

>>      to the plugin interface defined in libgomp/libgomp.h.  */

>> -#define GOMP_VERSION	1

>> +#define GOMP_VERSION	2

>>   #define GOMP_VERSION_NVIDIA_PTX 1

>>   #define GOMP_VERSION_INTEL_MIC 0

>>   #define GOMP_VERSION_HSA 0

> 

> OK, I think -- but I'm never quite sure whether we do need to increment

> "GOMP_VERSION" when only doing libgomp-internal libgomp-plugin changes,

> which don't affect the user/GCC side?

> 

> GCC encodes "GOMP_VERSION" in "GOMP_offload_register_ver" calls

> synthesized by "mkoffload": "GOMP_VERSION_PACK (/* LIB */ GOMP_VERSION,

> /* DEV */ GOMP_VERSION_NVIDIA_PTX)", and then at run time libgomp checks

> in "GOMP_offload_register_ver", so that we don't try to load offloading

> code with an _old_ libgomp that has been compiled with/for the _new_

> version.  (Right?)

> 

>      void

>      GOMP_offload_register_ver (unsigned version, const void *host_table,

>                                 int target_type, const void *target_data)

>      { [...]

>        if (GOMP_VERSION_LIB (version) > GOMP_VERSION)

>          gomp_fatal ("Library too old for offload (version %u < %u)",

>                      GOMP_VERSION, GOMP_VERSION_LIB (version));

> 

> I don't have a problem with your change per se, but wouldn't we still be

> able to load such code, given that we only changed the libgomp-internal

> libgomp-plugin interface?

> 

> Am I confused?

> 

> Or is the above just an (unavoidable?) side effect, because we do need to

> increment "GOMP_VERSION" for this check here:

> 

>        if (device->version_func () != GOMP_VERSION)

>          {

>            err = "plugin version mismatch";

>            goto fail;

>          }

> 

> ..., which is making sure that the libgomp proper vs. libgomp-plugin

> versions match.


The intended effect is exactly to ensure libgomp proper vs plugin compatibility.
We don't ensure backward/forward compatibility between libgomp/plugin, and
this version equality check is what enforces that.

>> --- a/libgomp/libgomp.map

>> +++ b/libgomp/libgomp.map

>> @@ -458,7 +462,6 @@ GOMP_PLUGIN_1.0 {

>>   	GOMP_PLUGIN_debug;

>>   	GOMP_PLUGIN_error;

>>   	GOMP_PLUGIN_fatal;

>> -	GOMP_PLUGIN_async_unmap_vars;

>>   	GOMP_PLUGIN_acc_thread;

>>   };

> 

> I think that's fine, but highlighting this again for Jakub, in case

> there's an issue with removing a symbol from the libgomp-plugin

> interface.


Since we don't enforce the libgomp/plugin interface to be compatible across
versions, I expect this to be okay.

>> --- a/libgomp/libgomp-plugin.h

>> +++ b/libgomp/libgomp-plugin.h

> 

>> +/* Opaque type to represent plugin-dependent implementation of an

>> +   OpenACC asynchronous queue.  */

>> +struct goacc_asyncqueue;

>> +

>> +/* Used to keep a list of active asynchronous queues.  */

>> +struct goacc_asyncqueue_list

>> +{

>> +  struct goacc_asyncqueue *aq;

>> +  struct goacc_asyncqueue_list *next;

>> +};

>> +

>> +typedef struct goacc_asyncqueue *goacc_aq;

>> +typedef struct goacc_asyncqueue_list *goacc_aq_list;

> 

> I'm not too fond of such "syntactic sugar" typedefs, but if that's fine

> for Jakub to have in libgomp, then I won't object.

> 

> I'd be in favor then of "typedef struct N *N" or "typedef struct N *N_t"

> variants however, instead of introducing yet another "goacc_aq" acronym

> next to "goacc_asyncqueue", and "async queue" or "asynchronous queue" as

> used in the descriptive texts (comments, etc.).  Maybe standardize all

> these to "asyncqueue", also in the descriptive texts?

> 

> OpenACC, by the way, uses the term "device activity queue" (in most?

> places...) to describe the underlying mechanism used to implement the

> OpenACC "async" clause etc.


Please, no more name style changes, please... Orz (beg)

I think I originally thought of "asyncqueue" too, but I felt a shorthand was
needed in many places, and the straightforward "aq" seemed too short to be
comfortably informative. The "goacc_" prefix seemed just right.

Besides, the crucial name convention I had in mind was "queues" in OpenACC,
versus "streams" in CUDA. I don't see much value in further spinning on this
name.

> Should "struct goacc_asyncqueue_list" and its typedef still be defined

> here in "libgomp/libgomp-plugin.h" (for proximity to the other stuff),

> even though it's not actually used in the libgomp-plugin interface?


It looks like it can indeed be placed in libgomp.h, maybe just before the
declaration of acc_dispatch_t.

I originally placed it there in libgomp-plugin.h simply to
collect the declarations related to goacc_asyncqueue together in one place.
Is separating them really better?

>> --- a/libgomp/libgomp.h

>> +++ b/libgomp/libgomp.h

>> @@ -888,19 +888,23 @@ typedef struct acc_dispatch_t

> [...]

>> +  struct {

>> +    gomp_mutex_t lock;

>> +    int nasyncqueue;

>> +    struct goacc_asyncqueue **asyncqueue;

>> +    struct goacc_asyncqueue_list *active;

> [...]

>> +  } async;

> 

> For "lock" see my comments elsewhere.


I'll respond to that part there later as well.

> That data structure itself should be fine, no need for something more

> complex, given that users typically only use a handful of such queues,

> with low integer ID async-arguments.

> I'd maybe name these members "queues_n", "queues", "queues_active".

> 

> 

> As for the following changes, will you please make sure that there is one

> common order for these, used in "libgomp/libgomp-plugin.h" function

> prototypes, "libgomp/libgomp.h:acc_dispatch_t",

> "libgomp/target.c:gomp_load_plugin_for_device", "libgomp/oacc-host.c"

> function definitions as well as in "host_dispatch", and the

> libgomp-plugin(s) themselves (that's all, I think?).


Okay, I'll update that.

Thanks,
Chung-Lin



Jakub Jelinek Dec. 18, 2018, 12:36 p.m. | #3
On Fri, Dec 14, 2018 at 06:52:20PM +0100, Thomas Schwinge wrote:
> > --- a/include/gomp-constants.h

> > +++ b/include/gomp-constants.h

> 

> > @@ -199,7 +200,7 @@ enum gomp_map_kind

> >  /* Versions of libgomp and device-specific plugins.  GOMP_VERSION

> >     should be incremented whenever an ABI-incompatible change is introduced

> >     to the plugin interface defined in libgomp/libgomp.h.  */

> > -#define GOMP_VERSION	1

> > +#define GOMP_VERSION	2

> >  #define GOMP_VERSION_NVIDIA_PTX 1

> >  #define GOMP_VERSION_INTEL_MIC 0

> >  #define GOMP_VERSION_HSA 0

> 

> OK, I think -- but I'm never quite sure whether we do need to increment

> "GOMP_VERSION" when only doing libgomp-internal libgomp-plugin changes,

> which don't affect the user/GCC side?

> 

> GCC encodes "GOMP_VERSION" in "GOMP_offload_register_ver" calls

> synthesized by "mkoffload": "GOMP_VERSION_PACK (/* LIB */ GOMP_VERSION,

> /* DEV */ GOMP_VERSION_NVIDIA_PTX)", and then at run time libgomp checks

> in "GOMP_offload_register_ver", so that we don't try to load offloading

> code with an _old_ libgomp that has been compiled with/for the _new_

> version.  (Right?)


To me it looks wrong to tie two different things in the same version number.
Just because we are changing something in the libgomp vs. the corresponding
plugin APIs doesn't mean we need to rebuild all binaries and libraries that
have offloading code in them.
So, IMHO GOMP_VERSION should be bumped only if we do a change that requires
the offloading data to be changed, and either have an additional internal
version to make sure that the plugins are kept in sync with libgomp, or just
figure that out because dlsym will fail on some of the new symbols in the
plugin.

> > --- a/libgomp/libgomp.map

> > +++ b/libgomp/libgomp.map

> > @@ -458,7 +462,6 @@ GOMP_PLUGIN_1.0 {

> >  	GOMP_PLUGIN_debug;

> >  	GOMP_PLUGIN_error;

> >  	GOMP_PLUGIN_fatal;

> > -	GOMP_PLUGIN_async_unmap_vars;

> >  	GOMP_PLUGIN_acc_thread;

> >  };

> 

> I think that's fine, but highlighting this again for Jakub, in case

> there's an issue with removing a symbol from the libgomp-plugin

> interface.


I'd prefer not to remove symbols from libgomp.so.*.  You can
do a gomp_fatal in it.
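A minimal sketch of that approach, keeping a retired entry point around as a stub that only reports a fatal error, might look as follows. `sketch_fatal` is a stand-in for libgomp's `gomp_fatal`, and the stub's name and parameter list are assumptions for illustration; in libgomp the function would keep its exported name and its entry in `libgomp.map`.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for gomp_fatal: print a formatted message and terminate.  */
static void
sketch_fatal (const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  fputs ("\nlibgomp: ", stderr);
  vfprintf (stderr, fmt, ap);
  fputc ('\n', stderr);
  va_end (ap);
  exit (EXIT_FAILURE);
}

/* Retired entry point: the symbol stays available so that nothing fails
   to link or load, but calling it is always an error.  The (void *, int)
   parameter list is assumed for this sketch.  */
void
sketch_retired_async_unmap_vars (void *ptr, int async)
{
  (void) ptr;
  sketch_fatal ("%s is no longer supported (async %d)", __func__, async);
}
```

The exported symbol set stays stable this way, and a stale caller gets a clear diagnostic instead of an unresolved-symbol failure.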
> 

> 

> > --- a/libgomp/libgomp-plugin.h

> > +++ b/libgomp/libgomp-plugin.h

> 

> > +/* Opaque type to represent plugin-dependent implementation of an

> > +   OpenACC asynchronous queue.  */

> > +struct goacc_asyncqueue;

> > +

> > +/* Used to keep a list of active asynchronous queues.  */

> > +struct goacc_asyncqueue_list

> > +{

> > +  struct goacc_asyncqueue *aq;

> > +  struct goacc_asyncqueue_list *next;

> > +};

> > +

> > +typedef struct goacc_asyncqueue *goacc_aq;

> > +typedef struct goacc_asyncqueue_list *goacc_aq_list;

> 

> I'm not too fond of such "syntactic sugar" typedefs, but if that's fine

> for Jakub to have in libgomp, then I won't object.


If it helps with making sure the formatting of the code isn't too ugly,
yes, otherwise no.

> I'd be in favor then of "typedef struct N *N" or "typedef struct N *N_t"


Please avoid *_t, that is reserved in POSIX.

	Jakub
Chung-Lin Tang Dec. 18, 2018, 2:03 p.m. | #4
On 2018/12/18 8:36 PM, Jakub Jelinek wrote:
> On Fri, Dec 14, 2018 at 06:52:20PM +0100, Thomas Schwinge wrote:

>>> --- a/include/gomp-constants.h

>>> +++ b/include/gomp-constants.h

>>

>>> @@ -199,7 +200,7 @@ enum gomp_map_kind

>>>   /* Versions of libgomp and device-specific plugins.  GOMP_VERSION

>>>      should be incremented whenever an ABI-incompatible change is introduced

>>>      to the plugin interface defined in libgomp/libgomp.h.  */

>>> -#define GOMP_VERSION	1

>>> +#define GOMP_VERSION	2

>>>   #define GOMP_VERSION_NVIDIA_PTX 1

>>>   #define GOMP_VERSION_INTEL_MIC 0

>>>   #define GOMP_VERSION_HSA 0

>>

>> OK, I think -- but I'm never quite sure whether we do need to increment

>> "GOMP_VERSION" when only doing libgomp-internal libgomp-plugin changes,

>> which don't affect the user/GCC side?

>>

>> GCC encodes "GOMP_VERSION" in "GOMP_offload_register_ver" calls

>> synthesized by "mkoffload": "GOMP_VERSION_PACK (/* LIB */ GOMP_VERSION,

>> /* DEV */ GOMP_VERSION_NVIDIA_PTX)", and then at run time libgomp checks

>> in "GOMP_offload_register_ver", so that we don't try to load offloading

>> code with an _old_ libgomp that has been compiled with/for the _new_

>> version.  (Right?)

> 

> To me it looks wrong to tie two different things in the same version number.

> Just because we are changing something in the libgomp vs. the corresponding

> plugin APIs doesn't mean we need to rebuild all binaries and libraries that

> have offloading code in it.


The GOMP_offload_register_ver test is for "> GOMP_VERSION", so with respect to
GOMP_VERSION's value, a libgomp can be too old, but never too new. The bump should not
require a rebuild of programs with offloading just because of this.

> So, IMHO GOMP_VERSION should be bumped only if we do a change that requires

> the offloading data to be changed, and either have an additional internal

> version to make sure that the plugin are kept in sync with libgomp, or just

> figure that out because dlsym will fail on some of the new symbols in the

> plugin.


We can of course create a new symbol version number specifically for the libgomp/plugin
interface.

I'll update this.

>>> --- a/libgomp/libgomp.map

>>> +++ b/libgomp/libgomp.map

>>> @@ -458,7 +462,6 @@ GOMP_PLUGIN_1.0 {

>>>   	GOMP_PLUGIN_debug;

>>>   	GOMP_PLUGIN_error;

>>>   	GOMP_PLUGIN_fatal;

>>> -	GOMP_PLUGIN_async_unmap_vars;

>>>   	GOMP_PLUGIN_acc_thread;

>>>   };

>>

>> I think that's fine, but highlighting this again for Jakub, in case

>> there's an issue with removing a symbol from the libgomp-plugin

>> interface.

> 

> I'd prefer not to remove symbols from libgomp.so.*.  You can

> do a gomp_fatal in it.


Okay, then.

>>> --- a/libgomp/libgomp-plugin.h

>>> +++ b/libgomp/libgomp-plugin.h

>>

>>> +/* Opaque type to represent plugin-dependent implementation of an

>>> +   OpenACC asynchronous queue.  */

>>> +struct goacc_asyncqueue;

>>> +

>>> +/* Used to keep a list of active asynchronous queues.  */

>>> +struct goacc_asyncqueue_list

>>> +{

>>> +  struct goacc_asyncqueue *aq;

>>> +  struct goacc_asyncqueue_list *next;

>>> +};

>>> +

>>> +typedef struct goacc_asyncqueue *goacc_aq;

>>> +typedef struct goacc_asyncqueue_list *goacc_aq_list;

>>

>> I'm not too fond of such "syntactic sugar" typedefs, but if that's fine

>> for Jakub to have in libgomp, then I won't object.

> 

> If it helps with making sure the formatting of the code isn't too ugly,

> yes, otherwise no.


Thanks, formatting was exactly my intention.

Chung-Lin

Chung-Lin Tang Dec. 18, 2018, 3:01 p.m. | #5
On 2018/12/15 1:52 AM, Thomas Schwinge wrote:
> As for the following changes, will you please make sure that there is one

> common order for these, used in "libgomp/libgomp-plugin.h" function

> prototypes, "libgomp/libgomp.h:acc_dispatch_t",

> "libgomp/target.c:gomp_load_plugin_for_device", "libgomp/oacc-host.c"

> function definitions as well as in "host_dispatch", and the

> libgomp-plugin(s) themselves (that's all, I think?).

> 

>> --- a/libgomp/libgomp-plugin.h

>> +++ b/libgomp/libgomp-plugin.h

>> @@ -93,22 +107,31 @@ extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, size_t);

>>   extern bool GOMP_OFFLOAD_can_run (void *);

>>   extern void GOMP_OFFLOAD_run (int, void *, void *, void **);

>>   extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);

>> +

>>   extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,

>> -				       void **, int, unsigned *, void *);

>> -extern void GOMP_OFFLOAD_openacc_register_async_cleanup (void *, int);

>> -extern int GOMP_OFFLOAD_openacc_async_test (int);

>> -extern int GOMP_OFFLOAD_openacc_async_test_all (void);

>> -extern void GOMP_OFFLOAD_openacc_async_wait (int);

>> -extern void GOMP_OFFLOAD_openacc_async_wait_async (int, int);

>> -extern void GOMP_OFFLOAD_openacc_async_wait_all (void);

>> -extern void GOMP_OFFLOAD_openacc_async_wait_all_async (int);

>> -extern void GOMP_OFFLOAD_openacc_async_set_async (int);

>> +				       void **, unsigned *, void *);

>> +extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void **,

>> +					     void **, unsigned *, void *,

>> +					     struct goacc_asyncqueue *);

>> +extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);

>> +extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);

>> +extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);

>> +extern void GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *);

>> +extern void GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *,

>> +						  struct goacc_asyncqueue *);

>> +extern void GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *,

>> +						       void (*)(void *), void *);

>> +extern bool GOMP_OFFLOAD_openacc_async_host2dev (int, void *, const void *, size_t,

>> +						 struct goacc_asyncqueue *);

>> +extern bool GOMP_OFFLOAD_openacc_async_dev2host (int, void *, const void *, size_t,

>> +						 struct goacc_asyncqueue *);


This patch reorders the functions/hooks quoted above so that their order is
consistent across libgomp (libgomp.h and libgomp-plugin.h), and restores
GOMP_PLUGIN_async_unmap_vars in libgomp.map.
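
For readers unfamiliar with the new hooks, here is a rough sketch of the
queue lifecycle they suggest (construct, queue work, test, synchronize,
destruct). It is a host-only toy and not the plugin implementation: every
toy_* name, the fixed-size FIFO, and the return conventions of toy_async_test
and toy_async_destruct are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a plugin's opaque queue type.  A real plugin
   would presumably wrap a device stream; this toy keeps a small FIFO.  */
#define TOY_MAX_PENDING 16

struct toy_asyncqueue
{
  void (*fn[TOY_MAX_PENDING]) (void *);
  void *arg[TOY_MAX_PENDING];
  int npending;
};

/* Same shape as the *_async_construct hook: returns NULL on failure.  */
struct toy_asyncqueue *
toy_async_construct (void)
{
  return calloc (1, sizeof (struct toy_asyncqueue));
}

/* Same shape as *_async_queue_callback: enqueue FN (ARG) behind any
   work already in the queue.  Nothing runs yet.  */
void
toy_async_queue_callback (struct toy_asyncqueue *aq,
			  void (*fn) (void *), void *arg)
{
  assert (aq != NULL && fn != NULL && aq->npending < TOY_MAX_PENDING);
  aq->fn[aq->npending] = fn;
  aq->arg[aq->npending] = arg;
  aq->npending++;
}

/* Same shape as *_async_test.  Assumed convention: 1 when the queue is
   idle, 0 when work is still pending.  */
int
toy_async_test (struct toy_asyncqueue *aq)
{
  return aq->npending == 0;
}

/* Same shape as *_async_synchronize: run everything queued so far, in
   FIFO order, and return once the queue is empty.  */
void
toy_async_synchronize (struct toy_asyncqueue *aq)
{
  for (int i = 0; i < aq->npending; i++)
    aq->fn[i] (aq->arg[i]);
  aq->npending = 0;
}

/* Same shape as *_async_destruct.  Assumed convention: refuse (and keep
   the queue alive) if work is still pending.  */
bool
toy_async_destruct (struct toy_asyncqueue *aq)
{
  if (aq->npending != 0)
    return false;
  free (aq);
  return true;
}

/* Two sample callbacks operating on an int.  */
void toy_increment (void *p) { int *v = p; *v += 1; }
void toy_double (void *p) { int *v = p; *v *= 2; }

/* Queue two callbacks, check that nothing ran early, then drain.
   Returns the final value, or -1 if something went wrong.  */
int
toy_demo (void)
{
  int value = 1;
  struct toy_asyncqueue *aq = toy_async_construct ();
  if (aq == NULL)
    return -1;
  toy_async_queue_callback (aq, toy_increment, &value);
  toy_async_queue_callback (aq, toy_double, &value);
  int before = value;		/* still 1: work is only queued */
  toy_async_synchronize (aq);	/* runs increment, then double */
  if (!toy_async_destruct (aq) || before != 1)
    return -1;
  return value;
}
```

The point of the sketch is only the ordering guarantee: work queued on one
queue runs in submission order, and synchronize is the point at which the
caller may rely on it having completed.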

Chung-Lin
Index: libgomp/libgomp.h
===================================================================
--- libgomp/libgomp.h	(revision 267226)
+++ libgomp/libgomp.h	(working copy)
@@ -949,25 +949,29 @@ typedef struct acc_dispatch_t
   /* Execute.  */
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
-  /* Async cleanup callback registration.  */
-  __typeof (GOMP_OFFLOAD_openacc_register_async_cleanup)
-    *register_async_cleanup_func;
-
-  /* Asynchronous routines.  */
-  __typeof (GOMP_OFFLOAD_openacc_async_test) *async_test_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_test_all) *async_test_all_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait) *async_wait_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_async) *async_wait_async_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_all) *async_wait_all_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_all_async)
-    *async_wait_all_async_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_set_async) *async_set_async_func;
-
   /* Create/destroy TLS data.  */
   __typeof (GOMP_OFFLOAD_openacc_create_thread_data) *create_thread_data_func;
   __typeof (GOMP_OFFLOAD_openacc_destroy_thread_data)
     *destroy_thread_data_func;
+  
+  struct {
+    gomp_mutex_t lock;
+    int nasyncqueue;
+    struct goacc_asyncqueue **asyncqueue;
+    struct goacc_asyncqueue_list *active;
 
+    __typeof (GOMP_OFFLOAD_openacc_async_construct) *construct_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_destruct) *destruct_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_test) *test_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_synchronize) *synchronize_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_serialize) *serialize_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_queue_callback) *queue_callback_func;
+
+    __typeof (GOMP_OFFLOAD_openacc_async_exec) *exec_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_dev2host) *dev2host_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_host2dev) *host2dev_func;
+  } async;
+
   /* NVIDIA target specific routines.  */
   struct {
     __typeof (GOMP_OFFLOAD_openacc_cuda_get_current_device)
@@ -1053,17 +1057,33 @@ enum gomp_map_vars_kind
   GOMP_MAP_VARS_ENTER_DATA
 };
 
-extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *);
+extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *, int);
 extern void gomp_acc_remove_pointer (void *, size_t, bool, int, int, int);
 extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
 				       unsigned short *);
+struct gomp_coalesce_buf;
+extern void gomp_copy_host2dev (struct gomp_device_descr *,
+				struct goacc_asyncqueue *, void *, const void *,
+				size_t, struct gomp_coalesce_buf *);
+extern void gomp_copy_dev2host (struct gomp_device_descr *,
+				struct goacc_asyncqueue *, void *, const void *,
+				size_t);
 
 extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 					      size_t, void **, void **,
 					      size_t *, void *, bool,
 					      enum gomp_map_vars_kind);
+extern struct target_mem_desc *gomp_map_vars_async (struct gomp_device_descr *,
+						    struct goacc_asyncqueue *,
+						    size_t, void **, void **,
+						    size_t *, void *, bool,
+						    enum gomp_map_vars_kind);
+extern void gomp_unmap_tgt (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
+extern void gomp_unmap_vars_async (struct target_mem_desc *, bool,
+				   struct goacc_asyncqueue *);
 extern void gomp_init_device (struct gomp_device_descr *);
+extern bool gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 extern bool gomp_remove_var (struct gomp_device_descr *, splay_tree_key);
Index: libgomp/libgomp.map
===================================================================
--- libgomp/libgomp.map	(revision 267226)
+++ libgomp/libgomp.map	(working copy)
@@ -464,8 +464,12 @@ OACC_2.5 {
 	acc_delete_finalize_async_32_h_;
 	acc_delete_finalize_async_64_h_;
 	acc_delete_finalize_async_array_h_;
+	acc_get_default_async;
+	acc_get_default_async_h_;
 	acc_memcpy_from_device_async;
 	acc_memcpy_to_device_async;
+	acc_set_default_async;
+	acc_set_default_async_h_;
 	acc_update_device_async;
 	acc_update_device_async_32_h_;
 	acc_update_device_async_64_h_;
Index: libgomp/libgomp-plugin.h
===================================================================
--- libgomp/libgomp-plugin.h	(revision 267226)
+++ libgomp/libgomp-plugin.h	(working copy)
@@ -53,6 +53,20 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_HSA = 7
 };
 
+/* Opaque type to represent plugin-dependent implementation of an
+   OpenACC asynchronous queue.  */
+struct goacc_asyncqueue;
+
+/* Used to keep a list of active asynchronous queues.  */
+struct goacc_asyncqueue_list
+{
+  struct goacc_asyncqueue *aq;
+  struct goacc_asyncqueue_list *next;
+};
+
+typedef struct goacc_asyncqueue *goacc_aq;
+typedef struct goacc_asyncqueue_list *goacc_aq_list;
+
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
    to libgomp.  */
 struct addr_pair
@@ -93,22 +107,31 @@ extern bool GOMP_OFFLOAD_dev2dev (int, void *, con
 extern bool GOMP_OFFLOAD_can_run (void *);
 extern void GOMP_OFFLOAD_run (int, void *, void *, void **);
 extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);
+
 extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
-				       void **, int, unsigned *, void *);
-extern void GOMP_OFFLOAD_openacc_register_async_cleanup (void *, int);
-extern int GOMP_OFFLOAD_openacc_async_test (int);
-extern int GOMP_OFFLOAD_openacc_async_test_all (void);
-extern void GOMP_OFFLOAD_openacc_async_wait (int);
-extern void GOMP_OFFLOAD_openacc_async_wait_async (int, int);
-extern void GOMP_OFFLOAD_openacc_async_wait_all (void);
-extern void GOMP_OFFLOAD_openacc_async_wait_all_async (int);
-extern void GOMP_OFFLOAD_openacc_async_set_async (int);
+				       void **, unsigned *, void *);
 extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);
 extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
+extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);
+extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);
+extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *,
+						  struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *,
+						       void (*)(void *), void *);
+extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void **,
+					     void **, unsigned *, void *,
+					     struct goacc_asyncqueue *);
+extern bool GOMP_OFFLOAD_openacc_async_dev2host (int, void *, const void *, size_t,
+						 struct goacc_asyncqueue *);
+extern bool GOMP_OFFLOAD_openacc_async_host2dev (int, void *, const void *, size_t,
+						 struct goacc_asyncqueue *);
 extern void *GOMP_OFFLOAD_openacc_cuda_get_current_device (void);
 extern void *GOMP_OFFLOAD_openacc_cuda_get_current_context (void);
-extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (int);
-extern int GOMP_OFFLOAD_openacc_cuda_set_stream (int, void *);
+extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (struct goacc_asyncqueue *);
+extern int GOMP_OFFLOAD_openacc_cuda_set_stream (struct goacc_asyncqueue *,
+						 void *);
 
 #ifdef __cplusplus
 }
diff -ru trunk-orig/libgomp/libgomp.h trunk-work/libgomp/libgomp.h
--- trunk-orig/libgomp/libgomp.h	2018-12-14 18:31:07.487203770 +0800
+++ trunk-work/libgomp/libgomp.h	2018-12-18 22:24:06.351190428 +0800
@@ -949,6 +949,11 @@
   /* Execute.  */
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
+  /* Create/destroy TLS data.  */
+  __typeof (GOMP_OFFLOAD_openacc_create_thread_data) *create_thread_data_func;
+  __typeof (GOMP_OFFLOAD_openacc_destroy_thread_data)
+    *destroy_thread_data_func;
+  
   struct {
     gomp_mutex_t lock;
     int nasyncqueue;
@@ -963,15 +968,10 @@
     __typeof (GOMP_OFFLOAD_openacc_async_queue_callback) *queue_callback_func;
 
     __typeof (GOMP_OFFLOAD_openacc_async_exec) *exec_func;
-    __typeof (GOMP_OFFLOAD_openacc_async_host2dev) *host2dev_func;
     __typeof (GOMP_OFFLOAD_openacc_async_dev2host) *dev2host_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_host2dev) *host2dev_func;
   } async;
 
-  /* Create/destroy TLS data.  */
-  __typeof (GOMP_OFFLOAD_openacc_create_thread_data) *create_thread_data_func;
-  __typeof (GOMP_OFFLOAD_openacc_destroy_thread_data)
-    *destroy_thread_data_func;
-
   /* NVIDIA target specific routines.  */
   struct {
     __typeof (GOMP_OFFLOAD_openacc_cuda_get_current_device)
diff -ru trunk-orig/libgomp/libgomp.map trunk-work/libgomp/libgomp.map
--- trunk-orig/libgomp/libgomp.map	2018-12-14 18:31:07.487203770 +0800
+++ trunk-work/libgomp/libgomp.map	2018-12-18 22:28:18.103210976 +0800
@@ -506,6 +506,7 @@
 	GOMP_PLUGIN_debug;
 	GOMP_PLUGIN_error;
 	GOMP_PLUGIN_fatal;
+	GOMP_PLUGIN_async_unmap_vars;
 	GOMP_PLUGIN_acc_thread;
 };
 
diff -ru trunk-orig/libgomp/libgomp-plugin.h trunk-work/libgomp/libgomp-plugin.h
--- trunk-orig/libgomp/libgomp-plugin.h	2018-12-14 18:31:07.487203770 +0800
+++ trunk-work/libgomp/libgomp-plugin.h	2018-12-18 22:56:12.338389017 +0800
@@ -110,9 +110,8 @@
 
 extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
 				       void **, unsigned *, void *);
-extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void **,
-					     void **, unsigned *, void *,
-					     struct goacc_asyncqueue *);
+extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);
+extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
 extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);
 extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);
 extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);
@@ -121,12 +120,13 @@
 						  struct goacc_asyncqueue *);
 extern void GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *,
 						       void (*)(void *), void *);
-extern bool GOMP_OFFLOAD_openacc_async_host2dev (int, void *, const void *, size_t,
-						 struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void **,
+					     void **, unsigned *, void *,
+					     struct goacc_asyncqueue *);
 extern bool GOMP_OFFLOAD_openacc_async_dev2host (int, void *, const void *, size_t,
 						 struct goacc_asyncqueue *);
-extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);
-extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
+extern bool GOMP_OFFLOAD_openacc_async_host2dev (int, void *, const void *, size_t,
+						 struct goacc_asyncqueue *);
 extern void *GOMP_OFFLOAD_openacc_cuda_get_current_device (void);
 extern void *GOMP_OFFLOAD_openacc_cuda_get_current_context (void);
 extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (struct goacc_asyncqueue *);

Patch

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index f1c53c5..697080c 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -160,6 +160,7 @@  enum gomp_map_kind
 /* Asynchronous behavior.  Keep in sync with
    libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
 
+#define GOMP_ASYNC_DEFAULT		0
 #define GOMP_ASYNC_NOVAL		-1
 #define GOMP_ASYNC_SYNC			-2
 
@@ -199,7 +200,7 @@  enum gomp_map_kind
 /* Versions of libgomp and device-specific plugins.  GOMP_VERSION
    should be incremented whenever an ABI-incompatible change is introduced
    to the plugin interface defined in libgomp/libgomp.h.  */
-#define GOMP_VERSION	1
+#define GOMP_VERSION	2
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
 #define GOMP_VERSION_HSA 0
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 2fc35d56..667ba19 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -53,6 +53,20 @@  enum offload_target_type
   OFFLOAD_TARGET_TYPE_HSA = 7
 };
 
+/* Opaque type to represent plugin-dependent implementation of an
+   OpenACC asynchronous queue.  */
+struct goacc_asyncqueue;
+
+/* Used to keep a list of active asynchronous queues.  */
+struct goacc_asyncqueue_list
+{
+  struct goacc_asyncqueue *aq;
+  struct goacc_asyncqueue_list *next;
+};
+
+typedef struct goacc_asyncqueue *goacc_aq;
+typedef struct goacc_asyncqueue_list *goacc_aq_list;
+
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
    to libgomp.  */
 struct addr_pair
@@ -93,22 +107,31 @@  extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, size_t);
 extern bool GOMP_OFFLOAD_can_run (void *);
 extern void GOMP_OFFLOAD_run (int, void *, void *, void **);
 extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);
+
 extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
-				       void **, int, unsigned *, void *);
-extern void GOMP_OFFLOAD_openacc_register_async_cleanup (void *, int);
-extern int GOMP_OFFLOAD_openacc_async_test (int);
-extern int GOMP_OFFLOAD_openacc_async_test_all (void);
-extern void GOMP_OFFLOAD_openacc_async_wait (int);
-extern void GOMP_OFFLOAD_openacc_async_wait_async (int, int);
-extern void GOMP_OFFLOAD_openacc_async_wait_all (void);
-extern void GOMP_OFFLOAD_openacc_async_wait_all_async (int);
-extern void GOMP_OFFLOAD_openacc_async_set_async (int);
+				       void **, unsigned *, void *);
+extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void **,
+					     void **, unsigned *, void *,
+					     struct goacc_asyncqueue *);
+extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);
+extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);
+extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *,
+						  struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *,
+						       void (*)(void *), void *);
+extern bool GOMP_OFFLOAD_openacc_async_host2dev (int, void *, const void *, size_t,
+						 struct goacc_asyncqueue *);
+extern bool GOMP_OFFLOAD_openacc_async_dev2host (int, void *, const void *, size_t,
+						 struct goacc_asyncqueue *);
 extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);
 extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
 extern void *GOMP_OFFLOAD_openacc_cuda_get_current_device (void);
 extern void *GOMP_OFFLOAD_openacc_cuda_get_current_context (void);
-extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (int);
-extern int GOMP_OFFLOAD_openacc_cuda_set_stream (int, void *);
+extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (struct goacc_asyncqueue *);
+extern int GOMP_OFFLOAD_openacc_cuda_set_stream (struct goacc_asyncqueue *,
+						 void *);
 
 #ifdef __cplusplus
 }
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3a8cc2b..a69faa7 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,19 +888,23 @@  typedef struct acc_dispatch_t
   /* Execute.  */
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
-  /* Async cleanup callback registration.  */
-  __typeof (GOMP_OFFLOAD_openacc_register_async_cleanup)
-    *register_async_cleanup_func;
-
-  /* Asynchronous routines.  */
-  __typeof (GOMP_OFFLOAD_openacc_async_test) *async_test_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_test_all) *async_test_all_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait) *async_wait_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_async) *async_wait_async_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_all) *async_wait_all_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_all_async)
-    *async_wait_all_async_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_set_async) *async_set_async_func;
+  struct {
+    gomp_mutex_t lock;
+    int nasyncqueue;
+    struct goacc_asyncqueue **asyncqueue;
+    struct goacc_asyncqueue_list *active;
+
+    __typeof (GOMP_OFFLOAD_openacc_async_construct) *construct_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_destruct) *destruct_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_test) *test_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_synchronize) *synchronize_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_serialize) *serialize_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_queue_callback) *queue_callback_func;
+
+    __typeof (GOMP_OFFLOAD_openacc_async_exec) *exec_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_host2dev) *host2dev_func;
+    __typeof (GOMP_OFFLOAD_openacc_async_dev2host) *dev2host_func;
+  } async;
 
   /* Create/destroy TLS data.  */
   __typeof (GOMP_OFFLOAD_openacc_create_thread_data) *create_thread_data_func;
@@ -992,17 +996,33 @@  enum gomp_map_vars_kind
   GOMP_MAP_VARS_ENTER_DATA
 };
 
-extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *);
+extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *, int);
 extern void gomp_acc_remove_pointer (void *, size_t, bool, int, int, int);
 extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
 				       unsigned short *);
+struct gomp_coalesce_buf;
+extern void gomp_copy_host2dev (struct gomp_device_descr *,
+				struct goacc_asyncqueue *, void *, const void *,
+				size_t, struct gomp_coalesce_buf *);
+extern void gomp_copy_dev2host (struct gomp_device_descr *,
+				struct goacc_asyncqueue *, void *, const void *,
+				size_t);
 
 extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 					      size_t, void **, void **,
 					      size_t *, void *, bool,
 					      enum gomp_map_vars_kind);
+extern struct target_mem_desc *gomp_map_vars_async (struct gomp_device_descr *,
+						    struct goacc_asyncqueue *,
+						    size_t, void **, void **,
+						    size_t *, void *, bool,
+						    enum gomp_map_vars_kind);
+extern void gomp_unmap_tgt (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
+extern void gomp_unmap_vars_async (struct target_mem_desc *, bool,
+				   struct goacc_asyncqueue *);
 extern void gomp_init_device (struct gomp_device_descr *);
+extern bool gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 extern bool gomp_remove_var (struct gomp_device_descr *, splay_tree_key);
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index e3f0c64..dd97728 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -420,8 +420,12 @@  OACC_2.5 {
 	acc_delete_finalize_async_32_h_;
 	acc_delete_finalize_async_64_h_;
 	acc_delete_finalize_async_array_h_;
+	acc_get_default_async;
+	acc_get_default_async_h_;
 	acc_memcpy_from_device_async;
 	acc_memcpy_to_device_async;
+	acc_set_default_async;
+	acc_set_default_async_h_;
 	acc_update_device_async;
 	acc_update_device_async_32_h_;
 	acc_update_device_async_64_h_;
@@ -458,7 +462,6 @@  GOMP_PLUGIN_1.0 {
 	GOMP_PLUGIN_debug;
 	GOMP_PLUGIN_error;
 	GOMP_PLUGIN_fatal;
-	GOMP_PLUGIN_async_unmap_vars;
 	GOMP_PLUGIN_acc_thread;
 };