This patch is a re-organization of OpenACC asynchronous queues. The previous 
style of implementation
was essentially re-defining the entire async API inside the plugin-interface, 
and relaying all such
API calls to the target plugin, which is awkward in design; it requires (each) 
target plugin to
essentially re-implement large portions of the async functionality to support 
OpenACC, and the
way it uses a state-setting style to "select/de-select" asynchronous queues for 
operations litters
a lot of code paths.

The new design proposed here in this patch declares a "struct 
goacc_asyncqueue*" opaque type in libgomp.h,
and re-defines the plugin interface to a few operations (e.g. 
construct/destruct/test/synchronize/etc.)
on this async-queue type, all details are target-dependent inside the specific 
plugin/plugin-<target>.c file.

Also included in this patch is the code for the acc_get/set_default_async API 
functions in OpenACC 2.5.
It's a minor part of this patch, but since some code was merge together, I'm 
submitting it together here.

Testing has been done with offloading enabled. The results are mostly okay, but 
with a few issues
with either yet incomplete submission of our testsuite adjustment patches, or 
other independent problems.
Seeking permission to commit this to trunk first.

Thanks,
Chung-Lin

2018-09-25  Chung-Lin Tang  <clt...@codesourcery.com>

        include/
        * gomp-constants.h (GOMP_ASYNC_DEFAULT): Define.
        (GOMP_VERSION): Increment for new plugin interface changes.

        libgomp/
        * libgomp-plugin.h (struct goacc_asyncqueue): Declare.
        (struct goacc_asyncqueue_list): Likewise.
        (goacc_aq): Likewise.
        (goacc_aq_list): Likewise.
        (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove.
        (GOMP_OFFLOAD_openacc_async_test): Remove.
        (GOMP_OFFLOAD_openacc_async_test_all): Remove.
        (GOMP_OFFLOAD_openacc_async_wait): Remove.
        (GOMP_OFFLOAD_openacc_async_wait_async): Remove.
        (GOMP_OFFLOAD_openacc_async_wait_all): Remove.
        (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove.
        (GOMP_OFFLOAD_openacc_async_set_async): Remove.
        (GOMP_OFFLOAD_openacc_exec): Adjust declaration.
        (GOMP_OFFLOAD_openacc_cuda_get_stream): Likewise.
        (GOMP_OFFLOAD_openacc_cuda_set_stream): Likewise.

        (GOMP_OFFLOAD_openacc_async_exec): Declare.
        (GOMP_OFFLOAD_openacc_async_construct): Declare.
        (GOMP_OFFLOAD_openacc_async_destruct): Declare.
        (GOMP_OFFLOAD_openacc_async_test): Declare.
        (GOMP_OFFLOAD_openacc_async_synchronize): Declare.
        (GOMP_OFFLOAD_openacc_async_serialize): Declare.
        (GOMP_OFFLOAD_openacc_async_queue_callback): Declare.
        (GOMP_OFFLOAD_openacc_async_host2dev): Declare.
        (GOMP_OFFLOAD_openacc_async_dev2host): Declare.

        * libgomp.h (struct acc_dispatch_t): Define 'async' sub-struct.
        (gomp_acc_insert_pointer): Adjust declaration.
        (gomp_copy_host2dev): New declaration.
        (gomp_copy_dev2host): Likewise.
        (gomp_map_vars_async): Likewise.
        (gomp_unmap_tgt): Likewise.
        (gomp_unmap_vars_async): Likewise.
        (gomp_fini_device): Likewise.

        * libgomp.map (OACC_2.5): Add acc_get_default_async,
        acc_get_default_async_h_, acc_set_default_async, and
        acc_set_default_async_h_.
        (GOMP_PLUGIN_1.0): Remove GOMP_PLUGIN_async_unmap_vars.

        * oacc-async.c (get_goacc_thread): New function.
        (get_goacc_thread_device): New function.
        (lookup_goacc_asyncqueue): New function.
        (get_goacc_asyncqueue): New function.
        (acc_async_test): Adjust code to use new async design.
        (acc_async_test_all): Likewise.
        (acc_wait): Likewise.
        (acc_wait_async): Likewise.
        (acc_wait_all): Likewise.
        (acc_wait_all_async): Likewise.
        (acc_get_default_async): New API function.
        (acc_set_default_async): Likewise.
        (goacc_async_unmap_tgt): New function.
        (goacc_async_copyout_unmap_vars): Likewise.
        (goacc_async_free): Likewise.
        (goacc_init_asyncqueues): Likewise.
        (goacc_fini_asyncqueues): Likewise.
        * oacc-cuda.c (acc_get_cuda_stream): Adjust code to use new async
        design.
        (acc_set_cuda_stream): Likewise.
        * oacc-host.c (host_openacc_exec): Adjust parameters, remove 'async'.
        (host_openacc_register_async_cleanup): Remove.
        (host_openacc_async_exec): New function.
        (host_openacc_async_test): Adjust parameters.
        (host_openacc_async_test_all): Remove.
        (host_openacc_async_wait): Remove.
        (host_openacc_async_wait_async): Remove.
        (host_openacc_async_wait_all): Remove.
        (host_openacc_async_wait_all_async): Remove.
        (host_openacc_async_set_async): Remove.
        (host_openacc_async_synchronize): New function.
        (host_openacc_async_serialize): New function.
        (host_openacc_async_host2dev): New function.
        (host_openacc_async_dev2host): New function.
        (host_openacc_async_queue_callback): New function.
        (host_openacc_async_construct): New function.
        (host_openacc_async_destruct): New function.
        (struct gomp_device_descr host_dispatch): Remove initialization of old
        interface, add intialization of new async sub-struct.
        * oacc-init.c (acc_shutdown_1): Adjust to use gomp_fini_device.
        (goacc_attach_host_thread_to_device): Remove old async code usage, add
        initialization of per-thread default_async.
        * oacc-int.h (struct goacc_thread): Add default_async field.
        (goacc_init_asyncqueues): New declaration.
        (goacc_fini_asyncqueues): Likewise.
        (goacc_async_copyout_unmap_vars): Likewise.
        (goacc_async_free): Likewise.
        (get_goacc_asyncqueue): Likewise.
        (lookup_goacc_asyncqueue): Likewise.

        * oacc-mem.c (memcpy_tofrom_device): Adjust code to use new async
        design.
        (present_create_copy): Adjust code to use new async design.
        (delete_copyout): Likewise.
        (update_dev_host): Likewise.
        (gomp_acc_insert_pointer): Add async parameter, adjust code to use new
        async design.
        (gomp_acc_remove_pointer): Adjust code to use new async design.
        * oacc-parallel.c (GOACC_parallel_keyed): Adjust code to use new async
        design, adjust profiling bits, interpret launch op as signed 16-bit
        field.
        (GOACC_enter_exit_data): Handle -1 as waits num, adjust code to use new
        async design.
        (goacc_wait): Adjust code to use new async design.
        (GOACC_update): Likewise.
        * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Remove.

        * target.c (goacc_device_copy_async): New function.
        (gomp_copy_host2dev): Remove 'static', add goacc_asyncqueue parameter,
        add goacc_device_copy_async case.
        (gomp_copy_dev2host): Likewise.
        (gomp_map_vars_existing): Add goacc_asyncqueue parameter, adjust code.
        (gomp_map_pointer): Likewise.
        (gomp_map_fields_existing): Likewise.
        (gomp_map_vars): Add function for compatiblity.
        (gomp_map_vars_async): Adapt from gomp_map_vars, add goacc_asyncqueue
        parameter.
        (gomp_unmap_tgt): Remove statis, add attribute_hidden.
        (gomp_unmap_vars): Add function for compatiblity.
        (gomp_unmap_vars_async): Adapt from gomp_unmap_vars, add
        goacc_asyncqueue parameter.
        (gomp_fini_device): New function.
        (gomp_exit_data): Adjust gomp_copy_dev2host call.
        (gomp_load_plugin_for_device): Remove old interface, adjust to load
        new async interface.
        (gomp_target_fini): Adjust code to call gomp_fini_device.

        * plugin/plugin-nvptx.c (struct cuda_map): Remove.
        (struct ptx_stream): Remove.
        (struct nvptx_thread): Remove current_stream field.
        (cuda_map_create): Remove.
        (cuda_map_destroy): Remove.
        (map_init): Remove.
        (map_fini): Remove.
        (map_pop): Remove.
        (map_push): Remove.
        (struct goacc_asyncqueue): Define.
        (struct nvptx_callback): Define.
        (struct ptx_free_block): Define.
        (struct ptx_device): Remove null_stream, active_streams, async_streams,
        stream_lock, and next fields.
        (enum ptx_event_type): Remove.
        (struct ptx_event): Remove.
        (ptx_event_lock): Remove.
        (ptx_events): Remove.
        (init_streams_for_device): Remove.
        (fini_streams_for_device): Remove.
        (select_stream_for_async): Remove.
        (nvptx_init): Remove ptx_events and ptx_event_lock references.
        (nvptx_attach_host_thread_to_device): Remove CUDA_ERROR_NOT_PERMITTED
        case.
        (nvptx_open_device): Add free_blocks initialization, remove
        init_streams_for_device call.
        (nvptx_close_device): Remove fini_streams_for_device call, add
        free_blocks destruct code.
        (event_gc): Remove.
        (event_add): Remove.
        (nvptx_exec): Adjust parameters and code.
        (nvptx_free): Likewise.
        (nvptx_host2dev): Remove.
        (nvptx_dev2host): Remove.
        (nvptx_set_async): Remove.
        (nvptx_async_test): Remove.
        (nvptx_async_test_all): Remove.
        (nvptx_wait): Remove.
        (nvptx_wait_async): Remove.
        (nvptx_wait_all): Remove.
        (nvptx_wait_all_async): Remove.
        (nvptx_get_cuda_stream): Remove.
        (nvptx_set_cuda_stream): Remove.
        (GOMP_OFFLOAD_alloc): Adjust code.
        (GOMP_OFFLOAD_free): Likewise.
        (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove.
        (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code.
        (GOMP_OFFLOAD_openacc_async_test_all): Remove.
        (GOMP_OFFLOAD_openacc_async_wait): Remove.
        (GOMP_OFFLOAD_openacc_async_wait_async): Remove.
        (GOMP_OFFLOAD_openacc_async_wait_all): Remove.
        (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove.
        (GOMP_OFFLOAD_openacc_async_set_async): Remove.
        (cuda_free_argmem): New function.
        (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function.
        (GOMP_OFFLOAD_openacc_create_thread_data): Adjust code.
        (GOMP_OFFLOAD_openacc_cuda_get_stream): Adjust code.
        (GOMP_OFFLOAD_openacc_cuda_set_stream): Adjust code.
        (GOMP_OFFLOAD_openacc_async_construct): New plugin hook function.
        (GOMP_OFFLOAD_openacc_async_destruct): New plugin hook function.
        (GOMP_OFFLOAD_openacc_async_test): Remove and re-implement.
        (GOMP_OFFLOAD_openacc_async_synchronize): New plugin hook function.
        (GOMP_OFFLOAD_openacc_async_serialize): New plugin hook function.
        (GOMP_OFFLOAD_openacc_async_queue_callback): New plugin hook function.
        (cuda_callback_wrapper): New function.
        (cuda_memcpy_sanity_check): New function.
        (GOMP_OFFLOAD_host2dev): Remove and re-implement.
        (GOMP_OFFLOAD_dev2host): Remove and re-implement.
        (GOMP_OFFLOAD_openacc_async_host2dev): New plugin hook function.
        (GOMP_OFFLOAD_openacc_async_dev2host): New plugin hook function.

        * testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Adjust testcase.
        * testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/lib-71.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/lib-77.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/lib-81.c: Likewise.
        * testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise.

Reply via email to