Hello,

This patch series moves libgomp/nvptx porting further along to get initial bits of parallel execution working, mostly unbreaking the testsuite. Please have a look! I'm interested in feedback, and would like to know whether it is suitable for inclusion in a branch.
This patch series ports enough of libgomp.c to get warp-level parallelism working for OpenMP offloading. The overall approach is as follows.

I've opted not to use dynamic parallelism. It raises the hardware requirement from sm_30 to sm_35, needs a library from the CUDA Toolkit at link time (libcudadevrt.a), and imposes run-time overhead. The last point might be moot if we don't manage to make libgomp's own overhead low, but my judgement is still that a hard dependency on dynamic parallelism is problematic.

For now, the plugin launches a single thread block with 8 warps, which begin executing a new function in libgomp, gomp_nvptx_main. The warps form a (pre-allocated) pool. Warp 0 is responsible for initialization and final cleanup, and proceeds to execute target region functions. The other warps proceed to gomp_thread_start.

With these patches, the libgomp testsuite mostly passes. The failures are as follows:

  libgomp.c/target-{1,7,critical-1}.c: segfault in accelerator code.

  libgomp.c/thread-limit-2.c: fails to link because 'usleep' is unavailable on NVPTX. Note that the test does not run anything on the device, because the target region has an 'if (0)' clause.

  libgomp.c++/examples-4/declare_target-2.C: "libgomp: Can't map target variables (size mismatch)". Will investigate later.

  libgomp.c++/target-1.C: same as libgomp.c/target-1.c, segfault on the device.

I haven't run the libgomp/gfortran testsuite yet. I'd like your input on how to deal with testsuite breakage (XFAIL?).

I have not rebased my private branch in a while, so the context in gcc/config/nvptx is probably out of date in places.
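The launch scheme described above can be sketched on the host. In that scheme, one thread block of 8 warps enters gomp_nvptx_main. Warp 0 sets up the pool, runs the target region and cleans up. The other warps park in gomp_thread_start. This sketch is only an illustration under stated assumptions. POSIX threads stand in for warps, and pthread barriers stand in for the device-side barriers. Every name in it (sim_nvptx_main, sim_thread_start, sim_parallel, sim_launch, struct sim_pool) is hypothetical and is not taken from the patches.

```c
/* Host-side simulation of the launch scheme; NOT libgomp code.
   All names are hypothetical. Threads model warps, and pthread
   barriers model the device-side barriers. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WARPS 8 /* one thread block of 8 warps */

struct sim_pool {
  void (*task)(int warp, void *arg); /* current parallel work; NULL = exit */
  void *arg;
  pthread_barrier_t dock;            /* all NUM_WARPS warps meet here */
};

static struct sim_pool *pool;        /* set up by warp 0 before start_bar */
static pthread_barrier_t start_bar;

/* Run TASK on every warp of the pool, as a parallel region would. */
static void sim_parallel(void (*task)(int, void *), void *arg) {
  pool->task = task;
  pool->arg = arg;
  pthread_barrier_wait(&pool->dock); /* release the parked warps */
  task(0, arg);                      /* warp 0 takes part as well */
  pthread_barrier_wait(&pool->dock); /* wait for everyone to finish */
}

/* Stand-in for gomp_thread_start: park, run posted work, repeat. */
static void sim_thread_start(int warp) {
  for (;;) {
    pthread_barrier_wait(&pool->dock);
    if (pool->task == NULL)
      return;                        /* warp 0 is cleaning up */
    pool->task(warp, pool->arg);
    pthread_barrier_wait(&pool->dock);
  }
}

/* Stand-in for gomp_nvptx_main: every warp enters here. */
static void sim_nvptx_main(int warp, void (*fn)(void *), void *fn_data) {
  if (warp == 0) {
    static struct sim_pool storage;  /* pre-allocated pool */
    pool = &storage;
    pthread_barrier_init(&pool->dock, NULL, NUM_WARPS);
    pthread_barrier_wait(&start_bar); /* pool is now visible to others */
    fn(fn_data);                      /* target region runs on warp 0 */
    pool->task = NULL;                /* final cleanup: dismiss workers */
    pthread_barrier_wait(&pool->dock);
  } else {
    pthread_barrier_wait(&start_bar);
    sim_thread_start(warp);
  }
}

struct warp_arg { int warp; void (*fn)(void *); void *fn_data; };

static void *warp_entry(void *p) {
  struct warp_arg *a = p;
  sim_nvptx_main(a->warp, a->fn, a->fn_data);
  return NULL;
}

/* Stand-in for the plugin: "launch" one block of NUM_WARPS warps. */
static void sim_launch(void (*fn)(void *), void *fn_data) {
  pthread_t th[NUM_WARPS];
  struct warp_arg args[NUM_WARPS];
  pthread_barrier_init(&start_bar, NULL, NUM_WARPS);
  for (int i = 0; i < NUM_WARPS; i++) {
    args[i] = (struct warp_arg){ i, fn, fn_data };
    int rc = pthread_create(&th[i], NULL, warp_entry, &args[i]);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < NUM_WARPS; i++)
    pthread_join(th[i], NULL);
  pthread_barrier_destroy(&pool->dock);
  pthread_barrier_destroy(&start_bar);
}

/* Example target region: a serial part, then one parallel region. */
static int hits[NUM_WARPS];
static void mark(int warp, void *arg) { (void)arg; hits[warp]++; }
static void example_target_region(void *data) {
  *(int *)data = 42;
  sim_parallel(mark, NULL);
}
```

For example, calling sim_launch(example_target_region, &result) sets result to 42 on warp 0. It also has each of the 8 simulated warps run mark exactly once. In this sketch the workers are already running and parked before the target region starts. A parallel region therefore only has to publish its work and release a barrier; it does not launch anything new. That is the property the series relies on instead of dynamic parallelism.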
Yours,
Alexander

  nvptx: emit kernels for 'omp target entrypoint' only for OpenACC
  nvptx: emit pointers to OpenMP target region entry points
  nvptx: expand support for address spaces
  nvptx: fix output of _Bool global variables
  omp-low: set 'omp target entrypoint' only on entrypoints
  omp-low: copy omp_data_o to shared memory on NVPTX
  libgomp nvptx plugin: launch target functions via gomp_nvptx_main
  libgomp nvptx: populate proc.c
  libgomp: provide barriers on NVPTX
  libgomp: arrange a team of pre-started threads via gomp_nvptx_main
  libgomp: avoid variable-length stack allocation in team.c
  libgomp: fixup error.c on nvptx
  libgomp: provide minimal GOMP_teams
  libgomp: use more generic implementations on nvptx

 gcc/config/nvptx/nvptx.c        |  78 +++++++++++++--
 gcc/omp-low.c                   |  58 +++++++++--
 libgomp/config/nvptx/alloc.c    |   0
 libgomp/config/nvptx/bar.c      | 210 ++++++++++++++++++++++++++++++++++++++++
 libgomp/config/nvptx/bar.h      | 129 +++++++++++++++++++++++-
 libgomp/config/nvptx/barrier.c  |   0
 libgomp/config/nvptx/critical.c |  57 -----------
 libgomp/config/nvptx/error.c    |   0
 libgomp/config/nvptx/iter.c     |   0
 libgomp/config/nvptx/iter_ull.c |   0
 libgomp/config/nvptx/loop.c     |   0
 libgomp/config/nvptx/loop_ull.c |   0
 libgomp/config/nvptx/ordered.c  |   0
 libgomp/config/nvptx/parallel.c |   0
 libgomp/config/nvptx/proc.c     |  40 ++++++++
 libgomp/config/nvptx/single.c   |   0
 libgomp/config/nvptx/target.c   |  39 ++++++++
 libgomp/config/nvptx/task.c     |   0
 libgomp/config/nvptx/team.c     |   0
 libgomp/config/nvptx/work.c     |   0
 libgomp/error.c                 |   5 +
 libgomp/libgomp.h               |  10 +-
 libgomp/plugin/plugin-nvptx.c   |  23 ++++-
 libgomp/task.c                  |   7 +-
 libgomp/team.c                  |  92 +++++++++++++++++-
 25 files changed, 664 insertions(+), 84 deletions(-)
 delete mode 100644 libgomp/config/nvptx/alloc.c
 delete mode 100644 libgomp/config/nvptx/barrier.c
 delete mode 100644 libgomp/config/nvptx/critical.c
 delete mode 100644 libgomp/config/nvptx/error.c
 delete mode 100644 libgomp/config/nvptx/iter.c
 delete mode 100644 libgomp/config/nvptx/iter_ull.c
 delete mode 100644 libgomp/config/nvptx/loop.c
 delete mode 100644 libgomp/config/nvptx/loop_ull.c
 delete mode 100644 libgomp/config/nvptx/ordered.c
 delete mode 100644 libgomp/config/nvptx/parallel.c
 delete mode 100644 libgomp/config/nvptx/single.c
 delete mode 100644 libgomp/config/nvptx/task.c
 delete mode 100644 libgomp/config/nvptx/team.c
 delete mode 100644 libgomp/config/nvptx/work.c