https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81802
Bug ID: 81802
Summary: Report cuLaunchKernel launch dimensions in GOMP_OFFLOAD_run
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
Target Milestone: ---

This patch:
...
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index f5b9502..2cb63b4 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -2457,6 +2457,7 @@ nvptx_stacks_free (void *p, int num)
 void
 GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 {
+  const struct targ_fn_launch *launch = ((struct targ_fn_descriptor *) tgt_fn)->launch;
   CUfunction function = ((struct targ_fn_descriptor *) tgt_fn)->fn;
   CUresult r;
   struct ptx_device *ptx_dev = ptx_devices[ord];
@@ -2492,6 +2493,9 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
     CU_LAUNCH_PARAM_BUFFER_SIZE, &fn_args_size,
     CU_LAUNCH_PARAM_END
   };
+  GOMP_PLUGIN_debug (0, " %s: kernel %s: launch"
+                     " [(teams: %u), 1, 1] [32, (threads: %u), 1]\n",
+                     __FUNCTION__, launch->fn, teams, threads);
   r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
                          32, threads, 1, 0, ptx_dev->null_stream->stream,
                          NULL, config);
...

Prints this information with GOMP_DEBUG=1:
...
 GOMP_OFFLOAD_run: kernel f2_tpf_static32$_omp_fn$0: launch [(teams: 1), 1, 1] [32, (threads: 8), 1]
...

I'm not happy with the term 'threads'; it is ambiguous in this context. Perhaps 'warps' or 'omp-threads' would be better.
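
To illustrate the ambiguity, here is a minimal standalone sketch (not part of the patch) of how the launch dimensions relate to the number of CUDA threads. It assumes that each unit of the block's y dimension corresponds to one warp of 32 lanes, which is what the suggested label 'warps' implies. The helpers total_cuda_threads and format_launch are hypothetical names used only for this illustration; they do not exist in libgomp.

```c
#include <stdio.h>

/* Lanes per warp; matches the literal 32 passed to cuLaunchKernel
   in the patch.  */
#define WARP_SIZE 32u

/* Hypothetical helper: number of CUDA threads launched for
   grid [teams, 1, 1] and block [WARP_SIZE, threads, 1].  */
static unsigned
total_cuda_threads (unsigned teams, unsigned threads)
{
  return teams * WARP_SIZE * threads;
}

/* Hypothetical helper: format the launch line with the label 'warps'
   in place of 'threads'.  Returns the snprintf result.  */
static int
format_launch (char *buf, size_t len, const char *kernel,
               unsigned teams, unsigned threads)
{
  return snprintf (buf, len,
                   "kernel %s: launch [(teams: %u), 1, 1]"
                   " [%u, (warps: %u), 1]",
                   kernel, teams, WARP_SIZE, threads);
}
```

Under that assumption, the example output above (teams: 1, threads: 8) corresponds to 1 * 32 * 8 = 256 CUDA threads, so a reader who takes 'threads: 8' to mean CUDA threads would be off by a factor of 32.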