https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81802
Bug ID: 81802
Summary: Report cuLaunchKernel launch dimensions in GOMP_OFFLOAD_run
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
Target Milestone: ---
This patch:
...
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index f5b9502..2cb63b4 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -2457,6 +2457,7 @@ nvptx_stacks_free (void *p, int num)
void
GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
{
+ const struct targ_fn_launch *launch = ((struct targ_fn_descriptor *) tgt_fn)->launch;
CUfunction function = ((struct targ_fn_descriptor *) tgt_fn)->fn;
CUresult r;
struct ptx_device *ptx_dev = ptx_devices[ord];
@@ -2492,6 +2493,9 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
CU_LAUNCH_PARAM_BUFFER_SIZE, &fn_args_size,
CU_LAUNCH_PARAM_END
};
+ GOMP_PLUGIN_debug (0, " %s: kernel %s: launch"
+ " [(teams: %u), 1, 1] [32, (threads: %u), 1]\n",
+ __FUNCTION__, launch->fn, teams, threads);
r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
32, threads, 1, 0, ptx_dev->null_stream->stream,
NULL, config);
...
Prints this information with GOMP_DEBUG=1:
...
GOMP_OFFLOAD_run: kernel f2_tpf_static32$_omp_fn$0: launch [(teams: 1), 1, 1] [32, (threads: 8), 1]
...
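For reference, the message format can be reproduced outside libgomp. The
sketch below is only an illustration: format_launch_dims is a hypothetical
helper, not a libgomp function, and it uses snprintf in place of
GOMP_PLUGIN_debug. The format string is copied from the patch as quoted above.

```c
#include <stdio.h>

/* Hypothetical helper (not part of libgomp): formats the launch
   dimensions with the same format string as the GOMP_PLUGIN_debug
   call in the patch.  Returns the snprintf result.  */
int
format_launch_dims (char *buf, size_t len, const char *caller,
                    const char *kernel, unsigned teams, unsigned threads)
{
  return snprintf (buf, len,
                   " %s: kernel %s: launch"
                   " [(teams: %u), 1, 1] [32, (threads: %u), 1]\n",
                   caller, kernel, teams, threads);
}
```

Calling it with "GOMP_OFFLOAD_run", the kernel name, teams 1 and threads 8
should yield the line shown in the example output.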
I'm not happy with the term 'threads'; it is ambiguous in this context.
Perhaps 'warps' or 'omp-threads' would be better.
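To make the ambiguity concrete: the cuLaunchKernel call in the patch passes a
block size of [32, threads, 1], so each unit counted by 'threads' spans 32
CUDA threads. A minimal sketch of that arithmetic, using hypothetical helper
names that do not exist in libgomp:

```c
/* The x dimension of the block is hard-coded to 32 in the
   cuLaunchKernel call from the patch.  */
enum { BLOCK_DIM_X = 32 };

/* Hypothetical helper: CUDA threads in one block for a block of
   dimensions [32, threads, 1].  */
unsigned
cuda_threads_per_block (unsigned threads)
{
  return BLOCK_DIM_X * threads;
}

/* Hypothetical helper: CUDA threads in the whole launch for a grid
   of dimensions [teams, 1, 1].  */
unsigned
cuda_threads_total (unsigned teams, unsigned threads)
{
  return teams * cuda_threads_per_block (threads);
}
```

With the values from the example output (teams 1, threads 8), this gives 256
CUDA threads per block, which is why a plain 'threads: 8' can be misread.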