>> Right, in fact there are two separate things you're trying to address
>> here: launch failure and occupancy heuristic, so split the patch.
> That hunk was small, so I included it with this patch. Although if you
> insist, I can remove it.

Please, for future reference, always assume that I insist instead of
asking me, unless you have an argument to present why that is not a good
idea. And just to be clear here: "small" is not such an argument.

Please keep in mind ( https://gcc.gnu.org/contribute.html#patches ):
...
Don't mix together changes made for different reasons.
Send them individually.
...

> +  /* Check if the accelerator has sufficient hardware resources to
> +     launch the offloaded kernel.  */
> +  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
> +      > targ_fn->max_threads_per_block)
> +    GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
> +                       " launch '%s' with num_workers = %d and vector_length ="
> +                       " %d; recompile the program with 'num_workers = x and"
> +                       " vector_length = y' on that offloaded region or "
> +                       "'-fopenacc-dim=-:x:y' where x * y <= %d.\n",
> +                       targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
> +                       dims[GOMP_DIM_VECTOR], targ_fn->max_threads_per_block);
> +

This is copied from the state on an openacc branch where vector-length
is variable, and the error message text doesn't make sense on current
trunk for that reason. Also, it suggests a syntax for -fopenacc-dim
that's not supported on trunk.

Committed as attached.

Thanks,
- Tom
[libgomp, nvptx] Add error with recompilation hint for launch failure

Currently, when a kernel is launched with too many workers, it results
in a cuda launch failure.  This is triggered f.i. for parallel-loop-1.c
at -O0 on a Quadro M1200.

This patch detects this situation, and errors out with a hint on how to
fix it.

Built and reg-tested on x86_64 with nvptx accelerator.

2018-07-26  Cesar Philippidis  <ce...@codesourcery.com>
            Tom de Vries  <tdevr...@suse.de>

        * plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
        sufficient resources to launch a kernel, and give a hint on how to fix
        it.

---
 libgomp/plugin/plugin-nvptx.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5d9b5151e95..3a4077a1315 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1204,6 +1204,21 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
         dims[i] = default_dims[i];
     }
 
+  /* Check if the accelerator has sufficient hardware resources to
+     launch the offloaded kernel.  */
+  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
+      > targ_fn->max_threads_per_block)
+    {
+      int suggest_workers
+        = targ_fn->max_threads_per_block / dims[GOMP_DIM_VECTOR];
+      GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
+                         " launch '%s' with num_workers = %d; recompile the"
+                         " program with 'num_workers = %d' on that offloaded"
+                         " region or '-fopenacc-dim=:%d'",
+                         targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
+                         suggest_workers, suggest_workers);
+    }
+
   /* This reserves a chunk of a pre-allocated page of memory mapped on
      both the host and the device.  HP is a host pointer to the new chunk,
      and DP is the corresponding device pointer.  */
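The arithmetic in the committed check can be isolated into a small helper, which makes the hint easy to reason about. This is a sketch, not code from libgomp: the DIM_* enum and suggest_num_workers are hypothetical names standing in for GOMP_DIM_* and the inline logic in nvptx_exec, and the per-kernel limit is passed in as a plain integer instead of being read from targ_fn->max_threads_per_block.

```c
/* Hypothetical stand-ins for the plugin's GOMP_DIM_* indices; the
   (gang, worker, vector) ordering is assumed here for illustration.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Hypothetical helper mirroring the committed check in nvptx_exec.
   Returns -1 if workers * vector_length fits within
   MAX_THREADS_PER_BLOCK, otherwise the largest worker count that would
   fit for the current vector length.  Assumes every dimension is at
   least 1 (a zero vector length would divide by zero).  */
static int
suggest_num_workers (const int dims[DIM_MAX], int max_threads_per_block)
{
  if (dims[DIM_WORKER] * dims[DIM_VECTOR] <= max_threads_per_block)
    return -1;  /* The launch fits; nothing to suggest.  */

  /* Integer division rounds down, so the suggested worker count times
     the vector length never exceeds the per-block limit.  */
  return max_threads_per_block / dims[DIM_VECTOR];
}
```

For instance, with 32 workers and a vector length of 32 (assumed fixed at the warp size, as Tom notes it is not variable on trunk) against an illustrative limit of 768 threads per block, 32 * 32 = 1024 exceeds the limit and the helper returns 768 / 32 = 24, which the committed message would present as 'num_workers = 24' or '-fopenacc-dim=:24'. Rounding down is what keeps the hint safe: any larger worker count would overflow the block again.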