https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88703
Bug ID: 88703 Summary: oacc_validate_dims allows invalid dimensions Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Consider oacc_validate_dims on trunk: ... oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used) { tree purpose[GOMP_DIM_MAX]; unsigned ix; tree pos = TREE_VALUE (attrs); /* Make sure the attribute creator attached the dimension information. */ gcc_assert (pos); for (ix = 0; ix != GOMP_DIM_MAX; ix++) { purpose[ix] = TREE_PURPOSE (pos); tree val = TREE_VALUE (pos); dims[ix] = val ? TREE_INT_CST_LOW (val) : -1; pos = TREE_CHAIN (pos); } bool changed = targetm.goacc.validate_dims (fn, dims, level); /* Default anything left to 1 or a partitioned default. */ for (ix = 0; ix != GOMP_DIM_MAX; ix++) if (dims[ix] < 0) { /* The OpenACC spec says 'If the [num_gangs] clause is not specified, an implementation-defined default will be used; the default may depend on the code within the construct.' (2.5.6). Thus an implementation is free to choose non-unity default for a parallel region that doesn't have any gang-partitioned loops. However, it appears that there is a sufficient body of user code that expects non-gang partitioned regions to not execute in gang-redundant mode. So we (a) don't warn about the non-portability and (b) pick the minimum permissible dimension size when there is no partitioned execution. Otherwise we pick the global default for the dimension, which the user can control. The same wording and logic applies to num_workers and vector_length, however the worker- or vector- single execution doesn't have the same impact as gang-redundant execution. (If the minimum gang-level partioning is not 1, the target is probably too confusing.) */ dims[ix] = (used & GOMP_DIM_MASK (ix) ? oacc_default_dims[ix] : oacc_min_dims[ix]); changed = true; } if (changed) { /* Replace the attribute with new values. */ pos = NULL_TREE; for (ix = GOMP_DIM_MAX; ix--;) pos = tree_cons (purpose[ix], build_int_cst (integer_type_node, dims[ix]), pos); oacc_replace_fn_attrib (fn, pos); } } ... It does the following: - read the dimensions set in the attributes - call targetm.goacc.validate_dims on those dimensions - apply oacc_default_dims[ix] or oacc_min_dims[ix] to set remaining unset dimensions - update the dimensions in the attributes However, it's possible that the resulting dimensions are in fact invalid. Consider this test-case on og8 branch: ... $ cat libgomp/testsuite/libgomp.oacc-c-c++-common/test.c /* { dg-do run { target openacc_nvidia_accel_selected } } */ /* { dg-additional-options "-fopenacc-dim=:32 -foffload=-mlong-vector-in-workers" } */ #include <stdlib.h> #define N 2048 unsigned int a[N]; unsigned int b[N]; unsigned int c[N]; unsigned int n = N; int main (void) { #pragma acc parallel vector_length (128) copyin (a,b) copyout (c) { #pragma acc loop worker for (unsigned int i = 0; i < n; i++) #pragma acc loop vector for (unsigned int j = 0; j < n; j++) ; } return 0; } ... This generates these dimensions in the .s file: ... //:FUNC_MAP "main$_omp_fn$0", 0x1, 0x20, 0x80 ... so, num_workers * vector_length == 0x20 * 0x80 == 32 * 128 == 4096 (while the maximum allowed is 1024 == maximum CTA size). This causes a runtime error: ... libgomp: The Nvidia accelerator has insufficient resources to launch 'main$_omp_fn$0' with num_workers = 32 and vector_length = 128; recompile the program with 'num_workers = x and vector_length = y' on that offloaded region or '-fopenacc-dim=-:x:y' where x * y <= 1024. ... An easy way to detect this problem at compile time is by adding an assert here: ... diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index aac0aa8b27a..7e3efa1032e 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -700,6 +700,7 @@ oacc_validate_dims ? oacc_default_dims[ix] : oacc_min_dims[ix]); changed = true; } + gcc_assert (!targetm.goacc.validate_dims (fn, dims, level)); if (changed) { ... For the test-case, the compiler will enter the second call to targetm.goacc.validate_dims with dims {1, 32, 128}, which will be updated to {1, 32, 32}, which will cause targetm.goacc.validate_dims to return true, which will trigger the assert. AFAIU, this is a generic problem with the targetm.goacc.validate_dims hook on both trunk and og8, and not specific to nvptx.