https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88706
Bug ID: 88706
Summary: [og8, nvptx, openacc] Inconsistencies when vector length set using vector_length clause or fopenacc-dim
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

Consider libgomp testcase vred2d-128.c (posted partially here):
...
gentest (test1, "acc parallel loop gang vector_length (128)",
	 "acc loop vector reduction(+:t1) reduction(-:t2)")

gentest (test2, "acc parallel loop gang vector_length (128)",
	 "acc loop worker vector reduction(+:t1) reduction(-:t2)")

gentest (test3, "acc parallel loop gang worker vector_length (128)",
	 "acc loop vector reduction(+:t1) reduction(-:t2)")

gentest (test4, "acc parallel loop",
	 "acc loop reduction(+:t1) reduction(-:t2)")
...

The resulting front-end attributes are:
...
$ grep -A1 __attribute__ vred2d-128.c.088t.fixup_cfg4
__attribute__((oacc function (, , 128), omp target entrypoint))
test1._omp_fn.0 (long int * t2, long int * t1, int[10000] * a2, int[10000] * a1)
--
__attribute__((oacc function (, , 128), omp target entrypoint))
test2._omp_fn.1 (long int * t2, long int * t1, int[10000] * a2, int[10000] * a1)
--
__attribute__((oacc function (, , 128), omp target entrypoint))
test3._omp_fn.2 (long int * t2, long int * t1, int[10000] * a2, int[10000] * a1)
--
__attribute__((oacc function (, , ), omp target entrypoint))
test4._omp_fn.3 (long int * t2, long int * t1, int[10000] * a2, int[10000] * a1)
...

When we compile at -O2 and grep for the resulting dimensions, we have:
...
$ grep FUNC_MAP vred2d-128.s
//:FUNC_MAP "test1$_omp_fn$0", 0, 0x1, 0x80
//:FUNC_MAP "test2$_omp_fn$1", 0, 0x1, 0x80
//:FUNC_MAP "test3$_omp_fn$2", 0, 0, 0x20
//:FUNC_MAP "test4$_omp_fn$3", 0, 0, 0x20
...

Note that the vector length for test3 has been downgraded by the -mno-long-vector-in-workers workaround.
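As a reading aid for the FUNC_MAP lines: I am assuming the three numbers after the function name are the gang, worker and vector dimensions, in that order, so 0x80 is a vector length of 128 and 0x20 is 32. The helper below is only a sketch for decoding such a line; parse_func_map and the DIM_* names are hypothetical and not part of GCC.

```c
#include <stdio.h>
#include <string.h>

/* Assumed order of the three numbers on a FUNC_MAP line.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Parse one "//:FUNC_MAP" line as it appears in the grep output.
   On success, copy the function name into NAME (at most NAME_SIZE - 1
   characters), store the three dimensions in DIMS and return 1.
   Return 0 if the line does not have the expected shape.  */
static int
parse_func_map (const char *line, char *name, size_t name_size,
		unsigned dims[DIM_MAX])
{
  static const char prefix[] = "//:FUNC_MAP \"";
  const char *start, *end;
  int g, w, v;

  if (strncmp (line, prefix, sizeof prefix - 1) != 0)
    return 0;

  start = line + sizeof prefix - 1;
  end = strchr (start, '"');
  if (end == NULL || (size_t) (end - start) >= name_size)
    return 0;

  memcpy (name, start, end - start);
  name[end - start] = '\0';

  /* %i accepts both decimal ("0") and hexadecimal ("0x80") input.  */
  if (sscanf (end + 1, " , %i , %i , %i", &g, &w, &v) != 3)
    return 0;

  dims[DIM_GANG] = (unsigned) g;
  dims[DIM_WORKER] = (unsigned) w;
  dims[DIM_VECTOR] = (unsigned) v;
  return 1;
}
```

Fed the test3 line from the output above, this yields a vector dimension of 32, the downgraded value, while the test1 and test2 lines yield 128.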

Now if we remove the hardcoded vector_length (128) from test1, test2 and test3, and we add -fopenacc-dim=::128, we have instead:
...
//:FUNC_MAP "test1$_omp_fn$0", 0, 0x1, 0x80
//:FUNC_MAP "test2$_omp_fn$1", 0, 0, 0x80
//:FUNC_MAP "test3$_omp_fn$2", 0, 0, 0x80
//:FUNC_MAP "test4$_omp_fn$3", 0, 0, 0x80
...

The change on test4 is expected. But the change on test3 is unexpected. It should not matter whether we set the vector length on the parallel directive or using -fopenacc-dim: the effect of -mno-long-vector-in-workers should be the same.

The cause for this can be seen by adding this print statement:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 110dbffe0d0..5aab6db169f 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5688,6 +5688,7 @@ nvptx_adjust_parallelism (unsigned inner_mask, unsigned outer_mask)
   offload_attrs oa;
 
   populate_offload_attrs (&oa);
+  fprintf (stderr, "oa.vector_length in nvptx_adjust_parallelism: %d\n", oa.vector_length);
 
   if (oa.vector_length == PTX_WARP_SIZE)
     return inner_mask;
...

In the first case (vector_length set on parallel directive, no -fopenacc-dim=), we have:
...
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
...

But in the second case (no vector_length set on parallel directive, using -fopenacc-dim=), we have:
...
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
...

I think the same problem exists for the other workaround in nvptx_adjust_parallelism, this one:
...
  /* FIXME: This is overly conservative; worker and vector loop will
     eventually be combined.  */
  if (wv)
    return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
...

It's just harder to spot because the workaround doesn't affect vector length.
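
To spell out the mismatch, here is a simplified model of what the two runs suggest. It is not GCC code: struct launch_model and the three helpers are hypothetical names, and the rule "attribute value if set, else warp size" is my assumption based on the print output above. The only part taken from nvptx_adjust_parallelism is the early return on PTX_WARP_SIZE quoted in the diff.

```c
#define PTX_WARP_SIZE 32

/* Hypothetical model of one offloaded function; not a GCC data structure.  */
struct launch_model
{
  int attr_vector_length;   /* From "oacc function (, , N)"; 0 if unset.  */
  int option_vector_length; /* From -fopenacc-dim=::N; 0 if unset.  */
};

/* Assumption: the value the print statement reports is the attribute
   value if there is one, and the warp size otherwise.  */
static int
length_seen_by_workaround (const struct launch_model *m)
{
  return m->attr_vector_length ? m->attr_vector_length : PTX_WARP_SIZE;
}

/* The vector length the user asked for: the clause if present,
   else the command-line value, else the warp size.  */
static int
length_requested (const struct launch_model *m)
{
  if (m->attr_vector_length)
    return m->attr_vector_length;
  if (m->option_vector_length)
    return m->option_vector_length;
  return PTX_WARP_SIZE;
}

/* Models only the early return quoted in the diff: when the observed
   length equals the warp size, the rest of the function is skipped.  */
static int
workaround_bypassed (const struct launch_model *m)
{
  return length_seen_by_workaround (m) == PTX_WARP_SIZE;
}
```

In this model, test3 with vector_length (128) on the directive is seen as 128 and the workaround is reached. With -fopenacc-dim=::128 instead, the requested length is still 128, but the function sees 32 and returns early, which would match the 0x80 that survives in the second FUNC_MAP output.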