[Bug target/88706] New: [og8, nvptx, openacc] Inconsistencies when vector length set using vector_length clause or fopenacc-dim

vries at gcc dot gnu.org Sat, 05 Jan 2019 03:17:49 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88706


            Bug ID: 88706
           Summary: [og8, nvptx, openacc] Inconsistencies when vector
                    length set using vector_length clause or fopenacc-dim
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider libgomp testcase vred2d-128.c (posted partially here):
...
gentest (test1, "acc parallel loop gang vector_length (128)",
         "acc loop vector reduction(+:t1) reduction(-:t2)")

gentest (test2, "acc parallel loop gang vector_length (128)",
         "acc loop worker vector reduction(+:t1) reduction(-:t2)")

gentest (test3, "acc parallel loop gang worker vector_length (128)",
         "acc loop vector reduction(+:t1) reduction(-:t2)")

gentest (test4, "acc parallel loop",
         "acc loop reduction(+:t1) reduction(-:t2)")
...

The resulting front-end attributes are:
...
$ grep -A1 __attribute__ vred2d-128.c.088t.fixup_cfg4
__attribute__((oacc function (, , 128), omp target entrypoint))
test1._omp_fn.0 (long int * t2, long int * t1, int[10000] * a2, int[10000] * a1)
--
__attribute__((oacc function (, , 128), omp target entrypoint))
test2._omp_fn.1 (long int * t2, long int * t1, int[10000] * a2, int[10000] * a1)
--
__attribute__((oacc function (, , 128), omp target entrypoint))
test3._omp_fn.2 (long int * t2, long int * t1, int[10000] * a2, int[10000] * a1)
--
__attribute__((oacc function (, , ), omp target entrypoint))
test4._omp_fn.3 (long int * t2, long int * t1, int[10000] * a2, int[10000] * a1)
...

When we compile at -O2 and grep for the resulting dimensions, we have:
...
$ grep FUNC_MAP vred2d-128.s
//:FUNC_MAP "test1$_omp_fn$0", 0, 0x1, 0x80
//:FUNC_MAP "test2$_omp_fn$1", 0, 0x1, 0x80
//:FUNC_MAP "test3$_omp_fn$2", 0, 0, 0x20
//:FUNC_MAP "test4$_omp_fn$3", 0, 0, 0x20
...

Note that the vector length for test3 has been downgraded by the
-mno-long-vector-in-workers workaround.
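
To read the FUNC_MAP lines above: assuming the three numbers after the
function name are the gang, worker and vector dimensions in that order, 0x80
is a vector length of 128 and 0x20 is 32. A minimal sketch of a decoder
follows; parse_func_map is a hypothetical helper for illustration, not part
of GCC:

```c
#include <stdio.h>

/* Hypothetical helper, not part of GCC: decode one "//:FUNC_MAP" line.
   Assumes the three fields after the name are the gang, worker and
   vector dimensions, in that order.  Returns 1 on success, 0 otherwise.  */
static int
parse_func_map (const char *line, char name[64], int dims[3])
{
  /* %i accepts both a plain "0" and a hexadecimal "0x80".  */
  return sscanf (line, "//:FUNC_MAP \"%63[^\"]\", %i, %i, %i",
                 name, &dims[0], &dims[1], &dims[2]) == 4;
}
```

Applied to the test3 line, this gives dims = {0, 0, 32}; applied to the test1
line, it gives dims = {0, 1, 128}.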

Now if we remove the hardcoded vector_length (128) from test1, test2 and test3,
and add -fopenacc-dim=::128, we have instead:
...
//:FUNC_MAP "test1$_omp_fn$0", 0, 0x1, 0x80
//:FUNC_MAP "test2$_omp_fn$1", 0, 0, 0x80
//:FUNC_MAP "test3$_omp_fn$2", 0, 0, 0x80
//:FUNC_MAP "test4$_omp_fn$3", 0, 0, 0x80
...

The change on test4 is expected.

But the change on test3 is unexpected. It should not matter whether we set the
vector length on the parallel directive or using -fopenacc-dim: the effect of
-mno-long-vector-in-workers should be the same.

The cause for this can be seen by adding this print statement:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 110dbffe0d0..5aab6db169f 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5688,6 +5688,7 @@ nvptx_adjust_parallelism (unsigned inner_mask, unsigned outer_mask)
   offload_attrs oa;

   populate_offload_attrs (&oa);
+  fprintf (stderr, "oa.vector_length in nvptx_adjust_parallelism: %d\n", oa.vector_length);

   if (oa.vector_length == PTX_WARP_SIZE)
     return inner_mask;
...

In the first case (vector_length set on parallel directive, no
-fopenacc-dim=), we have:
...
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
...

But in the second case (no vector_length set on parallel directive, using
-fopenacc-dim=), we have:
...
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
...

I think the same problem exists for the other workaround in
nvptx_adjust_parallelism, this one:
...
  /* FIXME: This is overly conservative; worker and vector loop will
     eventually be combined.  */
  if (wv)
    return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
...
It's just harder to spot because the workaround doesn't affect vector length.
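
To illustrate why it is harder to spot, here is a toy model of the two early
exits quoted above. The constants restate GOMP_DIM_* and GOMP_DIM_MASK as I
understand them from gomp-constants.h. The function toy_adjust_parallelism
and the way wv is computed are assumptions for illustration, not the actual
nvptx.c code:

```c
/* Restated here so the sketch is self-contained; values assumed to match
   GCC's gomp-constants.h.  */
#define GOMP_DIM_GANG   0
#define GOMP_DIM_WORKER 1
#define GOMP_DIM_VECTOR 2
#define GOMP_DIM_MASK(X) (1u << (X))
#define PTX_WARP_SIZE 32

/* Toy model, not the real nvptx_adjust_parallelism: VECTOR_LENGTH stands
   in for oa.vector_length, and wv is assumed to mean "worker and vector
   partitioning are both requested in INNER_MASK".  */
static unsigned
toy_adjust_parallelism (unsigned inner_mask, int vector_length)
{
  int wv = (inner_mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
           && (inner_mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR));

  /* Early exit: with a warp-sized vector nothing is adjusted.  */
  if (vector_length == PTX_WARP_SIZE)
    return inner_mask;

  /* Workaround: drop worker partitioning, keep every other bit.  */
  if (wv)
    return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);

  return inner_mask;
}
```

With inner_mask = 0x6 (worker and vector), a vector_length of 128 yields 0x4
(vector only), whereas 32 returns 0x6 unchanged. So if the function sees 32
where 128 is actually in effect, the worker bit survives, and nothing in the
reported vector length reveals it.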
