Hi Cesar!
On Fri, 27 Jan 2017 07:45:52 -0800, Cesar Philippidis <[email protected]>
wrote:
> If you take a close look at lower_omp_target, you'll notice that I
> gave reference types special treatment. Specifically, I disabled this
> optimization on non-INTEGER_TYPE and floating point values, because the
> nvptx target was having some problems dereferencing boolean-typed
> pointers. That's something I have on my TODO list to track down later.
Please file an issue as appropriate.
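For readers following along, here is a minimal sketch of the idea as I
understand it: a scalar no wider than a pointer travels in the
pointer-sized slot itself, so no separate host-to-device copy of the
variable is needed. This is plain C for illustration only, not the
libgomp or omp-low.c implementation; pack_scalar, unpack_scalar and
roundtrip_i32 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a small scalar into a pointer-sized slot.
   The caller must guarantee that the scalar fits.  */
static uintptr_t
pack_scalar (const void *src, size_t size)
{
  uintptr_t slot = 0;
  assert (size <= sizeof slot);
  memcpy (&slot, src, size);
  return slot;
}

/* Hypothetical helper: recover the scalar on the receiving side.
   It reads the same bytes that pack_scalar wrote, so the round trip
   does not depend on endianness.  */
static void
unpack_scalar (uintptr_t slot, void *dst, size_t size)
{
  assert (size <= sizeof slot);
  memcpy (dst, &slot, size);
}

/* Round-trip a 32-bit integer through the slot.  */
static int32_t
roundtrip_i32 (int32_t value)
{
  int32_t out;
  unpack_scalar (pack_scalar (&value, sizeof value), &out, sizeof out);
  return out;
}
```

Calling roundtrip_i32 with any 32-bit value should return that value
unchanged. Types wider than the slot cannot take this path and would
still need a regular mapping.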
> As for the performance gains, this optimization resulted in a
> non-trivial speedup in CloverLeaf running on a Nvidia Pascal board.
> CloverLeaf is somewhat special in that it consists of a lot of OpenACC
> offloaded regions which get called multiple times throughout its
> execution. Consequently, it is I/O limited. The other benchmarks I ran
> didn't benefit nearly as much as CloverLeaf. I chose a small data set
> for CloverLeaf that ran in only 1.3s without the patch, which makes
> it even more I/O limited. After the patch, it ran 0.35s faster.
\o/ Yay!
> This patch has been applied to gomp-4_0-branch.
(Not reviewed in detail.)
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> +static tree
> +convert_from_firstprivate_pointer (tree var, bool is_ref, gimple_seq *gs)
> +{
> + tree type = TREE_TYPE (var);
> + tree new_type = NULL_TREE;
> + tree tmp = NULL_TREE;
> + tree inner_type = NULL_TREE;
[...]/source-gcc/gcc/omp-low.c: In function 'tree_node* convert_from_firstprivate_pointer(tree, bool, gimple**)':
[...]/source-gcc/gcc/omp-low.c:16515:8: warning: unused variable 'inner_type' [-Wunused-variable]
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90
I see:
{+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O (internal compiler error)+}
{+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O 4 blank line(s) in output+}
{+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O (test for excess errors)+}
{+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O compilation failed to produce executable+}
That's the nvptx offloading compiler configured with
"--enable-checking=yes,df,fold,rtl":
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:
In function 'MAIN__._omp_fn.1':
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
error: conversion of register to a different size
VIEW_CONVERT_EXPR<logical(kind=2)>(_17);
_18 = VIEW_CONVERT_EXPR<logical(kind=2)>(_17);
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
error: conversion of register to a different size
VIEW_CONVERT_EXPR<logical(kind=4)>(_20);
_21 = VIEW_CONVERT_EXPR<logical(kind=4)>(_20);
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
error: conversion of register to a different size
VIEW_CONVERT_EXPR<logical(kind=8)>(_23);
_24 = VIEW_CONVERT_EXPR<logical(kind=8)>(_23);
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
error: conversion of register to a different size
VIEW_CONVERT_EXPR<logical(kind=16)>(_26);
_27 = VIEW_CONVERT_EXPR<logical(kind=16)>(_26);
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
internal compiler error: verify_gimple failed
0xa67d75 verify_gimple_in_cfg(function*, bool)
[...]/source-gcc/gcc/tree-cfg.c:5125
0x94ebbc execute_function_todo
[...]/source-gcc/gcc/passes.c:1958
0x94f513 execute_todo
[...]/source-gcc/gcc/passes.c:2010
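My reading of the verifier's complaint: a VIEW_CONVERT_EXPR reinterprets
bits, so operand and result have to be the same size, and a change of
size needs an ordinary value conversion instead. The dump excerpt above
doesn't show the types of _17 etc., so I'm assuming they are wider than
the logical kinds they are viewed as. A rough analogy in plain C (not
GCC internals; view_as_float and narrow_to_i16 are made-up names, and
the first function assumes IEEE 754 single precision for float):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same-size reinterpretation of bits, loosely analogous to a valid
   VIEW_CONVERT_EXPR.  The static assertion plays the role of the
   verifier's size check.  */
static float
view_as_float (uint32_t bits)
{
  float f;
  _Static_assert (sizeof f == sizeof bits,
                  "view conversion needs equal sizes");
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Going from a pointer-sized slot to a 16-bit value changes the size,
   so it has to be a value conversion rather than a bit view.  The
   slot is assumed to hold a value that fits in int16_t.  */
static int16_t
narrow_to_i16 (uintptr_t slot)
{
  assert (slot <= (uintptr_t) INT16_MAX);
  return (int16_t) slot;
}
```

With those assumptions, view_as_float (0x3f000000) yields 0.5, and
narrow_to_i16 (2) yields 2.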
And with "-m32" multilib testing, I see:
{+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O (test for excess errors)+}
{+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O compilation failed to produce executable+}
That is:
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:10:18:
Error: Kind 16 not supported for type INTEGER at (1)
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:16:18:
Error: Kind 16 not supported for type LOGICAL at (1)
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:115:18:
Error: Kind 16 not supported for type INTEGER at (1)
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:121:18:
Error: Kind 16 not supported for type LOGICAL at (1)
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:31:6:
Error: Symbol 'i16i' at (1) has no IMPLICIT type
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:49:40:
Error: Symbol 'i16o' at (1) has no IMPLICIT type
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:37:6:
Error: Symbol 'l16i' at (1) has no IMPLICIT type
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:51:40:
Error: Symbol 'l16o' at (1) has no IMPLICIT type
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:105:43:
Error: Symbol 'i16i' at (1) has no IMPLICIT type
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:105:69:
Error: Symbol 'i16o' at (1) has no IMPLICIT type
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:106:43:
Error: Symbol 'l16i' at (1) has no IMPLICIT type
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:106:69:
Error: Symbol 'l16o' at (1) has no IMPLICIT type
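As far as I know, the kind=16 integer and logical types are only
available where the target has a 128-bit integer type, which is not
the case for typical "-m32" configurations, so the test can't use them
unconditionally. The same target dependence is visible from C via GCC's
predefined __SIZEOF_INT128__ macro; a small sketch (the function names
are made up, and the 8-byte fallback assumes a 64-bit long long):

```c
#include <assert.h>

/* Report whether the compiler provides a 128-bit integer type for
   this target.  GCC predefines __SIZEOF_INT128__ only in that case.  */
static int
have_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}

/* Size in bytes of the widest integer type this sketch knows about.  */
static int
widest_integer_bytes (void)
{
#ifdef __SIZEOF_INT128__
  int bytes = (int) sizeof (__int128);
#else
  int bytes = (int) sizeof (long long);
#endif
  assert (bytes >= 8);
  return bytes;
}
```

A test that wants to cover the 16-byte kinds would need a comparable
guard, or would have to be split so that the kind=16 part only runs on
targets that support it.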
Grüße
Thomas
> @@ -0,0 +1,203 @@
> +! Verify the GOMP_MAP_FIRSTPRIVATE_INT optimization on various types.
> +
> +program test
> + implicit none
> +
> + integer (kind=1) :: i1i, i1o
> + integer (kind=2) :: i2i, i2o
> + integer (kind=4) :: i4i, i4o
> + integer (kind=8) :: i8i, i8o
> + integer (kind=16) :: i16i, i16o
> +
> + logical (kind=1) :: l1i, l1o
> + logical (kind=2) :: l2i, l2o
> + logical (kind=4) :: l4i, l4o
> + logical (kind=8) :: l8i, l8o
> + logical (kind=16) :: l16i, l16o
> +
> + real (kind=4) :: r4i, r4o
> + real (kind=8) :: r8i, r8o
> +
> + complex (kind=4) :: c4i, c4o
> + complex (kind=8) :: c8i, c8o
> +
> + character (kind=1) :: ch1i, ch1o
> + character (kind=4) :: ch4i, ch4o
> +
> + i1i = 1
> + i2i = 2
> + i4i = 3
> + i8i = 4
> + i16i = 5
> +
> + l1i = .true.
> + l2i = .false.
> + l4i = .true.
> + l8i = .true.
> + l16i = .false.
> +
> + r4i = .5
> + r8i = .25
> +
> + c4i = (2, -2)
> + c8i = (4, -4)
> +
> + ch1i = "a"
> + ch4i = "b"
> +
> + !$acc parallel firstprivate(i1i, i2i, i4i, i8i, i16i) &
> + !$acc copyout(i1o, i2o, i4o, i8o, i16o) &
> + !$acc firstprivate(l1i, l2i, l4i, l8i, l16i) &
> + !$acc copyout(l1o, l2o, l4o, l8o, l16o) &
> + !$acc firstprivate(r4i, r8i) copyout(r4o, r8o) &
> + !$acc firstprivate(c4i, c8i) copyout(c4o, c8o) &
> + !$acc firstprivate(ch1i, ch4i) &
> + !$acc copyout(ch1o, ch4o)
> + i1o = i1i
> + i2o = i2i
> + i4o = i4i
> + i8o = i8i
> + i16o = i16i
> +
> + l1o = l1i
> + l2o = l2i
> + l4o = l4i
> + l8o = l8i
> + l16o = l16i
> +
> + r4o = r4i
> + r8o = r8i
> +
> + c4o = c4i
> + c8o = c8i
> +
> + ch1o = ch1i
> + ch4o = ch4i
> + !$acc end parallel
> +
> + if (i1i /= i1o) call abort
> + if (i2i /= i2o) call abort
> + if (i4i /= i4o) call abort
> + if (i8i /= i8o) call abort
> + if (i16i /= i16o) call abort
> +
> + if (l1i .neqv. l1o) call abort
> + if (l2i .neqv. l2o) call abort
> + if (l4i .neqv. l4o) call abort
> + if (l8i .neqv. l8o) call abort
> + if (l16i .neqv. l16o) call abort
> +
> + if (r4i /= r4o) call abort
> + if (r8i /= r8o) call abort
> +
> + if (c4i /= c4o) call abort
> + if (c8i /= c8o) call abort
> +
> + if (ch1i /= ch1o) call abort
> + if (ch4i /= ch4o) call abort
> +
> + call subtest(i1i, i2i, i4i, i8i, i16i, i1o, i2o, i4o, i8o, i16o, &
> + l1i, l2i, l4i, l8i, l16i, l1o, l2o, l4o, l8o, l16o, &
> + r4i, r8i, r4o, r8o, c4i, c8i, c4o, c8o, &
> + ch1i, ch4i, ch1o, ch4o)
> +end program test
> +
> +subroutine subtest(i1i, i2i, i4i, i8i, i16i, i1o, i2o, i4o, i8o, i16o, &
> + l1i, l2i, l4i, l8i, l16i, l1o, l2o, l4o, l8o, l16o, &
> + r4i, r8i, r4o, r8o, c4i, c8i, c4o, c8o, &
> + ch1i, ch4i, ch1o, ch4o)
> + implicit none
> +
> + integer (kind=1) :: i1i, i1o
> + integer (kind=2) :: i2i, i2o
> + integer (kind=4) :: i4i, i4o
> + integer (kind=8) :: i8i, i8o
> + integer (kind=16) :: i16i, i16o
> +
> + logical (kind=1) :: l1i, l1o
> + logical (kind=2) :: l2i, l2o
> + logical (kind=4) :: l4i, l4o
> + logical (kind=8) :: l8i, l8o
> + logical (kind=16) :: l16i, l16o
> +
> + real (kind=4) :: r4i, r4o
> + real (kind=8) :: r8i, r8o
> +
> + complex (kind=4) :: c4i, c4o
> + complex (kind=8) :: c8i, c8o
> +
> + character (kind=1) :: ch1i, ch1o
> + character (kind=4) :: ch4i, ch4o
> +
> + i1i = -i1i
> + i2i = -i2i
> + i4i = -i4i
> + i8i = -i8i
> + i16i = -i16i
> +
> + l1i = .not. l1i
> + l2i = .not. l2i
> + l4i = .not. l4i
> + l8i = .not. l8i
> + l16i = .not. l16i
> +
> + r4i = -r4i
> + r8i = -r8i
> +
> + c4i = -c4i
> + c8i = -c8i
> +
> + ch1i = "z"
> + ch4i = "y"
> +
> + !$acc parallel firstprivate(i1i, i2i, i4i, i8i, i16i) &
> + !$acc copyout(i1o, i2o, i4o, i8o, i16o) &
> + !$acc firstprivate(l1i, l2i, l4i, l8i, l16i) &
> + !$acc copyout(l1o, l2o, l4o, l8o, l16o) &
> + !$acc firstprivate(r4i, r8i) copyout(r4o, r8o) &
> + !$acc firstprivate(c4i, c8i) copyout(c4o, c8o) &
> + !$acc firstprivate(ch1i, ch4i) &
> + !$acc copyout(ch1o, ch4o)
> + i1o = i1i
> + i2o = i2i
> + i4o = i4i
> + i8o = i8i
> + i16o = i16i
> +
> + l1o = l1i
> + l2o = l2i
> + l4o = l4i
> + l8o = l8i
> + l16o = l16i
> +
> + r4o = r4i
> + r8o = r8i
> +
> + c4o = c4i
> + c8o = c8i
> +
> + ch1o = ch1i
> + ch4o = ch4i
> + !$acc end parallel
> +
> + if (i1i /= i1o) call abort
> + if (i2i /= i2o) call abort
> + if (i4i /= i4o) call abort
> + if (i8i /= i8o) call abort
> + if (i16i /= i16o) call abort
> +
> + if (l1i .neqv. l1o) call abort
> + if (l2i .neqv. l2o) call abort
> + if (l4i .neqv. l4o) call abort
> + if (l8i .neqv. l8o) call abort
> + if (l16i .neqv. l16o) call abort
> +
> + if (r4i /= r4o) call abort
> + if (r8i /= r8o) call abort
> +
> + if (c4i /= c4o) call abort
> + if (c8i /= c8o) call abort
> +
> + if (ch1i /= ch1o) call abort
> + if (ch4i /= ch4o) call abort
> +end subroutine subtest