Richard Biener <rguent...@suse.de> writes:

> On Mon, 28 Oct 2019, Jiufu Guo wrote:
>
>> Richard Biener <rguent...@suse.de> writes:
>> 
>> > On Fri, 25 Oct 2019, Jiufu Guo wrote:
>> >
>> >> Hi,
>> >> 
>> >> In PR88760, there are a few disscussion about improve or tune unroller for
>> >> targets. And we would agree to enable unroller for small loops at O2 
>> >> first.
>> >> And we could see performance improvement(~10%) for below code:
>> >> ```
>> >>   subroutine foo (i, i1, block)
>> >>     integer :: i, i1
>> >>     integer :: block(9, 9, 9)
>> >>     block(i:9,1,i1) = block(i:9,1,i1) - 10
>> >>   end subroutine foo
>> >> 
>> >> ```
>> >> This kind of code occurs a few times in exchange2 benchmark.
>> >> 
>> >> Similar C code:
>> >> ```
>> >>   for (i = 0; i < n; i++)
>> >>     arr[i] = arr[i] - 10;
>> >> ```
>> >> 
>> >> On powerpc64le, for O2 , enable -funroll-loops and limit
>> >> PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20, we can see >2%
>> >> overall improvement for SPEC2017.
>> >
>> > Note the behavior with LTO will be surprising (and generally when
>> > functions are compiled with different -O level via the optimize
>> > attribute).  It's a bad idea to base --param values on the value
>> > of a global optimization option that can be set per function
>> > (see PR92046).
>> 
>> Thanks Richard,
>> --param values are not save per function. If using different method to
>> compile and link (e.g. compiling with -O2 -flto, linking with -flto -O2
>> -funroll-loops --param xxx), compiling and linking would use different
>> --param value. 
>
> Yes, and more so if you have most units commpiled with -O2 but one
> with -O3 then you get implicit "global" -O3 at link-time and thus
> -O3 --param settings for all -O2 compiled functions.
>
>> >
>> > A better patch would have been to adjust via the target hooks for
>> > unroll adjust.
>> Yeap. And in unroll adjust target hook, we can do fine tunning with more
>> heuristic thougths. I'm going to refine this way.
>
> Thanks.  Note enabling unrolling at -O2 for the target like you did
> should work OK for LTO (even if some units explicitely disable it
> via -fno-unroll-loop).
>
> Richard.
Thanks Richard!
I just send out a new patch which is using loop unroll adjust hook.
The patch is still checking global option setting: x_flag_unroll_loops
to avoid changing the behavior of explicit -funroll-loops.

And then if there is no explicit -funroll-loops on some case,
conservative unrolling (only unroll small loops 2 times) will be
adopted. For example:

-flto -c t.c -O2 -funroll-loops 
-flto t.o

this linking may not be treated as explicit -funroll-loops. 
I thinking to keep this behavior untill removing global setting check.
How do you think?

Thanks in advance.

Jiufu Guo
BR.

>
>> Thanks again.
>> 
>> Jiufu Guo
>> BR.
>> 
>> >
>> > Thanks,
>> > Richard.
>> >
>> >> This patch is only for rs6000 in which we see visible performance 
>> >> improvement.
>> >> 
>> >> Bootstrapped on powerpc64le, and cases are updated. Is this ok for trunk?
>> >>
>> >> Jiufu Guo
>> >> BR
>> >> 
>> >> 
>> >> gcc/
>> >> 2019-10-25  Jiufu Guo  <guoji...@linux.ibm.com>       
>> >> 
>> >>   PR tree-optimization/88760
>> >>   * config/rs6000/rs6000-common.c (rs6000_option_optimization_table): for
>> >>   O2, enable funroll-loops.
>> >>   * config/rs6000/rs6000.c (rs6000_option_override_internal): if unroller
>> >>   is enabled throught O2, set constrains to PARAM_MAX_UNROLL_TIMES=2 and
>> >>   PARAM_MAX_UNROLLED_INSNS=20 for small loops.
>> >>   
>> >> gcc.testsuite/
>> >> 2019-10-25  Jiufu Guo  <guoji...@linux.ibm.com>
>> >> 
>> >>   PR tree-optimization/88760
>> >>   * gcc.target/powerpc/small-loop-unroll.c: New test.
>> >>   * c-c++-common/tsan/thread_leak2.c: Update test.
>> >>   * gcc.dg/pr59643.c: Update test.
>> >>   * gcc.target/powerpc/loop_align.c: Update test.
>> >>   * gcc.target/powerpc/ppc-fma-1.c: Update test.
>> >>   * gcc.target/powerpc/ppc-fma-2.c: Update test.
>> >>   * gcc.target/powerpc/ppc-fma-3.c: Update test.
>> >>   * gcc.target/powerpc/ppc-fma-4.c: Update test.
>> >>   * gcc.target/powerpc/pr78604.c: Update test.
>> >> 
>> >> ---
>> >>  gcc/common/config/rs6000/rs6000-common.c             |  1 +
>> >>  gcc/config/rs6000/rs6000.c                           | 20 
>> >> ++++++++++++++++++++
>> >>  gcc/testsuite/c-c++-common/tsan/thread_leak2.c       |  1 +
>> >>  gcc/testsuite/gcc.dg/pr59643.c                       |  1 +
>> >>  gcc/testsuite/gcc.target/powerpc/loop_align.c        |  2 +-
>> >>  gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c         |  2 +-
>> >>  gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c         |  2 +-
>> >>  gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c         |  2 +-
>> >>  gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c         |  2 +-
>> >>  gcc/testsuite/gcc.target/powerpc/pr78604.c           |  2 +-
>> >>  gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c | 13 +++++++++++++
>> >>  11 files changed, 42 insertions(+), 6 deletions(-)
>> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
>> >> 
>> >> diff --git a/gcc/common/config/rs6000/rs6000-common.c 
>> >> b/gcc/common/config/rs6000/rs6000-common.c
>> >> index 4b0c205..b947196 100644
>> >> --- a/gcc/common/config/rs6000/rs6000-common.c
>> >> +++ b/gcc/common/config/rs6000/rs6000-common.c
>> >> @@ -35,6 +35,7 @@ static const struct default_options 
>> >> rs6000_option_optimization_table[] =
>> >>      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>> >>      /* Enable -fsched-pressure for first pass instruction scheduling.  */
>> >>      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>> >> +    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
>> >>      { OPT_LEVELS_NONE, 0, NULL, 0 }
>> >>    };
>> >>  
>> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> >> index a129137d..9a8ff9a 100644
>> >> --- a/gcc/config/rs6000/rs6000.c
>> >> +++ b/gcc/config/rs6000/rs6000.c
>> >> @@ -4540,6 +4540,26 @@ rs6000_option_override_internal (bool 
>> >> global_init_p)
>> >>                        global_options.x_param_values,
>> >>                        global_options_set.x_param_values);
>> >>  
>> >> +      /* unroll very small loops 2 time if no -funroll-loops.  */
>> >> +      if (!global_options_set.x_flag_unroll_loops
>> >> +   && !global_options_set.x_flag_unroll_all_loops)
>> >> + {
>> >> +   maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
>> >> +                          global_options.x_param_values,
>> >> +                          global_options_set.x_param_values);
>> >> +
>> >> +   maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
>> >> +                          global_options.x_param_values,
>> >> +                          global_options_set.x_param_values);
>> >> +
>> >> +   /* If fweb or frename-registers are not specificed in command-line,
>> >> +      do not turn them on implicitly.  */
>> >> +   if (!global_options_set.x_flag_web)
>> >> +     global_options.x_flag_web = 0;
>> >> +   if (!global_options_set.x_flag_rename_registers)
>> >> +     global_options.x_flag_rename_registers = 0;
>> >> + }
>> >> +
>> >>        /* If using typedef char *va_list, signal that
>> >>    __builtin_va_start (&ap, 0) can be optimized to
>> >>    ap = __builtin_next_arg (0).  */
>> >> diff --git a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c 
>> >> b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
>> >> index c9b8046..17aa5c6 100644
>> >> --- a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
>> >> +++ b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
>> >> @@ -1,4 +1,5 @@
>> >>  /* { dg-shouldfail "tsan" } */
>> >> +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc-*-* 
>> >> powerpc64-*-* powerpc64le-*-* } } } */
>> >>  
>> >>  #include <pthread.h>
>> >>  #include <unistd.h>
>> >> diff --git a/gcc/testsuite/gcc.dg/pr59643.c 
>> >> b/gcc/testsuite/gcc.dg/pr59643.c
>> >> index de78d60..3aef439 100644
>> >> --- a/gcc/testsuite/gcc.dg/pr59643.c
>> >> +++ b/gcc/testsuite/gcc.dg/pr59643.c
>> >> @@ -1,6 +1,7 @@
>> >>  /* PR tree-optimization/59643 */
>> >>  /* { dg-do compile } */
>> >>  /* { dg-options "-O3 -fdump-tree-pcom-details" } */
>> >> +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc-*-* 
>> >> powerpc64-*-* powerpc64le-*-* } } } */
>> >>  
>> >>  void
>> >>  foo (double *a, double *b, double *c, double d, double e, int n)
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/loop_align.c 
>> >> b/gcc/testsuite/gcc.target/powerpc/loop_align.c
>> >> index ebe3782..ef67f77 100644
>> >> --- a/gcc/testsuite/gcc.target/powerpc/loop_align.c
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/loop_align.c
>> >> @@ -1,6 +1,6 @@
>> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
>> >>  /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
>> >> -/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16" } */
>> >> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16 
>> >> -fno-unroll-loops" } */
>> >>  /* { dg-final { scan-assembler ".p2align 5" } } */
>> >>  
>> >>  void f(double *a, double *b, double *c, unsigned long n) {
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c 
>> >> b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
>> >> index b4945e6..2a5b92c 100644
>> >> --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
>> >> @@ -1,7 +1,7 @@
>> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
>> >>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> >>  /* { dg-require-effective-target powerpc_vsx_ok } */
>> >> -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math" 
>> >> } */
>> >> +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math 
>> >> -fno-unroll-loops" } */
>> >>  /* { dg-final { scan-assembler-times "xvmadd" 4 } } */
>> >>  /* { dg-final { scan-assembler-times "xsmadd\|fmadd\ " 2 } } */
>> >>  /* { dg-final { scan-assembler-times "fmadds" 2 } } */
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c 
>> >> b/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
>> >> index 5ed630a..bf2c67f 100644
>> >> --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
>> >> @@ -1,7 +1,7 @@
>> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
>> >>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> >>  /* { dg-require-effective-target powerpc_vsx_ok } */
>> >> -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math 
>> >> -ffp-contract=off" } */
>> >> +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math 
>> >> -ffp-contract=off -fno-unroll-loops" } */
>> >>  /* { dg-final { scan-assembler-times "xvmadd" 2 } } */
>> >>  /* { dg-final { scan-assembler-times "xsmadd\|fmadd\ " 1 } } */
>> >>  /* { dg-final { scan-assembler-times "fmadds" 1 } } */
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c 
>> >> b/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
>> >> index ef252b3..8608116 100644
>> >> --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
>> >> @@ -2,7 +2,7 @@
>> >>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> >>  /* { dg-require-effective-target powerpc_altivec_ok } */
>> >>  /* { dg-require-effective-target powerpc_fprs } */
>> >> -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec 
>> >> -ffast-math" } */
>> >> +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec 
>> >> -ffast-math -fno-unroll-loops" } */
>> >>  /* { dg-final { scan-assembler-times "vmaddfp" 2 } } */
>> >>  /* { dg-final { scan-assembler-times "fmadd " 2 } } */
>> >>  /* { dg-final { scan-assembler-times "fmadds" 2 } } */
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c 
>> >> b/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
>> >> index c2eaf1a..291c2ee 100644
>> >> --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
>> >> @@ -2,7 +2,7 @@
>> >>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> >>  /* { dg-require-effective-target powerpc_altivec_ok } */
>> >>  /* { dg-require-effective-target powerpc_fprs } */
>> >> -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec 
>> >> -ffast-math -ffp-contract=off" } */
>> >> +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec 
>> >> -ffast-math -ffp-contract=off -fno-unroll-loops" } */
>> >>  /* { dg-final { scan-assembler-times "vmaddfp" 1 } } */
>> >>  /* { dg-final { scan-assembler-times "fmadd " 1 } } */
>> >>  /* { dg-final { scan-assembler-times "fmadds" 1 } } */
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr78604.c 
>> >> b/gcc/testsuite/gcc.target/powerpc/pr78604.c
>> >> index 76d8945..35bfdb3 100644
>> >> --- a/gcc/testsuite/gcc.target/powerpc/pr78604.c
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr78604.c
>> >> @@ -1,7 +1,7 @@
>> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
>> >>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> >>  /* { dg-require-effective-target powerpc_p8vector_ok } */
>> >> -/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize 
>> >> -fdump-tree-vect-details" } */
>> >> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize 
>> >> -fdump-tree-vect-details -fno-unroll-loops" } */
>> >>  
>> >>  #ifndef SIZE
>> >>  #define SIZE 1024
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c 
>> >> b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
>> >> new file mode 100644
>> >> index 0000000..888ea50b
>> >> --- /dev/null
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
>> >> @@ -0,0 +1,13 @@
>> >> +/* { dg-do compile } */
>> >> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll" } */
>> >> +
>> >> +void __attribute__ ((noinline)) foo(int n, int *arr)
>> >> +{
>> >> +  int i;
>> >> +  for (i = 0; i < n; i++)
>> >> +    arr[i] = arr[i] - 10;
>> >> +}
>> >> +/* { dg-final { scan-rtl-dump-times "Unrolled loop 1 times" 1 
>> >> "loop2_unroll" } } */
>> >> +/* { dg-final { scan-assembler-times "lwz" 3 } } */
>> >> +/* { dg-final { scan-assembler-times "stw" 3 } } */
>> >> +
>> >> 
>> 

Reply via email to