Hi Jiufu,

On Fri, Oct 25, 2019 at 10:44:39PM +0800, Jiufu Guo wrote:
> In PR88760, there are a few disscussion about improve or tune unroller for
> targets. And we would agree to enable unroller for small loops at O2 first.

[ snip ]

>       PR tree-optimization/88760
>       * config/rs6000/rs6000-common.c (rs6000_option_optimization_table): for
>       O2, enable funroll-loops.

        * config/rs6000/rs6000-common.c (rs6000_option_optimization_table):
        Enable -funroll-loops for -O2 and above.

>       * config/rs6000/rs6000.c (rs6000_option_override_internal): if unroller
>       is enabled throught O2, set constrains to PARAM_MAX_UNROLL_TIMES=2 and
>       PARAM_MAX_UNROLLED_INSNS=20 for small loops.

        * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
        PARAM_MAX_UNROLL_TIMES to 2 and PARAM_MAX_UNROLLED_INSNS to 20 if the
        unroller is not explicitly enabled.

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4540,6 +4540,26 @@ rs6000_option_override_internal (bool global_init_p)
>                            global_options.x_param_values,
>                            global_options_set.x_param_values);
>  
> +      /* unroll very small loops 2 time if no -funroll-loops.  */
> +      if (!global_options_set.x_flag_unroll_loops
> +       && !global_options_set.x_flag_unroll_all_loops)
> +     {
> +       maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> +                              global_options.x_param_values,
> +                              global_options_set.x_param_values);
> +
> +       maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> +                              global_options.x_param_values,
> +                              global_options_set.x_param_values);
> +
> +       /* If fweb or frename-registers are not specificed in command-line,

(specified)

> +          do not turn them on implicitly.  */
> +       if (!global_options_set.x_flag_web)
> +         global_options.x_flag_web = 0;
> +       if (!global_options_set.x_flag_rename_registers)
> +         global_options.x_flag_rename_registers = 0;
> +     }

This web and rnreg thing needs to be in the changelog, too.

> diff --git a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c 
> b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> index c9b8046..17aa5c6 100644
> --- a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> +++ b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> @@ -1,4 +1,5 @@
>  /* { dg-shouldfail "tsan" } */
> +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc-*-* 
> powerpc64-*-* powerpc64le-*-* } } } */

You can write that as just  { target { powerpc*-*-* } }

Could you explain why this is needed here?  In the test file itself,
preferably.

> --- a/gcc/testsuite/gcc.dg/pr59643.c
> +++ b/gcc/testsuite/gcc.dg/pr59643.c
> @@ -1,6 +1,7 @@
>  /* PR tree-optimization/59643 */
>  /* { dg-do compile } */
>  /* { dg-options "-O3 -fdump-tree-pcom-details" } */
> +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc-*-* 
> powerpc64-*-* powerpc64le-*-* } } } */

Same here.  How does this patch change behaviour here, anyway?  The test
uses -O3?

> --- a/gcc/testsuite/gcc.target/powerpc/loop_align.c
> +++ b/gcc/testsuite/gcc.target/powerpc/loop_align.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
> -/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16" } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16 
> -fno-unroll-loops" } */
>  /* { dg-final { scan-assembler ".p2align 5" } } */

For this one it is pretty much obvious.

> --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
> -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math" } */
> +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math 
> -fno-unroll-loops" } */

And here, and all other powerpc-specific tests (they count the generated
machine instructions).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll" } */
> +
> +void __attribute__ ((noinline)) foo(int n, int *arr)
> +{
> +  int i;
> +  for (i = 0; i < n; i++)
> +    arr[i] = arr[i] - 10;
> +}
> +/* { dg-final { scan-rtl-dump-times "Unrolled loop 1 times" 1 "loop2_unroll" 
> } } */
> +/* { dg-final { scan-assembler-times "lwz" 3 } } */
> +/* { dg-final { scan-assembler-times "stw" 3 } } */

/* { dg-final { scan-assembler-times {\mlwz\M} 3 } } */
/* { dg-final { scan-assembler-times {\mstw\M} 3 } } */

so that it doesn't count when lwz or stw is only part of the word.  This
doesn't so easily matter for scan-assembler-times, but still :-)

There also is some fallout in guality, but that's just showing the state
of guality.


So please fix those trivialities and explain why those two testcases need
a change, and it should be good to go.  Nice patch Jiufu!


Segher

Reply via email to