Hi Jiufu, On Fri, Oct 25, 2019 at 10:44:39PM +0800, Jiufu Guo wrote: > In PR88760, there are a few disscussion about improve or tune unroller for > targets. And we would agree to enable unroller for small loops at O2 first.
[ snip ] > PR tree-optimization/88760 > * config/rs6000/rs6000-common.c (rs6000_option_optimization_table): for > O2, enable funroll-loops. * config/rs6000/rs6000-common.c (rs6000_option_optimization_table): Enable -funroll-loops for -O2 and above. > * config/rs6000/rs6000.c (rs6000_option_override_internal): if unroller > is enabled throught O2, set constrains to PARAM_MAX_UNROLL_TIMES=2 and > PARAM_MAX_UNROLLED_INSNS=20 for small loops. * config/rs6000/rs6000.c (rs6000_option_override_internal): Set PARAM_MAX_UNROLL_TIMES to 2 and PARAM_MAX_UNROLLED_INSNS to 20 if the unroller is not explicitly enabled. > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -4540,6 +4540,26 @@ rs6000_option_override_internal (bool global_init_p) > global_options.x_param_values, > global_options_set.x_param_values); > > + /* unroll very small loops 2 time if no -funroll-loops. */ > + if (!global_options_set.x_flag_unroll_loops > + && !global_options_set.x_flag_unroll_all_loops) > + { > + maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, > + global_options.x_param_values, > + global_options_set.x_param_values); > + > + maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, > + global_options.x_param_values, > + global_options_set.x_param_values); > + > + /* If fweb or frename-registers are not specificed in command-line, (specified) > + do not turn them on implicitly. */ > + if (!global_options_set.x_flag_web) > + global_options.x_flag_web = 0; > + if (!global_options_set.x_flag_rename_registers) > + global_options.x_flag_rename_registers = 0; > + } This web and rnreg thing needs to be in the changelog, too. > diff --git a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c > b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c > index c9b8046..17aa5c6 100644 > --- a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c > +++ b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c > @@ -1,4 +1,5 @@ > /* { dg-shouldfail "tsan" } */ > +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc-*-* > powerpc64-*-* powerpc64le-*-* } } } */ You can write that as just { target { powerpc*-*-* } } Could you explain why this is needed here? In the test file itself, preferably. > --- a/gcc/testsuite/gcc.dg/pr59643.c > +++ b/gcc/testsuite/gcc.dg/pr59643.c > @@ -1,6 +1,7 @@ > /* PR tree-optimization/59643 */ > /* { dg-do compile } */ > /* { dg-options "-O3 -fdump-tree-pcom-details" } */ > +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc-*-* > powerpc64-*-* powerpc64le-*-* } } } */ Same here. How does this patch change behaviour here, anyway? The test uses -O3? > --- a/gcc/testsuite/gcc.target/powerpc/loop_align.c > +++ b/gcc/testsuite/gcc.target/powerpc/loop_align.c > @@ -1,6 +1,6 @@ > /* { dg-do compile { target { powerpc*-*-* } } } */ > /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */ > -/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16" } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16 > -fno-unroll-loops" } */ > /* { dg-final { scan-assembler ".p2align 5" } } */ For this one it is pretty much obvious. > --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c > +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c > @@ -1,7 +1,7 @@ > /* { dg-do compile { target { powerpc*-*-* } } } */ > /* { dg-skip-if "" { powerpc*-*-darwin* } } */ > /* { dg-require-effective-target powerpc_vsx_ok } */ > -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math" } */ > +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math > -fno-unroll-loops" } */ And here, and all other powerpc-specific tests (they count the generated machine instructions). > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-loop2_unroll" } */ > + > +void __attribute__ ((noinline)) foo(int n, int *arr) > +{ > + int i; > + for (i = 0; i < n; i++) > + arr[i] = arr[i] - 10; > +} > +/* { dg-final { scan-rtl-dump-times "Unrolled loop 1 times" 1 "loop2_unroll" > } } */ > +/* { dg-final { scan-assembler-times "lwz" 3 } } */ > +/* { dg-final { scan-assembler-times "stw" 3 } } */ /* { dg-final { scan-assembler-times {\mlwz\M} 3 } } */ /* { dg-final { scan-assembler-times {\mstw\M} 3 } } */ so that it doesn't count when lwz or stw is only part of the word. This doesn't so easily matter for scan-assembler-times, but still :-) There also is some fallout in guality, but that's just showing the state of guality. So please fix those trivialities and explain why those two testcases need a change, and it should be good to go. Nice patch Jiufu! Segher