[Bug tree-optimization/87214] New: [9 Regression] SPEC CPU2017, CPU2006 520/620, 403 runfails after r263772 with march=skylake-avx512

2018-09-04 Thread alexander.nesterovskiy at intel dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- There are runfails for the following benchmarks since r263772: SPEC2017 520

[Bug tree-optimization/86702] [9 Regression] SPEC CPU2006 400.perlbench, CPU2017 500.perlbench_r ~3% performance drop after r262247

2018-08-02 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86702 --- Comment #4 from Alexander Nesterovskiy --- I've noticed performance regressions on different targets and with different compilation options, not only with highly optimized ones like "-march=skylake-avx512 -Ofast -flto -funroll-loops" but with "-O2" too

[Bug tree-optimization/86702] New: [8/9 Regression] SPEC CPU2006 400.perlbench, CPU2017 500.perlbench_r ~3% performance drop after r262247

2018-07-27 Thread alexander.nesterovskiy at intel dot com
Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- Created attachment 44453 --> https://gcc.gnu.org/bugzi

[Bug tree-optimization/86054] [8/9 Regression] SPEC CPU2006 416.gamess miscompare after r259592 with march=skylake-avx512

2018-06-05 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86054 --- Comment #2 from Alexander Nesterovskiy --- Thanks, "-fno-aggressive-loop-optimizations" helps.

[Bug tree-optimization/86054] New: [8.1/9 Regression] SPEC CPU2006 416.gamess miscompare after r259592 with march=skylake-avx512

2018-06-05 Thread alexander.nesterovskiy at intel dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- Created attachment 44237 --> https://gcc.gnu.org/bugzilla/attachment.cgi

[Bug middle-end/82344] [8 Regression] SPEC CPU2006 435.gromacs ~10% performance regression with trunk@250855

2018-03-27 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82344 --- Comment #7 from Alexander Nesterovskiy --- Yes, I've checked it - current performance is back at about the previous level, and execution of this piece of code takes the same amount of time.

[Bug tree-optimization/84419] [8 Regression] SPEC CPU2017/CPU2006 521/621, 527/627, 554/654, 445, 454, 481, 416 runfails after r256628 with march=skylake-avx512

2018-02-20 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84419 --- Comment #6 from Alexander Nesterovskiy --- All the mentioned SPEC CPU2017/CPU2006 521/621, 527/627, 554/654, 445, 454, 481, 416 have finished successfully. Patch was applied to r257732.

[Bug tree-optimization/84419] [8 Regression] SPEC CPU2017/CPU2006 521/621, 527/627, 554/654, 445, 454, 481, 416 runfails after r256628 with march=skylake-avx512

2018-02-19 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84419 --- Comment #5 from Alexander Nesterovskiy --- Yes, it looks like the problem is with unaligned access (the reproducer does not fail when the loop starts at i=0). It seems that your patch works - there are no runfails for the reproducer, 445, 521, 5

[Bug target/82862] [8 Regression] SPEC CPU2006 465.tonto performance regression with r253975 (up to 40% drop for particular loop)

2018-02-19 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82862 --- Comment #8 from Alexander Nesterovskiy --- I'd say that it's not just fixed but improved with an impressive gain. It is about +4% on HSW AVX2 and about +8% on SKX AVX512 after r257734 (compared to r257732) for a 465.tonto SPEC rate. Comparin

[Bug tree-optimization/84419] [8 Regression] SPEC CPU2017/CPU2006 521/621, 527/627, 554/654, 445, 454, 481, 416 runfails after r256628 with march=skylake-avx512

2018-02-16 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84419 --- Comment #2 from Alexander Nesterovskiy --- I've made a quite small reproducer: --- $ cat reproducer.c #include #include #define SIZE 400 int foo[SIZE]; char bar[SIZE]; void __attribute__ ((noinline)) foo_func(void) { int i; for (i =

[Bug tree-optimization/84419] New: [8 Regression] SPEC CPU2017/CPU2006 521/621, 527/627, 554/654, 445, 454, 481, 416 runfails after r256628 with march=skylake-avx512

2018-02-16 Thread alexander.nesterovskiy at intel dot com
Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- There are runfails for

[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

2018-02-08 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #27 from Alexander Nesterovskiy --- The place of interest here is a loop in the mat_times_vec function. For r253678, a mat_times_vec.constprop._loopfn.0 is created by autopar. For r256990, mat_times_vec is inlined into bi_cgstab_block and

[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

2018-02-08 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #26 from Alexander Nesterovskiy --- Created attachment 43361 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43361&action=edit r253678 vs r256990_work_spin

[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

2018-02-02 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #24 from Alexander Nesterovskiy --- Yes, it looks like more time is being spent on synchronization. r256990 really changes the way autopar works: For r253679...r256989 most of the work was done in the main thread0 (thread0 ~91%, threads1-

[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

2018-02-02 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #23 from Alexander Nesterovskiy --- Created attachment 43326 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43326&action=edit r253678 vs r256990

[Bug ipa/84149] New: [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-01-31 Thread alexander.nesterovskiy at intel dot com
Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Minimal options to reproduce regression (x86, 64-bit): -O3

[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

2018-01-30 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604 --- Comment #20 from Alexander Nesterovskiy --- I've made test runs on Broadwell and Skylake, RHEL 7.3. 410.bwaves became faster after r256990 but not as fast as it was on r253678. Comparing 410.bwaves performance, "-Ofast -funroll-loops -flto -

[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-21 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326 --- Comment #6 from Alexander Nesterovskiy --- Thanks! I see performance gain on 648.exchange2_s (~6% on Broadwell and ~3% on Skylake-X) reverting performance to r255266 level (Skylake-X regression was ~3%). And loops unrolled with 2 and 3 iterat

[Bug tree-optimization/83326] New: [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-08 Thread alexander.nesterovskiy at intel dot com
Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- Created attachment 42815 --> https://gcc.gnu.

[Bug tree-optimization/82862] New: [8 Regression] SPEC CPU2006 465.tonto performance regression with trunk@253975 (up to 40% drop for particular loop)

2017-11-06 Thread alexander.nesterovskiy at intel dot com
Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- Created attachment 42552 --> ht

[Bug tree-optimization/82604] New: [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

2017-10-18 Thread alexander.nesterovskiy at intel dot com
Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- Minimal options to reproduce

[Bug fortran/82362] [8 Regression] SPEC CPU2006 436.cactusADM ~7% performance deviation with trunk@251713

2017-10-02 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82362 --- Comment #4 from Alexander Nesterovskiy --- (In reply to Richard Biener from comment #2) > I suppose you swapped the revs here Yep, sorry. It's supposed to be: r251711: 99,5% 99,6% 99,8% 100,0% 100,3% 100,6% 100,6% r251713: 92,8% 92

[Bug fortran/82362] New: [8 Regression] SPEC CPU2006 436.cactusADM ~7% performance deviation with trunk@251713

2017-09-29 Thread alexander.nesterovskiy at intel dot com
Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- r251713 brings a reasonable improvement to alloca. However, there is a side effect of this patch - 436

[Bug rtl-optimization/82344] New: [8 Regression] SPEC CPU2006 435.gromacs ~10% performance regression with trunk@250855

2017-09-27 Thread alexander.nesterovskiy at intel dot com
Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- Created attachment 42246 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42246&

[Bug tree-optimization/82220] [8 Regression] SPEC CPU2006 482.sphinx3 ~10% performance regression with trunk@250416

2017-09-15 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82220 --- Comment #2 from Alexander Nesterovskiy --- Yes, I've applied the patch and it looks like it helped: --- Overhead Samples Symbol trunk@252796 + patch 31.57% 412037 mgau_eval 30.54% --> 397608 vector_gautbl_eval_logs3

[Bug tree-optimization/82220] New: [8 Regression] SPEC CPU2006 482.sphinx3 ~10% performance regression with trunk@250416

2017-09-15 Thread alexander.nesterovskiy at intel dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alexander.nesterovskiy at intel dot com Target Milestone: --- Calculation of the min_profitable_iters threshold was changed in r250416. In 482.sphinx3 a