https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604

            Bug ID: 82604
           Summary: [8 Regression] SPEC CPU2006 410.bwaves ~50% performance
                    regression with trunk@253679 when
                    ftree-parallelize-loops is used
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.nesterovskiy at intel dot com
  Target Milestone: ---

Minimal options to reproduce the regression (4 threads is just an example, there can be more):
-Ofast -funroll-loops -flto -ftree-parallelize-loops=4

Auto-parallelization became mostly useless for 410.bwaves after r253679.
CPU time is distributed across the threads like this:

          Thread0  Thread1  Thread2  Thread3
r253679:   ~91%     ~3%      ~3%      ~3%
r253678:   ~34%     ~22%     ~22%     ~22%

Linking with "-fopt-info-loop-optimized" shows that only half as many loops are parallelized:
---
gfortran -Ofast -funroll-loops -flto -ftree-parallelize-loops=4 -g -fopt-info-loop-optimized=loop.optimized *.o
grep parallelizing loop.optimized -c
---
r253679: 19
r253678: 38

The most valuable missed parallelization is "block_solver.f:170:0: note: parallelizing outer loop 2" in the hottest function, "mat_times_vec".
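
To see which loops are no longer parallelized, rather than only how many, the two opt-info logs can be compared by source location. The following is a minimal sketch, not part of the original analysis. It assumes the link step above was run once per revision and that each log contains notes in the same form as the one quoted above. The function names are hypothetical. Loops are matched by file:line:column only, because loop numbers may differ between revisions.

```python
import re

# Matches notes of the form quoted in this report, e.g.
#   block_solver.f:170:0: note: parallelizing outer loop 2
NOTE_RE = re.compile(r"^(\S+:\d+:\d+): note: parallelizing ")


def parallelized_locations(log_text):
    """Return the set of file:line:column locations of parallelized loops."""
    locations = set()
    for line in log_text.splitlines():
        match = NOTE_RE.match(line.strip())
        if match:
            locations.add(match.group(1))
    return locations


def missed_locations(good_log, bad_log):
    """Locations parallelized in the good build but not in the bad one."""
    return sorted(parallelized_locations(good_log)
                  - parallelized_locations(bad_log))
```

For example, with the two logs read into strings `good` (r253678) and `bad` (r253679), `missed_locations(good, bad)` lists the locations that were parallelized only before the regression. Several loops sharing one location are collapsed into a single entry, so the length of the result need not equal the difference of the two grep counts.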