zeusmp regressed by about 5% again with the PRE fix for PR41101, which is r151561. The problem is that PRE now finds a partial redundancy (where in reality there isn't any), and the PHI node inserted to compensate for it prevents vectorization of a loop because its value is used outside that loop. Testcase extracted from zeusmp:
% cat hsmoc-1.f
      subroutine hsmoc ( )
      implicit NONE
      integer ijkn
      parameter(ijkn = 128+5)
      real*8 dt, fact, db(ijkn), w1dt(ijkn)
      integer i, is, ie, j, js, je
      common /rootr/ dt
      common /scratch/ w1dt
      do 9 i=is,ie
       do 807 j=js-1,je+1
         db (j ) = j
807    continue
       fact = dt * i
       do 808 j=js,je+1
         w1dt(j)= fact * db (j)
808    continue
9     continue
      return
      end

(compile with -march=barcelona -O3 -ffast-math -funroll-loops -fpeel-loops)

The problem is the access to 'dt' (rootr.dt), which PRE thinks is partially redundant in the first loop (!?), hence it creates this code:

  pretmp.11_53 = rootr.dt;
Loop-i:
  prephitmp.12_51 = PHI <pretmp.11_53(9), D.1376_20(20)>
  ...
  Loop-j1
    prephitmp.12_49 = PHI <prephitmp.12_51(11), pretmp.11_52(14)>
    ...
    pretmp.11_52 = rootr.dt;
    goto Loop-j1
  prephitmp.12_23 = PHI <prephitmp.12_51(12), prephitmp.12_49(13)>
  D.1376_20 = prephitmp.12_23;
  ...
  Loop-j2

Notice especially how we now read rootr.dt in the backedge for loop-j1, which is much more often than before. Originally we access it ie-is times, now we access it (ie-is)*(je-js) times. It's possible that this alone explains the speed regression, and not necessarily the missed vectorization. But the missed vectorization was much easier to detect.

-- 
           Summary: r151561 (PRE fix) regresses zeusmp
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: matz at gcc dot gnu dot org
  GCC host triplet: x86_64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41783
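
The access counts estimated in the report (roughly ie-is loads before, roughly (ie-is)*(je-js) after) can be checked with a small model. The following C sketch is not GCC code and does not reproduce the testcase; it only counts loads of rootr.dt under the assumption that the control flow is exactly as in the dump above. The function names loads_before and loads_after are hypothetical, introduced for illustration only.

```c
/* Model of the loop nest in hsmoc-1.f that counts loads of rootr.dt only.
   Assumption: control flow matches the dump quoted in the report. */

/* Before r151561: dt is loaded once per iteration of loop i,
   for the statement "fact = dt * i". */
long loads_before(long is, long ie, long js, long je)
{
    long loads = 0;
    (void)js;                       /* the j bounds do not matter here */
    (void)je;
    for (long i = is; i <= ie; i++)
        loads++;                    /* fact = dt * i */
    return loads;
}

/* After r151561: one load ahead of loop i (pretmp.11_53), plus one
   load on every back edge of loop j1 (pretmp.11_52). */
long loads_after(long is, long ie, long js, long je)
{
    long loads = 1;                 /* pretmp.11_53 = rootr.dt */
    for (long i = is; i <= ie; i++) {
        for (long j = js - 1; j <= je + 1; j++) {
            if (j < je + 1)         /* back edge taken: not the last iteration */
                loads++;            /* pretmp.11_52 = rootr.dt */
        }
    }
    return loads;
}
```

With the hypothetical bounds is=1, ie=10, js=2, je=100, this model gives 10 loads before and 1 + 10*100 = 1001 loads after, which matches the order of magnitude described in the report.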