https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81494
            Bug ID: 81494
           Summary: [8 Regression] 454.calculix miscompares with -Ofast after r249919
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

The additional reduction vectorizations cause (Intel Haswell, -Ofast -march=native)

*** Miscompare of hyperviscoplastic.dat, see /gcc/spec/cpu2006/benchspec/CPU2006/454.calculix/run/run_peak_ref_amd64-m64-gcc42-nn.0000/hyperviscoplastic.dat.mis
59371:    30 -2.6001E-01  2.9824E-04  4.0952E-01
          30 -2.6001E-01  2.9831E-04  4.0952E-01
                              ^
59887:   546 -3.7471E-01 -8.3602E-03  9.9912E-02
         546 -3.7471E-01 -8.3600E-03  9.9912E-02
                               ^
60706:  1365 -3.3556E-01 -5.7161E-03  1.7904E-01
        1365 -3.3556E-01 -5.7159E-03  1.7904E-01
                              ^
60708:  1367 -3.4907E-01 -1.3187E-03  1.3018E-01
        1367 -3.4907E-01 -1.3185E-03  1.3018E-01
                               ^
60713:  1372 -3.4396E-01 -5.0246E-03  1.5362E-01
        1372 -3.4396E-01 -5.0244E-03  1.5362E-01
                               ^
60735:  1394 -3.4182E-01  9.2686E-03  1.3116E-01
        1394 -3.4182E-01  9.2688E-03  1.3116E-01
                               ^
60890:  1549  1.5540E-01  1.1337E-05  1.7656E-01
        1549  1.5540E-01  1.1352E-05  1.7656E-01
                              ^
61333:  1992 -1.3659E-01  1.4045E-04  1.5981E-01
        1992 -1.3659E-01  1.4043E-04  1.5981E-01
                               ^
61461:  2120 -1.3232E-05 -1.1836E-01  1.1935E-02
        2120 -1.3238E-05 -1.1836E-01  1.1936E-02
                   ^
61475:  2134 -7.2402E-02 -2.0436E-04  1.0560E-01
        2134 -7.2402E-02 -2.0439E-04  1.0560E-01

The additional vectorized loops are:

+results.f:1160:0: note: loop vectorized
+results.f:914:0: note: loop vectorized

Those are actually the same loop, once for stresses and once for
temperature and thermal flux:

      if(calcul_qa) then
         do m1=1,nope
            do m2=1,3
               qa=qa+dabs(fn(m2,konl(m1))-q(m2,m1))
            enddo
         enddo
         nal=nal+3*nope
      endif

qa, q and fn are real*8, konl is integer.  nope is a parametric constant
of either 20, 8, 10, 4, 15 or 6.  Relevant calls are from nonlingeo.c only.

The patch triggers vectorization here.  Without it, the loop only becomes
a reduction through LIM store motion (qa is a by-reference parameter), and
nothing after that associates the additions in a way the vectorizer is
happy with, so the detected reduction chain was not vectorized.

So it's not wrong-code but unfortunate (even more unfortunate that it is in
the results-gathering routine and not the actual benchmark part).
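
To illustrate why this counts as "not wrong-code": -Ofast implies -ffast-math, which allows floating-point additions to be reassociated, and a vectorized reduction sums the terms in a different order than the source does. The following is only a minimal sketch of that effect in C, not the code GCC generates. The function names `sum_sequential` and `sum_lanes`, the 0-based indexing, and the choice of four lanes (one 256-bit vector of doubles) are assumptions made for illustration.

```c
#include <assert.h>
#include <math.h>

#define LANES 4  /* assumed vector width: four doubles per vector */

/* Source order: a single accumulator, terms added strictly one by one. */
double sum_sequential(double qa, int nope, const int *konl,
                      double fn[][3], double q[][3])
{
    assert(nope > 0);
    for (int m1 = 0; m1 < nope; m1++)
        for (int m2 = 0; m2 < 3; m2++)
            qa += fabs(fn[konl[m1]][m2] - q[m1][m2]);
    return qa;
}

/* Vectorizer-style order (hypothetical model): the terms are dealt out
 * round-robin to LANES partial sums, which are combined only at the end
 * and then added to the incoming value of qa. */
double sum_lanes(double qa, int nope, const int *konl,
                 double fn[][3], double q[][3])
{
    double lane[LANES] = {0.0, 0.0, 0.0, 0.0};
    int k = 0;

    assert(nope > 0);
    for (int m1 = 0; m1 < nope; m1++)
        for (int m2 = 0; m2 < 3; m2++, k++)
            lane[k % LANES] += fabs(fn[konl[m1]][m2] - q[m1][m2]);
    return qa + ((lane[0] + lane[1]) + (lane[2] + lane[3]));
}
```

Both functions compute the same mathematical sum, but they round differently. As a deliberately exaggerated case, with an incoming qa of 1e16 and twelve terms of magnitude 1.0, IEEE double arithmetic with round-to-nearest makes the sequential version return 1e16 unchanged (each single addition of 1.0 is rounded away), while the lane version returns 1e16 + 12. The differences in hyperviscoplastic.dat are far smaller but come from the same kind of reordering.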