https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81494
Bug ID: 81494
Summary: [8 Regression] 454.calculix miscompares with -Ofast after r249919
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*
The additional reduction vectorizations cause the following miscompare
(Intel Haswell, -Ofast -march=native):
*** Miscompare of hyperviscoplastic.dat, see
/gcc/spec/cpu2006/benchspec/CPU2006/454.calculix/run/run_peak_ref_amd64-m64-gcc42-nn.0000/hyperviscoplastic.dat.mis
59371: 30 -2.6001E-01 2.9824E-04 4.0952E-01
30 -2.6001E-01 2.9831E-04 4.0952E-01
^
59887: 546 -3.7471E-01 -8.3602E-03 9.9912E-02
546 -3.7471E-01 -8.3600E-03 9.9912E-02
^
60706: 1365 -3.3556E-01 -5.7161E-03 1.7904E-01
1365 -3.3556E-01 -5.7159E-03 1.7904E-01
^
60708: 1367 -3.4907E-01 -1.3187E-03 1.3018E-01
1367 -3.4907E-01 -1.3185E-03 1.3018E-01
^
60713: 1372 -3.4396E-01 -5.0246E-03 1.5362E-01
1372 -3.4396E-01 -5.0244E-03 1.5362E-01
^
60735: 1394 -3.4182E-01 9.2686E-03 1.3116E-01
1394 -3.4182E-01 9.2688E-03 1.3116E-01
^
60890: 1549 1.5540E-01 1.1337E-05 1.7656E-01
1549 1.5540E-01 1.1352E-05 1.7656E-01
^
61333: 1992 -1.3659E-01 1.4045E-04 1.5981E-01
1992 -1.3659E-01 1.4043E-04 1.5981E-01
^
61461: 2120 -1.3232E-05 -1.1836E-01 1.1935E-02
2120 -1.3238E-05 -1.1836E-01 1.1936E-02
^
61475: 2134 -7.2402E-02 -2.0436E-04 1.0560E-01
2134 -7.2402E-02 -2.0439E-04 1.0560E-01
The additional vectorized loops are:
+results.f:1160:0: note: loop vectorized
+results.f:914:0: note: loop vectorized
Those are actually the same loop, once for stresses and once for temperature
and thermal flux:
      if(calcul_qa) then
         do m1=1,nope
            do m2=1,3
               qa=qa+dabs(fn(m2,konl(m1))-q(m2,m1))
            enddo
         enddo
         nal=nal+3*nope
      endif
qa, q and fn are real*8, konl is integer. nope is a parametric constant
of either 20, 8, 10, 4, 15 or 6. Relevant calls are from nonlingeo.c only.
The patch triggers vectorization because, without it, only LIM store motion
makes this a reduction (qa is a by-reference parameter), and nothing
after that associates the additions in a way that lets the vectorizer
vectorize the detected reduction chain.
So it's not wrong-code, just unfortunate (even more unfortunate that it is in
the results-gathering routine and not in the actual benchmark part).
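
To illustrate why a vectorized reduction can change the low digits without
being wrong-code under -Ofast (which permits reassociating floating-point
additions), here is a minimal C sketch of the same kind of sum-of-absolute-
differences reduction. It is only an analogue of the Fortran loop above: the
function names, the flat arrays and the choice of four partial sums (four
doubles fit in a 256-bit vector) are assumptions for illustration, not the
code GCC actually generates for results.f.

```c
#include <assert.h>
#include <stddef.h>

/* Local absolute value helper, so the sketch does not depend on libm. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* In-order reduction: the association written in the source loop. */
double sum_abs_diff_scalar(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double qa = 0.0;
    for (size_t i = 0; i < n; i++)
        qa += abs_d(a[i] - b[i]);
    return qa;
}

/* Same reduction with four independent partial sums, modelling one
 * possible vectorized form. The lane count of 4 is an assumption. */
double sum_abs_diff_lanes(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;

    /* Main loop: each lane accumulates every fourth element. */
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += abs_d(a[i + k] - b[i + k]);

    /* Final horizontal add of the partial sums. */
    double qa = (lane[0] + lane[1]) + (lane[2] + lane[3]);

    /* Scalar epilogue for leftover elements. */
    for (; i < n; i++)
        qa += abs_d(a[i] - b[i]);
    return qa;
}
```

With a = {1e16, 1, 1, 1, 1, 1, 1, 1} and b all zero, the in-order sum stays at
1e16 because each added 1.0 is rounded away, while the four-lane form returns
1e16 + 6. Both are valid results once reassociation is allowed; they simply
round differently, which is the kind of difference the miscompare shows.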