https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81494

            Bug ID: 81494
           Summary: [8 Regression] 454.calculix miscompares with -Ofast
                    after r249919
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

The additional reduction vectorizations enabled by r249919 cause the
following miscompare (Intel Haswell, -Ofast -march=native):

*** Miscompare of hyperviscoplastic.dat, see
/gcc/spec/cpu2006/benchspec/CPU2006/454.calculix/run/run_peak_ref_amd64-m64-gcc42-nn.0000/hyperviscoplastic.dat.mis
59371:     30 -2.6001E-01  2.9824E-04  4.0952E-01
           30 -2.6001E-01  2.9831E-04  4.0952E-01
                                    ^
59887:    546 -3.7471E-01 -8.3602E-03  9.9912E-02
          546 -3.7471E-01 -8.3600E-03  9.9912E-02
                                    ^
60706:   1365 -3.3556E-01 -5.7161E-03  1.7904E-01
         1365 -3.3556E-01 -5.7159E-03  1.7904E-01
                                    ^
60708:   1367 -3.4907E-01 -1.3187E-03  1.3018E-01
         1367 -3.4907E-01 -1.3185E-03  1.3018E-01
                                    ^
60713:   1372 -3.4396E-01 -5.0246E-03  1.5362E-01
         1372 -3.4396E-01 -5.0244E-03  1.5362E-01
                                    ^
60735:   1394 -3.4182E-01  9.2686E-03  1.3116E-01
         1394 -3.4182E-01  9.2688E-03  1.3116E-01
                                    ^
60890:   1549  1.5540E-01  1.1337E-05  1.7656E-01
         1549  1.5540E-01  1.1352E-05  1.7656E-01
                                    ^
61333:   1992 -1.3659E-01  1.4045E-04  1.5981E-01
         1992 -1.3659E-01  1.4043E-04  1.5981E-01
                                    ^
61461:   2120 -1.3232E-05 -1.1836E-01  1.1935E-02
         2120 -1.3238E-05 -1.1836E-01  1.1936E-02
                        ^
61475:   2134 -7.2402E-02 -2.0436E-04  1.0560E-01
         2134 -7.2402E-02 -2.0439E-04  1.0560E-01

The additional vectorized loops are:

+results.f:1160:0: note: loop vectorized
+results.f:914:0: note: loop vectorized

Those are actually the same loop, appearing once for stresses and once
for temperature and thermal flux:

         if(calcul_qa) then
            do m1=1,nope
               do m2=1,3
                  qa=qa+dabs(fn(m2,konl(m1))-q(m2,m1))
               enddo
            enddo
            nal=nal+3*nope
         endif

qa, q and fn are real*8, konl is integer.  nope is a parametric constant
of either 20, 8, 10, 4, 15 or 6.  Relevant calls are from nonlingeo.c only.

The patch triggers vectorization because, without it, only LIM store
motion turns this into a reduction (qa is a by-reference parameter), and
nothing after that reassociates the additions into a form the vectorizer
accepts for the detected reduction chain.
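
As a rough illustration of the store-motion step, here is a C analogue of
the loop. It is a sketch, not the code GCC generates: the function names are
hypothetical, fn and q are assumed to be column-major with a leading
dimension of 3, konl is assumed to hold 1-based indices, and `restrict`
stands in for the Fortran rule that dummy arguments do not alias.

```c
#include <math.h>

/* As written in the source: the accumulator lives in memory and is
   loaded and stored through the pointer on every iteration. */
void accumulate_by_ref(double *restrict qa, const double *fn,
                       const int *konl, const double *q, int nope)
{
    for (int m1 = 0; m1 < nope; m1++)
        for (int m2 = 0; m2 < 3; m2++)
            *qa += fabs(fn[3 * (konl[m1] - 1) + m2] - q[3 * m1 + m2]);
}

/* After store motion: one load before the loop, a scalar accumulator
   inside it, one store after it. The additions happen in the same
   order, so the result is unchanged, but the loop is now a plain
   scalar reduction that later passes can recognize. */
void accumulate_hoisted(double *restrict qa, const double *fn,
                        const int *konl, const double *q, int nope)
{
    double acc = *qa;
    for (int m1 = 0; m1 < nope; m1++)
        for (int m2 = 0; m2 < 3; m2++)
            acc += fabs(fn[3 * (konl[m1] - 1) + m2] - q[3 * m1 + m2]);
    *qa = acc;
}
```

Both functions perform the additions in the same order, so they should
return the same value; only the memory traffic differs.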

So this is not wrong-code, since -Ofast permits reassociating the
additions, but it is unfortunate (even more so because the loop is in
the results-gathering routine and not the actual benchmark part).
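
To show why a vectorized reduction can change the printed digits without
being wrong-code, the sketch below sums the same non-negative terms in
source order and with two independent partial sums, which is roughly the
shape a two-lane vector reduction takes. The function names and the input
values are made up for illustration and do not come from 454.calculix; the
comparison assumes IEEE-754 double arithmetic with round-to-nearest and a
build without -ffast-math, so that the compiler keeps each function's
summation order.

```c
#include <stddef.h>

/* Source order: every term is added to one running sum. */
double sum_sequential(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Reassociated order: even and odd elements go to separate partial
   sums (the "lanes"), which are combined once at the end. */
double sum_two_lanes(const double *x, size_t n)
{
    double lane0 = 0.0, lane1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 += x[i];
        lane1 += x[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        lane0 += x[i];
    return lane0 + lane1;
}
```

For the input {1e16, 1.0, 1.0, 1.0}, the sequential sum loses each small
term to rounding, while the two-lane sum first adds two of them together
and keeps that contribution. The two results differ in the last place, and
both are valid roundings of the same mathematical sum.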
