https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116760

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=104912
   Last reconfirmed|2024-09-23 00:00:00         |2024-11-25

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Re-confirmed (comparing 14.2 against trunk on Zen4 with -Ofast -flto
-march=native).

Samples: 1M of event 'cycles:Pu', Event count (approx.): 2401109021645
Overhead       Samples  Command          Shared Object                   Symbol
  12.03%        230087  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] twotff_
  11.79%        224014  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] forms_
  11.66%        222528  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] forms_
   8.44%        160676  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] dirfck_
   8.09%        153197  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] dirfck_
   6.27%        119537  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] twotff_
   5.89%        111667  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] xyzint_
   5.21%         99376  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] xyzint_
   3.02%         57506  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] genral_
   2.36%         44702  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] genral_
   1.62%         30954  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] zqout_
   1.56%         29806  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] twoei_.constprop.2
   1.53%         29092  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] twoei_.constprop.2
   1.40%         26663  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] zqout_

so the main thing is the usual suspect, the "triangular" loop

            MKL=0
            DO 10 MK=1,NOC
            DO 10 ML=1,MK
               MKL = MKL+1
               XPQKL(MPQ,MKL) = XPQKL(MPQ,MKL) +
     *               VAL1*(CO(MS,MK)*CO(MR,ML)+CO(MS,ML)*CO(MR,MK))
               XPQKL(MRS,MKL) = XPQKL(MRS,MKL) +
     *               VAL3*(CO(MQ,MK)*CO(MP,ML)+CO(MQ,ML)*CO(MP,MK))
   10       CONTINUE     

where previously I massaged costing to have the loop _not_ vectorized,
but that doesn't seem to work anymore.
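
For readers who don't speak fixed-form Fortran, here is a rough C
transliteration of the loop above, as a sketch only. The function name,
the leading dimensions ldx/ldc and the IDX helper are made up for
illustration; the real array declarations are not shown in this comment.
It makes two things visible: the inner trip count is MK (1, 2, ..., NOC),
and with column-major storage XPQKL(.,MKL), CO(MR,ML) and CO(MS,ML) all
step by a full leading dimension per inner iteration, while CO(MS,MK)
and CO(MR,MK) are invariant in the inner loop.

```c
#include <stddef.h>

/* Column-major, 1-based index of element (i,j) with leading dimension ld.
   Hypothetical helper; the actual array shapes are assumed. */
#define IDX(i, j, ld) ((size_t)((j) - 1) * (size_t)(ld) + (size_t)((i) - 1))

/* Hypothetical standalone version of the triangular loop.
   Returns the final MKL, i.e. the total number of inner iterations,
   which is noc * (noc + 1) / 2. */
int triangular_update(double *xpqkl, int ldx,
                      const double *co, int ldc,
                      int noc, int mpq, int mrs,
                      int mp, int mq, int mr, int ms,
                      double val1, double val3)
{
    int mkl = 0;
    for (int mk = 1; mk <= noc; mk++) {
        /* Inner trip count equals mk, so early outer iterations
           run only a handful of inner iterations. */
        for (int ml = 1; ml <= mk; ml++) {
            mkl++;
            /* xpqkl is walked along its second dimension: stride ldx. */
            xpqkl[IDX(mpq, mkl, ldx)] +=
                val1 * (co[IDX(ms, mk, ldc)] * co[IDX(mr, ml, ldc)]
                      + co[IDX(ms, ml, ldc)] * co[IDX(mr, mk, ldc)]);
            xpqkl[IDX(mrs, mkl, ldx)] +=
                val3 * (co[IDX(mq, mk, ldc)] * co[IDX(mp, ml, ldc)]
                      + co[IDX(mq, ml, ldc)] * co[IDX(mp, mk, ldc)]);
        }
    }
    return mkl;
}
```

MKL is a running index across both loops, so the stores into XPQKL
continue where the previous outer iteration left off.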
