https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116760
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104912
   Last reconfirmed|2024-09-23 00:00:00         |2024-11-25

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Re-confirmed (comparing 14.2 against trunk on Zen4 with -Ofast -flto
-march=native).

Samples: 1M of event 'cycles:Pu', Event count (approx.): 2401109021645
Overhead       Samples  Command          Shared Object                   Symbol
  12.03%        230087  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] twotff_
  11.79%        224014  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] forms_
  11.66%        222528  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] forms_
   8.44%        160676  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] dirfck_
   8.09%        153197  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] dirfck_
   6.27%        119537  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] twotff_
   5.89%        111667  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] xyzint_
   5.21%         99376  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] xyzint_
   3.02%         57506  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] genral_
   2.36%         44702  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] genral_
   1.62%         30954  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] zqout_
   1.56%         29806  gamess_peak.amd  gamess_peak.amd64-m64-gcc42-nn  [.] twoei_.constprop.2
   1.53%         29092  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] twoei_.constprop.2
   1.40%         26663  gamess_base.amd  gamess_base.amd64-m64-gcc42-nn  [.] zqout_

so the main thing is the usual suspect, the "triangular" loop

      MKL=0
      DO 10 MK=1,NOC
      DO 10 ML=1,MK
      MKL = MKL+1
      XPQKL(MPQ,MKL) = XPQKL(MPQ,MKL) +
     *      VAL1*(CO(MS,MK)*CO(MR,ML)+CO(MS,ML)*CO(MR,MK))
      XPQKL(MRS,MKL) = XPQKL(MRS,MKL) +
     *      VAL3*(CO(MQ,MK)*CO(MP,ML)+CO(MQ,ML)*CO(MP,MK))
   10 CONTINUE

where previously I massaged costing to have the loop _not_ vectorized
but that doesn't work anymore it seems.
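
For readers less used to this loop shape, here is a minimal C sketch of what "triangular" means in the nest above. It is an illustration only, not code from GAMESS or GCC. The helper names packed_index and check_triangular are hypothetical. The sketch shows that the running counter MKL is the packed lower-triangular index MK*(MK-1)/2 + ML, and that the inner loop runs MK times rather than NOC times, so its trip count varies from 1 up to NOC. Whether those short inner trip counts are what makes vectorization unprofitable here is an assumption; the comment above does not state the cause.

```c
/* Closed form of the running counter MKL for 1-based (mk, ml), ml <= mk.
   Hypothetical helper, named here for illustration only. */
int packed_index(int mk, int ml)
{
    return mk * (mk - 1) / 2 + ml;
}

/* Walk the same triangular nest as the Fortran loop and compare the
   incremented counter against the closed form on every iteration.
   Returns the total number of inner iterations, or -1 on a mismatch. */
int check_triangular(int noc)
{
    int mkl = 0;
    for (int mk = 1; mk <= noc; mk++) {
        /* Inner trip count is mk, so it grows from 1 to noc. */
        for (int ml = 1; ml <= mk; ml++) {
            mkl++;
            if (mkl != packed_index(mk, ml))
                return -1;
        }
    }
    return mkl;
}
```

Calling check_triangular(noc) returns noc*(noc+1)/2, the number of XPQKL columns touched per row, which gives an average inner trip count of (noc+1)/2.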