https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122

Bug ID: 104122
Summary: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux

On Zen3 based CPUs, benchmark 510.parest_r from the SPEC 2017 FPrate suite is faster with -march=generic than with -march=native. LNT reports an 11% regression:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=463.457.0&plot.1=471.457.0&

However, my own measurements on a different but similar EPYC machine suggest it can be as high as 26%. On yet another Ryzen machine I can see almost 10% too.

I only have older-than-LNT data from the Ryzen machine, and we did not see the regression when gcc 11 was released. However, it seems that the generic tuning improved while the native one did not.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
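
For readers comparing their own numbers with the percentages quoted above, here is a minimal sketch of one common way to express such a regression: the run-time difference relative to the faster (generic) build. The function name and the run times are hypothetical placeholders, not measurements from this report, and LNT may compute its figure differently.

```python
def relative_slowdown(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how much slower the candidate run is than the baseline, in percent."""
    if baseline_seconds <= 0:
        raise ValueError("baseline run time must be positive")
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


# Placeholder run times in seconds (illustrative only, NOT measured data).
generic_seconds = 100.0  # hypothetical run time with generic tuning
native_seconds = 111.0   # hypothetical run time with native tuning

slowdown = relative_slowdown(generic_seconds, native_seconds)
print(f"native tuning is {slowdown:.1f}% slower than generic")
```

A positive result means the native-tuned build is slower. Using the generic build as the baseline is an assumption made here for illustration.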