https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108
Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2025-03-05
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Confirmed.

The only early break vectorization is in the reporting harness in

benchmark::CSVReporter::ReportRuns(std::vector<benchmark::BenchmarkReporter::Run, std::allocator<benchmark::BenchmarkReporter::Run> > const&)

But I can reproduce the slowdown. Take e.g. BM_UFlat: this is all scalar code.
The hot function is snappy::DecompressBranchless<char*>, but for some reason
after the PFA patch memmove is no longer inlined. This causes the slowdown, as
snappy does small memmoves often.

Will take a look.
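
For readers unfamiliar with why a non-inlined memmove matters here, the following is a minimal sketch of the general pattern, not code taken from snappy and not a reproducer for this bug. The helper names CopySmall and CopyRuntime are hypothetical. The assumption is that a memmove with a small compile-time-constant size can usually be expanded by the compiler into a few loads and stores, whereas an out-of-line call adds call overhead on every small copy. Whether expansion actually happens depends on the compiler version, target and flags.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from snappy): copies a small number of bytes
// whose count N is known at compile time. With a constant size, compilers
// can typically expand memmove inline instead of emitting a library call.
// The regions may overlap, which is why memmove is used rather than memcpy.
template <std::size_t N>
inline void CopySmall(char* dst, const char* src) {
  std::memmove(dst, src, N);
}

// Hypothetical counterpart with a runtime size. Here the compiler has less
// information, so an out-of-line call to memmove is more likely.
inline void CopyRuntime(char* dst, const char* src, std::size_t n) {
  std::memmove(dst, src, n);
}
```

One way to check what the compiler did is to build a caller with something like `g++ -O2 -S` and look for a `call memmove` (or `bl memmove` on AArch64) in the generated assembly.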