https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2025-03-05
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Confirmed.

The only early break vectorization is in the reporting harness in
benchmark::CSVReporter::ReportRuns(std::vector<benchmark::BenchmarkReporter::Run,
std::allocator<benchmark::BenchmarkReporter::Run> > const&)

But I can reproduce the slowdown.

Take, e.g., BM_UFlat: this is all scalar code.

The hot function is snappy::DecompressBranchless<char*>, but for some reason,
after the PFA patch, memmove is no longer inlined.

This causes the slowdown, since snappy performs small memmoves frequently.
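For illustration only, here is a minimal sketch of the kind of small, fixed-size, possibly overlapping memmove that matters here. The helper name copy16_overlapping and the 16-byte size are hypothetical, not taken from snappy's source. Because the length is a compile-time constant, the compiler is free to expand the call inline instead of calling the library memmove; when that no longer happens, every such copy pays the cost of a function call.

```c
#include <string.h>

/* Hypothetical helper (not snappy's actual code): copy a fixed 16-byte
   block within one buffer, where source and destination may overlap.
   With a constant length the compiler may expand this memmove inline
   rather than emitting a call to the library routine. */
static inline void copy16_overlapping(char *dst, const char *src)
{
    memmove(dst, src, 16);  /* memmove, not memcpy: regions may overlap */
}
```

One way to check whether such a call was inlined is to compile with -O2 -S and look for a call to memmove in the generated assembly.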

Will take a look.
