https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119960

            Bug ID: 119960
           Summary: Regression of code generation
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: arseny.kapoulkine at gmail dot com
  Target Milestone: ---

Created attachment 61208
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61208&action=edit
perf assembly

Starting with gcc 15, the index decoder benchmark in
https://github.com/zeux/meshoptimizer shows a substantial regression at O3
(which is now slower than O2):

Measured on a Zen 4 (7950X) CPU:

gcc14 O2: 4.80 GB/s
gcc14 O3: 7.10 GB/s

gcc15 O2: 5.40 GB/s
gcc15 O3: 4.50 GB/s

clang20 O2: 6.10 GB/s
clang20 O3: 6.10 GB/s

To reproduce this, run the following after cloning the project:

   make config=release codecbench && ./codecbench -l

You can also set the `CXX=` variable to override the compiler.
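For concreteness, an end-to-end reproduction might look like the sketch below. The repository URL and benchmark command are from this report; the `g++-15` / `g++-14` compiler names are assumptions and depend on how the two gcc versions are installed locally.

```shell
# Hypothetical reproduction sequence (compiler binary names are assumptions)
git clone https://github.com/zeux/meshoptimizer
cd meshoptimizer

# Build and run with gcc 15 (regressed)
make config=release codecbench CXX=g++-15
./codecbench -l

# Rebuild from clean and compare against gcc 14
make clean
make config=release codecbench CXX=g++-14
./codecbench -l
```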

The function that regressed is meshopt_decodeIndexBuffer in src/indexcodec.cpp.

I've bisected the regression to:

commit 5ab3f091b3eb42795340d3c9cea8aaec2060693c (HEAD)
Author: Richard Biener <rguent...@suse.de>
Date:   Mon Dec 2 11:07:46 2024 +0100

    tree-optimization/116352 - SLP scheduling and stmt order

I've attached the hot loop as run under perf, both with gcc 15 just before the
referenced commit and at the referenced commit.
