https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119960
Bug ID: 119960
Summary: Regression of code generation
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: arseny.kapoulkine at gmail dot com
Target Milestone: ---

Created attachment 61208
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61208&action=edit
perf assembly

Starting with gcc 15, the index decoder benchmark in
https://github.com/zeux/meshoptimizer shows a substantial regression at
O2/O3. On a Zen 4 (7950X) CPU:

gcc14 O2:   4.80 GB/s
gcc14 O3:   7.10 GB/s
gcc15 O2:   5.40 GB/s
gcc15 O3:   4.50 GB/s
clang20 O2: 6.10 GB/s
clang20 O3: 6.10 GB/s

To reproduce, after cloning the project run:

    make config=release codecbench && ./codecbench -l

You can also set the `CXX` variable to override the compiler. The function
that regressed is meshopt_decodeIndexBuffer in src/indexcodec.cpp.

I've bisected the regression to:

commit 5ab3f091b3eb42795340d3c9cea8aaec2060693c (HEAD)
Author: Richard Biener <rguent...@suse.de>
Date:   Mon Dec 2 11:07:46 2024 +0100

    tree-optimization/116352 - SLP scheduling and stmt order

I've attached the hot loop as run under perf, both with gcc15 before the
referenced commit and at the referenced commit.
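A sketch of the full reproduction workflow described above, as a shell recipe. It assumes git, make, and the relevant compilers are installed and on PATH; the compiler binary names (g++-14, g++-15) are assumptions and may differ on your distribution.

```shell
# Clone the project under test (assumes network access).
git clone https://github.com/zeux/meshoptimizer
cd meshoptimizer

# Build and run the codec benchmark with gcc14 (binary name is an assumption).
CXX=g++-14 make config=release codecbench && ./codecbench -l

# Rebuild from scratch with gcc15 and compare the reported GB/s figures.
make clean
CXX=g++-15 make config=release codecbench && ./codecbench -l
```

The benchmark prints decode throughput; comparing the two runs should show the O2/O3 regression reported above.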