[Bug libstdc++/116140] [15 Regression] 5-35% slowdown of 483.xalancbmk and 523.xalancbmk_r since r15-2356-ge69456ff9a54ba

tnfchris at gcc dot gnu.org via Gcc-bugs Thu, 01 Aug 2024 07:35:22 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140


--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
It looks like it's because the old unrolled code for the pointer version did a
subtract up front and used the difference to reduce the IV (induction variable)
check to once every 4 elements.  This explains the increase in instruction
count.
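
For readers without the old code to hand, the shape being described is roughly
the following.  This is only a sketch and not the libstdc++ source:
find_unrolled is a made-up name, and the remainder handling is simplified to a
plain loop.

```cpp
#include <cstddef>

// Hypothetical illustration only; find_unrolled is not a libstdc++ name.
template <typename T, typename Pred>
const T* find_unrolled(const T* first, const T* last, Pred pred)
{
    // A single pointer subtraction gives the number of full groups of four.
    std::ptrdiff_t trips = (last - first) >> 2;

    // The loop-exit (IV) check runs once per four elements, not once each.
    for (; trips > 0; --trips) {
        if (pred(*first)) return first;
        ++first;
        if (pred(*first)) return first;
        ++first;
        if (pred(*first)) return first;
        ++first;
        if (pred(*first)) return first;
        ++first;
    }

    // At most three elements remain; check them one at a time.
    for (; first != last; ++first)
        if (pred(*first)) return first;

    return last;
}
```

A simple per-element loop instead tests first != last before every predicate
call, which is where the extra instructions would come from.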

I hadn't noticed it during benchmarking because on aarch64 the non-pointer
version got recovered with cbz.

This should be fixable while still being vectorizable with

#pragma GCC unroll 4

on the loop.  The generated code looks good, but it looks like the pragma is
being dropped when used in the template.
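
As a rough sketch of what that would look like (find_if_pragma is a made-up
name, not the actual libstdc++ function, and this is not the proposed patch),
the pragma goes directly in front of the simple loop inside the template:

```cpp
// Hypothetical illustration only; find_if_pragma is not a libstdc++ name.
template <typename Iter, typename Pred>
Iter find_if_pragma(Iter first, Iter last, Pred pred)
{
    // GCC-specific request to unroll the following loop four times.
    // Whether it is honoured inside a template is the open question here.
#pragma GCC unroll 4
    for (; first != last; ++first)
        if (pred(*first))
            return first;
    return last;
}
```

The pragma is only a hint to the unroller, so the result is the same whether or
not it is honoured; only the generated code differs.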

I'm away for a few days so Alex is looking into it.
