https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103850
--- Comment #6 from Martin Reinecke ---
I would have expected that this does not make a significant difference,
assuming that speculative execution works and the branch predictor takes the
jump backwards at the loop's end. In that picture both v
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103850
--- Comment #5 from Richard Biener ---
Note that the issue can also be reproduced without -ffast-math, where the
functions are nearly identical, so I fear you are running into some
micro-architectural hazard. Maybe
.L3:
vmovapd %ymm2, %ym
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103850
Richard Biener changed:
What           |Removed     |Added
Status         |UNCONFIRMED |NEW
Ever confirmed |0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103850
--- Comment #3 from Martin Reinecke ---
Just for completeness, this is the CPU I'm running on:
vendor_id : AuthenticAMD
cpu family : 23
model : 96
model name : AMD Ryzen 7 4800H with Radeon Graphics
stepping : 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103850
--- Comment #2 from Martin Reinecke ---
Thanks! This flag indeed causes both kernels to have the same speed, but (at
least for me) it's slower than both original versions...
slow kernel version: 29.027915 GFlops/s
fast kernel version: 29.008313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103850
Andrew Pinski changed:
What     |Removed |Added
Severity |normal  |enhancement
--- Comment #1 from Andrew