https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118818
Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
-mno-recip disables this; the documentation probably needs an update:
https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/x86-Options.html#index-mrecip-2

Your benchmark looks latency-limited, but using rcpss only improves throughput. Latency actually increases, from ~10 cycles for divps to ~16 cycles for the rcpps-mul-mul-add-sub replacement sequence.