https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114528

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #3 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Note that on some uarch the naive implementation may have 1 cycle latency while
the "clever" implementation has 2 due to fusion.  Maybe we need a tune
parameter.