https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106553
Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Are you sure the testcase is correctly reduced, i.e. does it show the same
performance degradation?

Latency-wise the scheduler is making the correct decision here: we really want
to schedule second-to-last FMA

  y = v_fma_f32 (y, r2, x);

earlier than its predecessor

  r = v_fma_f32 (y, r2, z);

because we need to compute y*r2 before the last FMA.
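
To make the dependence argument concrete, here is a minimal scalar sketch. It is not the testcase from the report: v_fma_f32 is modelled as a plain x*y + z helper (an assumption about its argument order, and not a fused operation), the final FMA is hypothetical, and the function names source_order and scheduled_order are made up for illustration. It only shows that the two FMAs in question both read the old y and do not depend on each other, so either order gives the same value and the choice is purely a latency question.

```c
/* Scalar stand-in for the testcase's v_fma_f32 helper.
   Assumption: v_fma_f32 (x, y, z) computes x*y + z. */
static float v_fma_f32(float x, float y, float z)
{
    return x * y + z;
}

/* Source order: r is computed first, then y is updated. */
static float source_order(float y, float r2, float x, float z)
{
    float r = v_fma_f32(y, r2, z);   /* reads the old y */
    y = v_fma_f32(y, r2, x);         /* also reads the old y */
    return v_fma_f32(y, r2, r);      /* hypothetical last FMA: new y * r2 + r */
}

/* Scheduled order: the y update is hoisted above the r computation. */
static float scheduled_order(float y, float r2, float x, float z)
{
    float y_new = v_fma_f32(y, r2, x);  /* started first: feeds the last multiply */
    float r = v_fma_f32(y, r2, z);      /* still reads the old y */
    return v_fma_f32(y_new, r2, r);     /* same hypothetical last FMA */
}
```

Under these assumptions, the new y is a multiplicand of the last FMA while r is only its addend, which is one way to read why starting the y update first can shorten the critical path.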