https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62286
ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org,
                   |                            |terry.guo at arm dot com

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Ramana Radhakrishnan from comment #1)
> Because the Cortex-M3 doesn't have those instructions ? It's a testism
> probably fixed by an appropriate dg-options values.

It's not a testism, it's a costs issue.
The FP instructions are dictated by the -mfpu option that is given (-mfpu=vfp
is hardcoded in the dg-options here) and in any case Cortex-M3 should support
the vmla instructions as far as I know.

The RTX costs during combine reject the combination of
    vnmul.f32    s15, s14, s15
    vsub.f32     s15, s15, s13

into
    vnmla.f32    s15, s13, s14

for example. In particular I think it's the mult_addsub cost.
A relevant combine log part is:

Trying 57 -> 58:
Successfully matched this instruction:
(set (reg:SF 134 [ D.4322 ])
    (plus:SF (mult:SF (reg:SF 130 [ D.4322 ])
            (reg:SF 131 [ D.4322 ]))
        (reg:SF 133 [ D.4322 ])))
(plus:SF (mult:SF (reg:SF 130 [ D.4322 ])
        (reg:SF 131 [ D.4322 ]))
    (reg:SF 133 [ D.4322 ]))
Hot cost: 24 (final)
rejecting combination of insns 57 and 58
original costs 12 + 8 = 20
replacement cost 24

Is it actually beneficial for Cortex-M3 to split this up?
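
To make the arithmetic in the log explicit, here is a minimal sketch of the kind of comparison that produces the "rejecting combination" message. It is a simplified model, not GCC's actual code: the function name combination_is_profitable and its signature are hypothetical, and the only behaviour assumed is what the log itself shows, namely that the replacement is rejected when its cost exceeds the sum of the original insn costs.

```c
#include <stdbool.h>

/* Hypothetical helper, NOT GCC's real implementation: a simplified
   model of the cost check reported in the combine log.  */
static bool
combination_is_profitable (const int *original_costs, int n_original,
                           int replacement_cost)
{
  /* Sum the costs of the insns that would be replaced.  */
  int old_cost = 0;
  for (int i = 0; i < n_original; i++)
    old_cost += original_costs[i];

  /* Assumption taken from the log: the combination is kept only if
     the replacement is no more expensive than the originals.  */
  return replacement_cost <= old_cost;
}
```

With the numbers from the log, the original costs {12, 8} sum to 20 and the replacement costs 24, so this check returns false and the fused form is rejected. If the cost reported for the multiply-accumulate pattern were 20 or lower, the same check would accept it.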