https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62286

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org,
                   |                            |terry.guo at arm dot com

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Ramana Radhakrishnan from comment #1)
> Because the Cortex-M3 doesn't have those instructions ? It's a testism
> probably fixed by an appropriate dg-options values.

It's not a testism; it's a costs issue.
The available FP instructions are determined by the -mfpu option that is given
(-mfpu=vfp is hardcoded in the dg-options here), and in any case Cortex-M3
should support the vmla instructions as far as I know.
The RTX costs during combine reject the combination of

         vnmul.f32       s15, s14, s15
         vsub.f32        s15, s15, s13

into 
         vnmla.f32       s15, s13, s14

for example.
In particular, I think it's the mult_addsub cost. The relevant part of the
combine log is:
Trying 57 -> 58:
Successfully matched this instruction:
(set (reg:SF 134 [ D.4322 ])
    (plus:SF (mult:SF (reg:SF 130 [ D.4322 ])
            (reg:SF 131 [ D.4322 ]))
        (reg:SF 133 [ D.4322 ])))
(plus:SF (mult:SF (reg:SF 130 [ D.4322 ])
        (reg:SF 131 [ D.4322 ]))
    (reg:SF 133 [ D.4322 ]))

Hot cost: 24 (final)
rejecting combination of insns 57 and 58
original costs 12 + 8 = 20
replacement cost 24
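
To spell out the arithmetic in the log above: the two original insns cost
12 + 8 = 20, the fused replacement is costed at 24, and since 24 > 20 the
combination is rejected. A minimal sketch of that comparison follows. It is a
simplified model, not GCC's actual code; the function name and signature are
hypothetical and only illustrate the accept/reject decision the log reports.

```c
#include <stdbool.h>

/* Hypothetical helper (not a GCC function): models the decision shown in
   the combine log. The replacement is accepted only if it costs no more
   than the sum of the two insns it would replace. */
bool combination_accepted(int cost_insn_a, int cost_insn_b,
                          int replacement_cost)
{
    int original_cost = cost_insn_a + cost_insn_b;
    return replacement_cost <= original_cost;
}
```

With the numbers from the log, combination_accepted(12, 8, 24) is false, so
the fused instruction would need to be costed at 20 or less for combine to
accept it.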

Is it actually beneficial for Cortex-M3 to keep the multiply and the
add/subtract as separate instructions?
