On Tue, Nov 7, 2017 at 4:44 PM, Michael Matz <m...@suse.de> wrote:
> Hi,
>
> On Tue, 7 Nov 2017, Richard Biener wrote:
>
>> > With FMA however the situation is different because there are rounding
>> > differences. Why can we convert multiplication+add into FMA without
>> > -ffast-math in the first place?
>>
>> We do with -ffp-contract=fast which is the default for C.
>
> But note that the reverse transformation can't be done without changing
> rounding behaviour detrimentally.  I.e. the patch as is should be
> conditional on a suboption of fast-math; if the user wrote an FMA he quite
> surely wants the single-rounding and the splitting would break this.  I.e.
> like Marc said, only split FMAs that were generated by the compiler.
>
> At which point it indeed seems a bit nicer to not even generate them in
> the first place.  It's basically a pattern match on the defs/uses of the
> potential FMA.  Before committing to create the FMA the pass could just as
> well ask the backend before doing it, i.e. all at gimple time.  Would
> introduce some more arch dependencies into GIMPLE, but I don't think
> that's a problem here.
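
To make the rounding point concrete, here is a minimal sketch, not taken
from the patch.  The function names are made up for illustration, and it
assumes IEEE binary32 float and binary64 double.  The single-rounding
result is emulated in double rather than computed with a real FMA
instruction or C99 fmaf(); that emulation is only exact for inputs like
the ones below, where the double sum needs no rounding.

```c
/* Unfused: the product is rounded to float before the add.
   The volatile store keeps the compiler from contracting a * b + c
   into an FMA and from carrying excess precision across the add.  */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;   /* first rounding */
    return product + c;               /* second rounding */
}

/* Single-rounding reference (illustration only): the product of two
   floats is exact in double, so only the final conversion rounds as
   long as the double addition itself is exact.  */
float fused_reference(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1 the exact product is
1 - 2^-26, which rounds to 1.0f in float.  mul_then_add() therefore
returns 0, while fused_reference() returns -2^-26.  Splitting an FMA the
user wrote by hand would silently turn the second answer into the first.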

FMA detection on GIMPLE basically happens as a pre-RTL expansion transform,
so yes, that sounds possible.  Note the exact situation is a bit tricky to
detect I suppose if we don't just want to pattern-match reduction-with-FMA
but also the required "emptiness" of the loop besides that FMA instruction.
Note we also have the case of a = (-a) + b * c; which I'd expect to be
similarly slow (quite possibly that pattern will never happen in practice).
And a = a + a * c; is likely not going to benefit from splitting (the mult has
a dependence as well).
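
The three cases above could be told apart roughly like this.  This is a
toy sketch only: the struct, the enum and classify_fma() are hypothetical
and are not GCC's GIMPLE API.  Operands are plain integer ids standing in
for SSA names, and the "emptiness" check on the rest of the loop is not
modelled at all.

```c
#include <stdbool.h>

/* Hypothetical model of one candidate inside a loop body:
   result = (+/-)addend + mul1 * mul2.  */
struct fma_candidate {
    int mul1, mul2;        /* multiplication operands */
    int addend;            /* value added to the product */
    bool addend_negated;   /* true for (-a) + b * c; does not change
                              the classification below */
};

enum fma_advice {
    FMA_FUSE,           /* no loop-carried dependence: fuse as usual */
    FMA_KEEP_SPLIT,     /* reduction through the addend only */
    FMA_SPLIT_NO_GAIN   /* the multiplication is on the recurrence too */
};

/* loop_carried is the id of the value that feeds back into the next
   iteration (the accumulator of the reduction).  */
enum fma_advice classify_fma(const struct fma_candidate *c, int loop_carried)
{
    /* a = a + a * c: the mult depends on the previous iteration as
       well, so keeping mult and add separate buys nothing.  */
    if (c->mul1 == loop_carried || c->mul2 == loop_carried)
        return FMA_SPLIT_NO_GAIN;

    /* a = a + b * c or a = (-a) + b * c: only the addend is carried,
       so with separate instructions just the add sits on the chain.  */
    if (c->addend == loop_carried)
        return FMA_KEEP_SPLIT;

    return FMA_FUSE;
}
```

A real implementation would also have to ask the backend whether the
target cares at all, as Michael suggests.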

Just some things to keep in mind...

Oh, and then there's RTL combine that _might_ end up matching an FMA anyway.
Possibly not for the reduction case in case we have a single pseudo.

Richard.

>
> Ciao,
> Michael.
