On Tue, Nov 7, 2017 at 4:44 PM, Michael Matz <m...@suse.de> wrote:
> Hi,
>
> On Tue, 7 Nov 2017, Richard Biener wrote:
>
>> > With FMA however the situation is different because there are rounding
>> > differences. Why we can convert multiplication+add into FMA without
>> > -ffast-math at first place?
>>
>> We do with -ffp-contract=fast which is the default for C.
>
> But note that the reverse transformation can't be done without changing
> rounding behaviour detrimentally. I.e. the patch as is should be
> conditional on a suboption of fast-math, if the user wrote an FMA he quite
> surely wants the single-rounding and the splitting would break this. I.e.
> like Marc said, only split FMAs that were generated by the compiler.
>
> At which point it indeed seems a bit nicer to not even generate them in
> the first place. It's basically a pattern match on the defs/uses of the
> potential FMA. Before committing to create the FMA the pass could just as
> well ask the backend before doing it, i.e. all at gimple time. Would
> introduce some more arch dependencies into GIMPLE, but I don't think
> that's a problem here.

FMA detection on GIMPLE basically happens as a pre-RTL expansion
transform, so yes, that sounds possible. Note the exact situation is
a bit tricky to detect, I suppose, if we don't just want to pattern-match
reduction-with-FMA but also the required "emptiness" of the loop
besides that FMA instruction.

Note we also have the case of

  a = (-a) + b * c;

which I'd expect to be similarly slow (quite possibly that pattern
will never happen in practice). And

  a = a + a * c;

is likely not going to benefit from splitting (the mult has a
dependence as well).

Just some things to keep in mind...

Oh, and then there's RTL combine that _might_ end up matching an FMA
anyway. Possibly not for the reduction case in case we have a single
pseudo.

Richard.

>
> Ciao,
> Michael.
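
The three source patterns discussed above can be written out as loops, as a rough sketch. The function names (dot, neg_acc, self_scaled) are hypothetical and chosen only for illustration. Whether the compiler actually contracts any of these into an FMA depends on the target and on options such as -ffp-contract, so nothing here forces a particular code generation.

```c
#include <stddef.h>

/* Reduction-with-FMA candidate: acc is loop-carried, x[i] * y[i] is not.
 * If this is contracted, the whole fused operation sits on the dependency
 * chain through acc; if it is split, only the add does. */
double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i] * y[i];
    return acc;
}

/* a = (-a) + b * c: the accumulator is negated before the add, but the
 * multiply is still independent of it. */
double neg_acc(double a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a = (-a) + b[i] * c[i];
    return a;
}

/* a = a + a * c: the multiply itself reads a, so splitting cannot take
 * the multiply off the loop-carried chain. */
double self_scaled(double a, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a = a + a * c[i];
    return a;
}
```

Calling these with small integer-valued inputs gives the same result whether or not contraction happens, since every intermediate is exact. The difference between the fused and split forms would only show up in timing and in inputs where the product needs rounding.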