https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121925
--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> (In reply to Tamar Christina from comment #0)
> > Given the following vectors
> >
> > a = [A1 A0]
> > b = [C D ]
>
> b = [C B] I suppose?

Yeah, I double-checked the thing and still made a typo :(

> > c = [E D ]
> > [..]
>
> > rot0 = [E + A0 * C, D + A0 * B]
> > rot90 = [E + A1 * B, D - A1 * C]
> > rot180 = [E - A0 * C, D - A0 * B]
> > rot270 = [E + A1 * B, D - A1 * C]
>
> so that's all c + mul-with-rot (a, b), I guess fmrot0a fmrot90a fmrot180a
> fmrot270a?
>
> That is, do the instructions also avoid the extra rounding for the add?

Yeah, they're fused operations, so they need restricting to fp-contraction.
Essentially, after the operand reshuffling they're treated as a normal FMA,
so the accumulator needs to be in the operation.
