http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53949
--- Comment #9 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-05-04 13:39:10 UTC ---
(In reply to comment #3)
> - Loops with multiple running sums like
>     for (int i = 0; i < 16; ++i)
>     {
>       sum0 += (int64_t)(*a++) * (int64_t)(*b++);
>       sum1 += (int64_t)(*c++) * (int64_t)(*d++);
>     }
>
>   result in macl:mach swapping to general reg pairs between subsequent
>   mac.w instructions.  Ideally such loops should be split into multiple
>   loops like in the previous example.

This is basically what -ftree-loop-distribution does.  The question would be
how to re-use it for this particular case.
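
For illustration, here is a rough sketch of the split done by hand.  The
function names dot2_fused and dot2_split are made up for this example, and
the int16_t element type is an assumption (mac.w multiplies 16-bit words; the
quoted loop does not show the pointer types).  Whether the loop distribution
pass would actually produce the second form for reductions like these is the
open question above, so this only shows the intended result at source level.

```c
#include <stdint.h>

/* Fused form, as in the quoted loop: both running sums share one loop,
   so the two accumulations are interleaved.  */
void
dot2_fused (const int16_t *a, const int16_t *b,
            const int16_t *c, const int16_t *d,
            int64_t *out0, int64_t *out1)
{
  int64_t sum0 = 0, sum1 = 0;

  for (int i = 0; i < 16; ++i)
    {
      sum0 += (int64_t)(*a++) * (int64_t)(*b++);
      sum1 += (int64_t)(*c++) * (int64_t)(*d++);
    }

  *out0 = sum0;
  *out1 = sum1;
}

/* Distributed form: one loop per running sum.  Each loop has a single
   accumulator, which is the shape that would let one sum stay in the
   accumulator register pair for the whole loop.  */
void
dot2_split (const int16_t *a, const int16_t *b,
            const int16_t *c, const int16_t *d,
            int64_t *out0, int64_t *out1)
{
  int64_t sum0 = 0, sum1 = 0;

  for (int i = 0; i < 16; ++i)
    sum0 += (int64_t)(*a++) * (int64_t)(*b++);

  for (int i = 0; i < 16; ++i)
    sum1 += (int64_t)(*c++) * (int64_t)(*d++);

  *out0 = sum0;
  *out1 = sum1;
}
```

The two statements in the fused loop body do not depend on each other, so
splitting them is valid as long as neither output aliases the inputs.  The
cost is a second loop counter and loop overhead.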