https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so I have a patch to keep the association linear, which IMHO is good. It fixes the smaller testcase and my testcase, but not the original one, which is now linear but still not homogeneous. The store groups are as follows:

*_115 = (((((*_115) - (f00_re_68 * x0_re_108)) - (f10_re_70 * x1_re_140)) - (f00_im_73 * x0_im_114)) - (f10_im_74 * x1_im_142));
*_117 = (((((*_117) + (f00_im_73 * x0_re_108)) + (f10_im_74 * x1_re_140)) - (f00_re_68 * x0_im_114)) - (f10_re_70 * x1_im_142));
*_119 = (((((*_119) - (f01_re_71 * x0_re_108)) - (f11_re_72 * x1_re_140)) - (f01_im_75 * x0_im_114)) - (f11_im_76 * x1_im_142));
*_121 = (((((*_121) + (f01_im_75 * x0_re_108)) + (f11_im_76 * x1_re_140)) - (f01_re_71 * x0_im_114)) - (f11_re_72 * x1_im_142));

(good)

*_177 = (((((*_177) - (f00_re_68 * x0_re_170)) - (f00_im_73 * x0_im_176)) - (f10_re_70 * x1_re_202)) - (f10_im_74 * x1_im_204));
*_179 = (((((f00_im_73 * x0_re_170) + (f10_im_74 * x1_re_202)) + (*_179)) - (f00_re_68 * x0_im_176)) - (f10_re_70 * x1_im_204));
*_181 = (((((*_181) - (f01_re_71 * x0_re_170)) - (f01_im_75 * x0_im_176)) - (f11_re_72 * x1_re_202)) - (f11_im_76 * x1_im_204));
*_183 = (((((f01_im_75 * x0_re_170) + (f11_im_76 * x1_re_202)) + (*_183)) - (f01_re_71 * x0_im_176)) - (f11_re_72 * x1_im_204));

already bad. Now, this is something to tackle in the vectorizer, which ideally should not try to match up individual adds during SLP discovery but instead (if association is allowed) match the whole addition chain, commutating within the whole chain rather than just swapping individual add operands.

I still think the reassoc change I came up with is good since it avoids the need to linearize in the vectorizer. So testing that now.