> Hmm, reassoc isn't supposed to apply width > 1 before vectorization; > IIRC I "fixed" that at some point, but this is what I see: > > # sum_126 = PHI <sum_145(5), 0(2)> > ... > _23 = *pix_134; > _24 = (unsigned int) _23; > _31 = MEM[(uint8_t *)pix_134 + 1B]; > _32 = (unsigned int) _31; > _118 = _24 + _32; > sum_33 = _118 + sum_126; > ... > > I think the place you add the condition to on swap_ops_for_binary_stmt > simply lacks a !reassoc_insert_powi_p check like we have on the earlier > rewrite_expr_tree_parallel.
An earlier version of my patch had a less strict check, i.e. wouldn't allow the swap for more sequences. This resulted in two regressions on x86 where we expect this exact swap to happend in order for the bswap pass to recognize a bswap idiom. It was something like bleh = ... l1 = MEM... << 8 l2 = MEM... >> 8 bla1 = l1 & ... bla2 = l2 & ... res1 = bla1 | bleh res2 = bla2 | res1 reassoc would reorder res1, bla1, and bla2 so that bswap sees bla1 | bla2 and recognizes this as bswap. When not allowing the swap before vectorization this will fail. Or did I miss the point?