https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533
--- Comment #46 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 30 May 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533
>
> --- Comment #45 from Hongtao.liu <crazylht at gmail dot com> ---
> A reduced testcase.
>
> int a[256];
> int b[256];
>
> void foo (void)
> {
>   int i;
>   for (i = 0; i < 256; ++i)
>     {
>       int tmp = a[i] + 12345;
>       tmp *= 914237;
>       tmp += 12332;
>       tmp *= 914237;
>       tmp += 12332;
>       tmp *= 914237;
>       tmp -= 13;
>       tmp *= 8000;
>       b[i] = tmp;
>     }
> }
>
> GCC now simplifies pmulld into pslld + padd + psub, and the vectorizer cost
> model looks fine. But the scalar version is further optimized in pass_combine,
> from 4 * mult + 3 * add to 1 * mult + 2 * add, which the vectorizer does not
> take into account. The vectorized version is not simplified later.
>
>         mov     eax, DWORD PTR a[rdx]
>         add     rdx, 4
>         add     eax, 12345
>         imul    eax, eax, -1564285888
>         sub     eax, 333519936
>         mov     DWORD PTR b[rdx-4], eax
>         cmp     rdx, 1024
>         jne     .L2
>
> I'm wondering whether GIMPLE could also simplify
>
>   tmp *= 914237;
>   tmp += 12332;
>   tmp *= 914237;
>   tmp += 12332;
>   tmp *= 914237;
>   tmp -= 13;
>   tmp *= 8000;
>
> to
>
>   tmp *= -1564285888;
>   tmp -= 333519936;
>
> Refer to https://godbolt.org/z/qYMYMTxEY
>
> Then the vectorized code would be more optimal.

The issue is that the re-association pass doesn't handle operations with
undefined overflow behavior; we do have duplicate bug reports for this.
On the RTL level, simplify-rtx (and the variants of it used by combine)
likely has only limited support for vector operations.