https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533
--- Comment #45 from Hongtao.liu <crazylht at gmail dot com> ---
A reduced testcase:

int a[256];
int b[256];

void
foo (void)
{
  int i;
  for (i = 0; i < 256; ++i)
    {
      int tmp = a[i] + 12345;
      tmp *= 914237;
      tmp += 12332;
      tmp *= 914237;
      tmp += 12332;
      tmp *= 914237;
      tmp -= 13;
      tmp *= 8000;
      b[i] = tmp;
    }
}

GCC now simplifies pmulld to pslld + padd + psub, and the vectorizer cost model
looks fine. The scalar version, however, is further optimized in pass_combine
from 4 * mult + 3 * add to 1 * mult + 2 * add, which the vectorizer does not
take into account. The vectorized version is not simplified later.

Scalar loop:

        mov     eax, DWORD PTR a[rdx]
        add     rdx, 4
        add     eax, 12345
        imul    eax, eax, -1564285888
        sub     eax, 333519936
        mov     DWORD PTR b[rdx-4], eax
        cmp     rdx, 1024
        jne     .L2

I'm wondering whether GIMPLE could also simplify

      tmp *= 914237;
      tmp += 12332;
      tmp *= 914237;
      tmp += 12332;
      tmp *= 914237;
      tmp -= 13;
      tmp *= 8000;

to

      tmp *= -1564285888;
      tmp -= 333519936;

Refer to https://godbolt.org/z/qYMYMTxEY

Then the vectorized code would be more optimal.
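
For anyone who wants to double-check the folded constants, here is a small sketch (not part of the testcase) that compares the original chain against the two-operation form. It assumes 32-bit wraparound and therefore uses uint32_t instead of int, so that overflow is well defined; the function names chain, folded and count_mismatches are made up for this illustration.

```c
#include <stdint.h>

/* Original chain from the testcase, done in uint32_t so that
   wraparound is well defined (an assumption of this sketch).  */
static uint32_t
chain (uint32_t x)
{
  uint32_t tmp = x + 12345u;
  tmp *= 914237u;
  tmp += 12332u;
  tmp *= 914237u;
  tmp += 12332u;
  tmp *= 914237u;
  tmp -= 13u;
  tmp *= 8000u;
  return tmp;
}

/* Folded form matching the scalar assembly: one multiply, one subtract.
   2730681408u is -1564285888 reinterpreted as an unsigned 32-bit value.  */
static uint32_t
folded (uint32_t x)
{
  uint32_t tmp = x + 12345u;
  tmp *= 2730681408u;
  tmp -= 333519936u;
  return tmp;
}

/* Count inputs in [0, n) for which the two forms disagree.  */
static uint32_t
count_mismatches (uint32_t n)
{
  uint32_t bad = 0;
  for (uint32_t x = 0; x < n; ++x)
    if (chain (x) != folded (x))
      ++bad;
  return bad;
}
```

Since both forms are affine in x modulo 2^32, agreement on any two consecutive inputs already implies agreement everywhere; the loop in count_mismatches is only a sanity check.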