[Bug rtl-optimization/53533] [10/11/12/13 regression] vectorization causes loop unrolling test slowdown as measured by Adobe's C++Benchmark

rguenther at suse dot de via Gcc-bugs Mon, 30 May 2022 02:07:53 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533


--- Comment #46 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 30 May 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533
> 
> --- Comment #45 from Hongtao.liu <crazylht at gmail dot com> ---
> A reduced testcase.
> 
> int a[256];
> int b[256];
> 
> void foo (void)
> {
>   int i;
>   for (i = 0; i < 256; ++i)
>     {
>       int tmp = a[i] + 12345;
>       tmp *= 914237;
>       tmp += 12332;
>       tmp *= 914237;
>       tmp += 12332;
>       tmp *= 914237;
>       tmp -= 13;
>       tmp *= 8000;
>       b[i] = tmp;
>     }
> }
> 
> GCC now simplifies pmulld to pslld + padd + psub, and the vectorizer cost
> model looks fine. The scalar version, however, is further optimized in
> pass_combine from 4 * mult + 3 * add to 1 * mult + 2 * add, which the
> vectorizer does not take into account.
> The vectorized version is not simplified later.
> 
> .L2:
>         mov     eax, DWORD PTR a[rdx]
>         add     rdx, 4
>         add     eax, 12345
>         imul    eax, eax, -1564285888
>         sub     eax, 333519936
>         mov     DWORD PTR b[rdx-4], eax
>         cmp     rdx, 1024
>         jne     .L2
> 
> 
> I'm wondering whether GIMPLE could also simplify
> 
>       tmp *= 914237;
>       tmp += 12332;
>       tmp *= 914237;
>       tmp += 12332;
>       tmp *= 914237;
>       tmp -= 13;
>       tmp *= 8000;
> 
> to 
>      tmp *= -1564285888;
>      tmp -= 333519936;
> 
> See https://godbolt.org/z/qYMYMTxEY
> 
> Then the vectorized code would be more optimal.

The issue is that the re-association pass doesn't handle operations
with undefined overflow behavior; we do have duplicate bug reports
for this.
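
For illustration, the folding asked for above amounts to composing
affine maps x -> mul * x + add modulo 2^32. The sketch below is not
GCC code; it assumes 32-bit wrapping arithmetic (uint32_t) so that
every intermediate overflow is well defined, which is exactly the
property the signed original lacks. The names struct affine, then_mul,
then_add, fold_chain, chain and apply are hypothetical helpers made up
for this example.

```c
#include <stdint.h>

/* An affine map x -> mul * x + add, evaluated modulo 2^32. */
struct affine {
    uint32_t mul;
    uint32_t add;
};

/* Append the statement "x *= k" to an existing map. */
static struct affine then_mul(struct affine f, uint32_t k)
{
    f.mul *= k;
    f.add *= k;
    return f;
}

/* Append the statement "x += k" (pass a negated k for "x -= k"). */
static struct affine then_add(struct affine f, uint32_t k)
{
    f.add += k;
    return f;
}

/* Fold the seven statements of the reduced testcase into one map. */
static struct affine fold_chain(void)
{
    struct affine f = { 1u, 0u };   /* identity map */
    f = then_mul(f, 914237u);
    f = then_add(f, 12332u);
    f = then_mul(f, 914237u);
    f = then_add(f, 12332u);
    f = then_mul(f, 914237u);
    f = then_add(f, (uint32_t)-13); /* tmp -= 13 */
    f = then_mul(f, 8000u);
    return f;
}

/* The original chain, written with wrapping unsigned arithmetic. */
static uint32_t chain(uint32_t tmp)
{
    tmp *= 914237u;
    tmp += 12332u;
    tmp *= 914237u;
    tmp += 12332u;
    tmp *= 914237u;
    tmp -= 13u;
    tmp *= 8000u;
    return tmp;
}

/* Evaluate a folded map on one input. */
static uint32_t apply(struct affine f, uint32_t x)
{
    return f.mul * x + f.add;
}
```

Reinterpreted as signed 32-bit values, the folded mul and add are the
constants -1564285888 and -333519936 that appear in the scalar
assembly quoted above, and apply(fold_chain(), x) agrees with chain(x)
for any x. Whether GCC performs this folding on an unsigned variant of
the testcase is not checked here.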

At the RTL level, simplify-rtx (or the variants used by combine)
likely has only limited support for vector operations.

[Bug rtl-optimization/53533] [10/11/12/13 regression] vectorization causes loop unrolling test slowdown as measured by Adobe's C++Benchmark
