[Bug tree-optimization/110062] missed vectorization in graphicsmagick

rguenth at gcc dot gnu.org via Gcc-bugs Sun, 26 Nov 2023 23:29:49 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062


--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #11)
> trunk -O3 -flto -march=native -fopenmp
>     Operation: Sharpen:
>         257
>         256
>         256
> 
>     Average: 256 Iterations Per Minute
> GCC13 -O3 -flto -march=native -fopenmp
>         257
>         256
>         256
> 
>     Average: 256 Iterations Per Minute
> clang17 -O3 -flto -march=native -fopenmp
>    Operation: Sharpen:
>         257
>         256
>         256
>     Average: 256 Iterations Per Minute
> 
> So I guess I will need to try on zen3 to see if there is any difference.
> 
> the internal loop is:
>   0.00 │460:┌─→movzbl      0x2(%rdx,%rax,4),%esi
>   0.02 │    │  vmovss      (%r8,%rax,4),%xmm2
>   0.95 │    │  vcvtsi2ss   %esi,%xmm0,%xmm1
>  20.22 │    │  movzbl      0x1(%rdx,%rax,4),%esi
>   0.01 │    │  vfmadd231ss %xmm1,%xmm2,%xmm3
>  11.97 │    │  vcvtsi2ss   %esi,%xmm0,%xmm1
>  18.76 │    │  movzbl      (%rdx,%rax,4),%esi
>   0.00 │    │  inc         %rax
>   0.72 │    │  vfmadd231ss %xmm1,%xmm2,%xmm4
>  12.55 │    │  vcvtsi2ss   %esi,%xmm0,%xmm1
>  14.95 │    │  vfmadd231ss %xmm1,%xmm2,%xmm5
>  15.93 │    ├──cmp         %rax,%r13
>   0.35 │    └──jne         460
> 
> 
> so it still does not get....

As said, the VF (vectorization factor) is going to be prohibitively large;
the vector code is likely never entered, and the loop above is the scalar
"epilogue".
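
To illustrate the point, here is a rough C sketch. It is only a reading of
the quoted assembly, not the actual GraphicsMagick source: the pixel4 type,
the function names, and the example numbers (5 iterations, VF of 32) are all
hypothetical. The split helper is a simplified model that ignores vectorized
epilogues and cost-model thresholds. It shows that when the trip count is
smaller than the VF, every iteration lands in the scalar epilogue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-byte pixel, inferred from the byte loads at offsets
   0, 1 and 2 with a stride of 4 in the quoted assembly. */
typedef struct { unsigned char c0, c1, c2, c3; } pixel4;

/* Scalar form of the quoted loop (an assumption about the source):
   one float weight per pixel and three independent float accumulators,
   matching the three vcvtsi2ss/vfmadd231ss pairs. */
static void weighted_sum(const pixel4 *p, const float *k, size_t n,
                         float acc[3])
{
    for (size_t i = 0; i < n; i++) {
        acc[0] += k[i] * (float)p[i].c0;
        acc[1] += k[i] * (float)p[i].c1;
        acc[2] += k[i] * (float)p[i].c2;
    }
}

/* Simplified model of a vectorized loop: the vector body handles whole
   groups of vf iterations, the scalar epilogue handles the remainder. */
typedef struct { size_t vector_iters; size_t epilogue_iters; } loop_split;

static loop_split split_iterations(size_t n, size_t vf)
{
    assert(vf > 0);
    loop_split s;
    s.vector_iters = (n / vf) * vf;   /* iterations covered by vector code */
    s.epilogue_iters = n % vf;        /* iterations left for scalar code */
    return s;
}
```

With the hypothetical numbers above, split_iterations(5, 32) reports zero
vector iterations and five epilogue iterations, which would be consistent
with a profile that only shows the scalar loop.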
