https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
Bug ID: 110062
Summary: missed vectorization in graphicsmagick
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---

Phoronix claims a 31% performance difference between GCC 13 and Clang on the sharpen benchmark of GraphicsMagick. On Zen 3 I reproduce only 4%, but the benchmark has only a single short inner loop:

214
  97.56%  gm  gm                 [.] ConvolveImage.
   0.88%  gm  libgomp.so.1.0.0   [.] 0x000000000002
   0.67%  gm  libc.so.6          [.] __memmove_avx_

GCC version:

   2.38 │500:┌─→vmovss       (%r8,%rax,4),%xmm2
   0.04 │    │  movzbl       0x2(%rdx,%rax,4),%ebp
   0.09 │    │  vcvtsi2ss    %ebp,%xmm0,%xmm1
   7.44 │    │  movzbl       0x1(%rdx,%rax,4),%ebp
   0.16 │    │  vfmadd231ss  %xmm1,%xmm2,%xmm7
  30.23 │    │  vcvtsi2ss    %ebp,%xmm0,%xmm1
   2.38 │    │  movzbl       (%rdx,%rax,4),%ebp
   0.03 │    │  inc          %rax
   0.00 │    │  vfmadd231ss  %xmm1,%xmm2,%xmm9
  22.80 │    │  vcvtsi2ss    %ebp,%xmm0,%xmm1
   1.03 │    │  vfmadd231ss  %xmm1,%xmm2,%xmm10
  30.49 │    ├──cmp          %rax,%rbx
   0.18 │    └──jne          500

Clang's:

   0.00 │1e70:┌─→movzbl       0x2(%rdx,%rsi,4),%r9d
   0.05 │     │  vbroadcastss (%rcx,%rsi,4),%xmm3
   0.56 │     │  movzwl       (%rdx,%rsi,4),%r11d
   0.05 │     │  inc          %rsi
   0.00 │     │  vcvtsi2ss    %r9d,%xmm10,%xmm2
   0.71 │     │  vfmadd231ss  %xmm2,%xmm3,%xmm0
   1.17 │     │  vmovd        %r11d,%xmm2
   0.00 │     │  vpmovzxbd    %xmm2,%xmm2
   0.06 │     │  vcvtdq2ps    %xmm2,%xmm2
   0.89 │     │  vfmadd231ps  %xmm2,%xmm3,%xmm1
   1.98 │     ├──cmp          %rsi,%r10
   0.00 │     └──jne          1e70
   0.00 │     ↑ jmp          1630

Probably the same issue as in PR109812, but it reproduces on Zens and the loop is even shorter.
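For readers who want to experiment, the loop shape implied by the two listings can be sketched in C. This is a hypothetical reduction written from the assembly alone, not the GraphicsMagick source: the names `struct pixel` and `convolve_row`, the field names, and the channel order are all assumptions. Each iteration loads one float kernel weight and three byte channels of a 4-byte pixel, converts the bytes to float, and accumulates into three sums. In the listings above, GCC does this with three scalar conversions and three scalar FMAs, while Clang handles the two lower channels together (movzwl, vpmovzxbd, vcvtdq2ps, vfmadd231ps) and only the third as a scalar.

```c
#include <stddef.h>

/* Hypothetical 4-byte pixel; field names and channel order are assumptions. */
struct pixel {
    unsigned char c0, c1, c2, c3;
};

/*
 * Accumulate three byte channels of n pixels, each weighted by the
 * matching float kernel entry k[i]. The fourth byte is skipped, which
 * mirrors the byte offsets 0, 1 and 2 seen in the annotated loops.
 */
void convolve_row(const struct pixel *p, const float *k, size_t n,
                  float acc[3])
{
    float a0 = acc[0], a1 = acc[1], a2 = acc[2];

    for (size_t i = 0; i < n; i++) {
        float w = k[i];      /* one kernel weight per pixel */
        a0 += w * p[i].c0;   /* byte -> float conversion, then multiply-add */
        a1 += w * p[i].c1;
        a2 += w * p[i].c2;
    }

    acc[0] = a0;
    acc[1] = a1;
    acc[2] = a2;
}
```

Compiling this with something like `gcc -O2 -march=znver3 -S` (the exact flags used for the benchmark are not stated in this report, so these are an assumption) and adding `-fopt-info-vec-missed` may show whether the reduced loop hits the same missed optimization.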