https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Martin Liška from comment #1)
> Created attachment 52120 [details]
> Isolated test-case
>
> Isolated test-case where only the miscompiled function
> ix86_expand_vec_extract_even_odd uses -O3.
>
> @Uros: Can you please compare -fdump-tree-optimized before and after the
> revision?
When compiled with -O2 -march=bdver1, there are indeed a bunch of suspicious
XOP vpperm instructions in the function:
vmovd 12(%rsp), %xmm6 # 303 [c=9 l=6] *movsi_internal/10
vpperm %xmm3, %xmm0, %xmm1, %xmm0 # 124 [c=4 l=5] mmx_ppermv64
vpaddd %xmm4, %xmm1, %xmm1 # 129 [c=8 l=4] *mmx_addv2si3/2
vpperm %xmm3, %xmm1, %xmm2, %xmm1 # 131 [c=4 l=5] mmx_ppermv64
vpperm .LC165(%rip), %xmm1, %xmm0, %xmm0 # 134 [c=13 l=9] mmx_ppermv64
vpaddb %xmm0, %xmm0, %xmm0 # 137 [c=8 l=4] *mmx_addv8qi3/2
vpshuflw $0, %xmm6, %xmm1 # 140 [c=8 l=5] *vec_dupv4hi/1
vpaddb %xmm1, %xmm0, %xmm0 # 142 [c=8 l=4] *mmx_addv8qi3/2
vmovq %xmm0, 32(%rsp,%rdi) # 143 [c=4 l=6] *movv8qi_internal/14
je .L4198 # 150 [c=12 l=2] *jcc
I was not able to test these instructions on my target, so my bet is that they
are the problem.