https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95894
            Bug ID: 95894
           Summary: vector shift by lane zero generates inter unit move
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

typedef int v4si __attribute__((vector_size(16)));

v4si foo (v4si x)
{
  return x << x[0];
}

generates

foo:
.LFB0:
        .cfi_startproc
        movd    %xmm0, %eax
        cltq
        movq    %rax, %xmm1
        pslld   %xmm1, %xmm0
        ret

while we could use something like

        pxor    %xmm1, %xmm1
        punpckldq       %xmm0, %xmm1
        pslld   %xmm1, %xmm0

to zero-extend x[0] to DImode in an SSE reg.  Unfortunately even

typedef long v2di __attribute__((vector_size(16)));

v2di bar (v2di x)
{
  return x << x[0];
}

shows this useless behavior:

bar:
.LFB1:
        .cfi_startproc
        movq    %xmm0, %rax
        cltq
        movq    %rax, %xmm1
        psllq   %xmm1, %xmm0
        ret

because the gimplifier casts the shift amount to unsigned int :(
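
For illustration, the idea of keeping the shift count in the SSE register file can be sketched with SSE2 intrinsics. This is only a hand-written approximation of what the compiler could emit, not a proposed patch: the function name shift_by_lane0_sse is hypothetical, and the operand order of the interleave is chosen here so that the low 64 bits of the count register hold x[0] zero-extended.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper, not taken from the report: shift every lane of x
   left by x[0], building the 64-bit shift count without a round trip
   through a general-purpose register.  */
v4si shift_by_lane0_sse (v4si x)
{
  __m128i v = (__m128i) x;

  /* Interleave the low lanes of v with zeros.  The result is
     { x[0], 0, x[1], 0 }, so its low 64 bits are x[0] zero-extended.  */
  __m128i count = _mm_unpacklo_epi32 (v, _mm_setzero_si128 ());

  /* pslld reads its count from the low 64 bits of the second operand;
     a count above 31 yields all-zero lanes.  */
  return (v4si) _mm_sll_epi32 (v, count);
}
```

With this sketch, an input of { 2, 1, 3, -1 } is shifted left by 2 in every lane. A negative x[0] zero-extends to a count above 31, so the hardware result is all zeros, which is also what the sign-extended count in the current code produces.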