https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838
Bug ID: 91838
Summary: incorrect use of shr and shrx to shift by 64, missed
optimization of vector shift
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Keywords: missed-optimization, wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*
Test case:
using T = unsigned char; // or ushort, or uint
using V [[gnu::vector_size(8)]] = T;
V f(V x) { return x >> 8 * sizeof(T); }
GCC 10 compiles this to either an xor or a shift (the shift case would better
be an xor as well).
GCC 9.2 compiles to:
vmovq rax, xmm0
mov ecx, 64
shr rax, cl
sal rax, (64 - 8*sizeof(T))
vmovq xmm0, rax
The `shr rax, cl` with cl == 64 is a nop, because shr (and shrx, which is
used when BMI2 is enabled) masks the count with 0x3f. Consequently the last
element of the input vector is unchanged in the output.
In any case, shr/shrx should not be emitted for shift counts >= 64 (or >= 32
for the 32-bit variant).
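The count masking described above can be modeled in portable C++. This is only a sketch of the semantics, not GCC's code generation: the helper names `shr64_hw` and `shr64_logical` are made up for illustration. A plain `x >> 64` on a 64-bit integer is undefined behavior in C++, so the masking is written out explicitly.

```cpp
#include <cstdint>

// Models the 64-bit shr/shrx behavior described in this report:
// only the low 6 bits of the count are used (count & 0x3f).
// Hypothetical helper, named for illustration only.
constexpr std::uint64_t shr64_hw(std::uint64_t x, unsigned count) {
    return x >> (count & 0x3f);
}

// What a full logical right shift would yield: every bit is shifted
// out once the count reaches the operand width.
// Hypothetical helper, named for illustration only.
constexpr std::uint64_t shr64_logical(std::uint64_t x, unsigned count) {
    return count >= 64 ? 0 : x >> count;
}

// A count of 64 masks to 0, so the value passes through unchanged.
static_assert(shr64_hw(0x0123456789abcdefULL, 64) == 0x0123456789abcdefULL,
              "masked shift by 64 is a no-op");
// The logical shift by 64 clears the value, which an xor also achieves.
static_assert(shr64_logical(0x0123456789abcdefULL, 64) == 0,
              "logical shift by 64 yields zero");
```

With a count of 64, `shr64_hw` returns its input unchanged while `shr64_logical` returns zero, which is the difference between the emitted `shr rax, cl` and a plain xor.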