https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83894
Bug ID: 83894
Summary: [missed optimization] __v16qu shift instruction sequence on x86
Product: gcc
Version: 7.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Created attachment 43148
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43148&action=edit
benchmark
Shifts of vector builtins with an 8-bit integral element type (e.g. __v16qu)
can be optimized better.
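For example, `v << n` can be implemented as:
1. load 0x00ff00ff00ff... and shift it left by n as 16-bit elements
2. xor (1) with 0xff00ff00ff00... to produce a bitmask (every byte of the
   result is 0xff << n)
3. shift v left by n as 16-bit elements
4. bitwise and of (2) and (3), which clears the bits that crossed from the
   low byte into the high byte of each 16-bit element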
I'll attach a benchmark with an intrinsics based implementation.
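
Independently of the attached benchmark, steps 1-4 can be sketched in portable C to check the mask derivation. This is only an illustration, not the intrinsics implementation from the attachment: the function name shl_u8x16 is hypothetical, the 16-bit lanes are modelled with uint16_t instead of vector registers, and the shift count is assumed to be below 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the attachment): shift each of 16 bytes
 * left by n using only 16-bit lane operations, following steps 1-4.
 * Assumption: 0 <= n < 8. */
static void shl_u8x16(uint8_t out[16], const uint8_t in[16], unsigned n)
{
    assert(n < 8u);

    /* Step 1: 0x00ff shifted left by n within a 16-bit lane.
     * Step 2: xor with 0xff00, so both bytes of the lane hold 0xff << n. */
    uint16_t mask = (uint16_t)((uint16_t)(0x00ffu << n) ^ 0xff00u);

    for (int i = 0; i < 8; ++i) {
        uint16_t lane;
        memcpy(&lane, in + 2 * i, sizeof lane);  /* two bytes form one lane */
        lane = (uint16_t)(lane << n);            /* step 3: 16-bit shift */
        lane &= mask;                            /* step 4: drop carried bits */
        memcpy(out + 2 * i, &lane, sizeof lane);
    }
}
```

With SSE2 intrinsics, the lane operations would presumably map to _mm_sll_epi16 (or _mm_slli_epi16 for a constant count), _mm_xor_si128 and _mm_and_si128 on whole registers, so the loop disappears.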