https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83894

            Bug ID: 83894
           Summary: [missed optimization] __v16qu shift instruction sequence on x86
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---

Created attachment 43148
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43148&action=edit
benchmark

shifts of vector builtins with 8-bit integral element type can be optimized
better. I.e. `v << n` can be implemented as

1. load 0x00ff00ff00ff... and 16-bit shift by n
2. xor (1) with 0xff00ff00ff00... to produce a bitmask
3. 16-bit shift v by n
4. bitwise and of (2) and (3)

I'll attach a benchmark with an intrinsics based implementation.
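
For illustration, the four steps can be sketched with GCC's generic vector extensions. This is not the attached benchmark and not its intrinsics-based implementation. The helper name `shl_u8_via_u16`, the `v8hu` typedef, and the restriction to 0 <= n <= 7 are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* 16 x 8-bit lanes and 8 x 16-bit lanes over the same 128 bits. */
typedef uint8_t  v16qu __attribute__((vector_size(16)));
typedef uint16_t v8hu  __attribute__((vector_size(16)));

/* Hypothetical helper: shift every 8-bit lane of v left by n,
 * using only 16-bit lane shifts plus xor/and. */
static v16qu shl_u8_via_u16(v16qu v, int n)
{
    assert(n >= 0 && n <= 7); /* assumption of this sketch */

    const v8hu lo = {0x00ff, 0x00ff, 0x00ff, 0x00ff,
                     0x00ff, 0x00ff, 0x00ff, 0x00ff};
    const v8hu hi = {0xff00, 0xff00, 0xff00, 0xff00,
                     0xff00, 0xff00, 0xff00, 0xff00};
    const uint16_t c = (uint16_t)n;
    const v8hu count = {c, c, c, c, c, c, c, c};

    /* Steps 1 and 2: (0x00ff << n) ^ 0xff00 gives, in each byte,
     * the mask 0xff << n truncated to 8 bits (e.g. n = 3 -> 0xf8f8). */
    v8hu mask = (lo << count) ^ hi;

    /* Step 3: shift v as 16-bit lanes; the low byte's top bits
     * leak into the bottom of the high byte. */
    v8hu shifted = (v8hu)v << count;

    /* Step 4: the mask clears exactly the leaked bits. */
    return (v16qu)(shifted & mask);
}
```

The mask is identical in both bytes of each 16-bit lane, so the sketch does not depend on byte order. Whether a given GCC version turns this into a 16-bit shift, xor, and and on x86 is not something the sketch guarantees.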