https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83894

            Bug ID: 83894
           Summary: [missed optimization] __v16qu shift instruction
                    sequence on x86
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---

Created attachment 43148
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43148&action=edit
benchmark

Shifts of vector builtins with an 8-bit integral element type (e.g. __v16qu)
can be optimized better on x86, which has no 8-bit element shift instruction.
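For reference, here is a minimal sketch of the operation in question, written
with GCC's vector extension. The typedef `v16u8` and the function `shl_u8` are
hypothetical stand-ins for `__v16qu` code; shift counts of 8 or more are
assumed not to occur.

```c
#include <assert.h>

/* Hypothetical stand-in for __v16qu: 16 unsigned 8-bit lanes
   (GCC vector extension). */
typedef unsigned char v16u8 __attribute__((vector_size(16)));

/* The operation under discussion: shift every 8-bit lane left by n.
   Bits shifted out of a lane are discarded; they never carry into the
   neighbouring lane. */
v16u8 shl_u8(v16u8 v, int n)
{
    assert(n >= 0 && n < 8); /* counts >= 8 are not considered here */
    return v << n;           /* vector << scalar: same count for every lane */
}
```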

For example, `v << n` can be implemented as follows:

1. Load 0x00ff00ff00ff... and shift it left by n as 16-bit elements.
2. XOR the result of (1) with 0xff00ff00ff00... to produce a bitmask.
3. Shift v left by n as 16-bit elements.
4. Bitwise AND the results of (2) and (3).
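The four steps above can be sketched with SSE2 intrinsics as below. This is a
minimal illustration assuming SSE2 (x86/x86-64) and 0 <= n <= 7; it is not the
code from the attached benchmark, and the name `shl_epi8_via_epi16` is
hypothetical. As a worked case, for n = 3 step 1 yields 0x07f8 in every 16-bit
lane and step 2 turns it into 0xf8f8, i.e. 0xff << 3 truncated to 8 bits in
each byte, which is exactly the set of bits a per-byte shift may keep.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; x86/x86-64 only */

/* Shift each of the 16 unsigned 8-bit lanes of v left by n (0 <= n <= 7)
   using only 16-bit shifts, following steps 1-4. */
__m128i shl_epi8_via_epi16(__m128i v, int n)
{
    assert(n >= 0 && n < 8);
    const __m128i count = _mm_cvtsi32_si128(n);

    /* Step 1: 0x00ff in every 16-bit lane, shifted left by n. */
    __m128i mask = _mm_sll_epi16(_mm_set1_epi16(0x00ff), count);

    /* Step 2: XOR with 0xff00 leaves (0xff << n) & 0xff in both bytes
       of every 16-bit lane. */
    mask = _mm_xor_si128(mask, _mm_set1_epi16((short)0xff00));

    /* Step 3: 16-bit shift of v; the top n bits of each low byte spill
       into the bottom n bits of the high byte. */
    __m128i shifted = _mm_sll_epi16(v, count);

    /* Step 4: clear the spilled bits. */
    return _mm_and_si128(shifted, mask);
}
```

The mask from steps 1 and 2 depends only on n, so when the shift count is
loop-invariant it can be computed once and reused.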

The attached benchmark contains an intrinsics-based implementation.
