[Bug target/82370] AVX512 can use a memory operand for immediate-count vpsrlw, but gcc doesn't.

peter at cordes dot ca Wed, 04 Oct 2017 01:34:57 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82370


--- Comment #4 from Peter Cordes <peter at cordes dot ca> ---
VPANDQ can be shorter than an equivalent VPAND for displacements > 127 but <=
16 * 127 (xmm) or 32 * 127 (ymm) that are an exact multiple of the vector
width.  EVEX with disp8 always implies a compressed displacement: the disp8 is
scaled by N, which here is the vector width in bytes.  (See Intel manual vol. 2,
section 2.6.5, "Compressed Displacement (disp8*N) Support in EVEX".)


# Worst case for EVEX: odd displacement forcing a disp32 while VEX can use disp8
  c5 f9 db 4e 01                  vpand  0x1(%rsi),%xmm0,%xmm1
  62 f1 fd 08 db 8e 01 00 00 00   vpandq 0x1(%rsi),%xmm0,%xmm1

# Best case for EVEX, where it wins by one byte
# (or two vs. a 3-byte VEX + disp32, e.g. if I'd used %r10)
  c5 09 db be 00 02 00 00         vpand  0x200(%rsi),%xmm14,%xmm15
  62 71 8d 08 db 7e 20            vpandq 0x200(%rsi),%xmm14,%xmm15

# But the tables turn with an odd offset, where EVEX has to use disp32
  c5 09 db be ff 01 00 00         vpand  0x1ff(%rsi),%xmm14,%xmm15
  62 71 8d 08 db be ff 01 00 00   vpandq 0x1ff(%rsi),%xmm14,%xmm15
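
The byte counts in the three listings above can be reproduced with a small
model of the displacement rule.  This is only a sketch under stated
assumptions, not how GCC or any assembler actually chooses an encoding: the
function names (disp_size, vex_len, evex_len) are made up for illustration,
and it assumes a plain base-register addressing mode (no SIB byte, base not
%rbp/%r13), a full-vector memory operand with no broadcast, and no extra
prefixes.

```python
def disp_size(disp, scale=1):
    """Bytes used by the displacement field: 0, 1 (disp8) or 4 (disp32).

    scale=1 models the plain disp8 used by VEX encodings.
    scale=N models EVEX disp8*N, where N is assumed to be the vector
    width in bytes (16 for xmm, 32 for ymm).
    """
    if disp == 0:
        return 0  # assumes the base register is not %rbp/%r13
    # disp8 is usable only if disp is an exact multiple of the scale
    # and the scaled value fits in a signed byte.
    if disp % scale == 0 and -128 <= disp // scale <= 127:
        return 1
    return 4


def vex_len(disp, vex_prefix=2):
    """Total length: VEX prefix (2 or 3 bytes) + opcode + ModRM + disp."""
    return vex_prefix + 1 + 1 + disp_size(disp)


def evex_len(disp, vector_bytes=16):
    """Total length: 4-byte EVEX prefix + opcode + ModRM + disp."""
    return 4 + 1 + 1 + disp_size(disp, scale=vector_bytes)


if __name__ == "__main__":
    # The three displacements used in the listings above.
    for disp in (0x1, 0x200, 0x1FF):
        print(hex(disp), "VEX:", vex_len(disp), "EVEX:", evex_len(disp))
```

Passing vex_prefix=3 models the case mentioned above where VEX needs its
3-byte form (e.g. with %r10 as the base), which is where EVEX with a
compressed disp8 comes out two bytes shorter.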
