We'd like to add vec_duplicate_optab to x86 backend.  There are 3 ways
to broadcast an integer constant:

1. Load the full size from constant pool directly.
2. Use AVX2/AVX512 broadcast instruction.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is a little bit faster on Intel Core i7-8559U:

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory      : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$

The preferred choices are

1. Use AVX2/AVX512 broadcast instruction.
2. Load the full size from constant pool directly.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.

The first patch updates vec_duplicate_optab usage to allow it to fail so
that x86 backend can opt out SSE2 broadcast emulation from an integer
constant.

The second patch adds vec_duplicate<mode> expander and updates move
expanders to convert the CONST_WIDE_INT and CONST_VECTO operands to
vector broadcast from an integer with AVX2.

H.J. Lu (2):
  Allow vec_duplicate_optab to fail
  x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

 gcc/config/i386/i386-expand.c                 | 216 +++++++++++++++++-
 gcc/config/i386/i386-protos.h                 |   3 +
 gcc/config/i386/i386.c                        |  31 +++
 gcc/config/i386/sse.md                        |  19 ++
 gcc/doc/md.texi                               |   2 -
 gcc/expr.c                                    |  10 +-
 .../i386/avx512f-broadcast-pr87767-1.c        |   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c        |   5 +-
 .../gcc.target/i386/avx512f_cond_move.c       |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c       |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c    |  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c    |  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c    |  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 ++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 28 files changed, 534 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9b.c

-- 
2.31.1

Reply via email to