Simplify memcpy and memset inline strategies to avoid branches:
1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
loads and stores for up to 16 * 16 (256) bytes when the data size is
fixed and known.
2. Inline only if data size is known to be <= 256.
a. Use "rep movsb/stosb" with a simple code sequence if the data size
is a constant.
b. Use a loop if the data size is not a constant.
3. Use the memcpy/memset library function if the data size is unknown or
> 256.
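As a rough illustration of the three cases above (not taken from the
patches: the type and function names below are hypothetical, and the code
actually generated depends on the GCC version, the -mtune setting, and the
optimization level), consider:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 128-byte object, used only for illustration.  */
struct packet
{
  char payload[128];
};

/* Cases 1 and 2a: the size is a compile-time constant <= 256, so the
   copy is a candidate for inline expansion (integer/vector moves or
   "rep movsb") instead of a library call.  */
void
copy_fixed (struct packet *dst, const struct packet *src)
{
  memcpy (dst, src, sizeof (*dst));
}

/* Case 2b: the size is not a constant, but its type limits it to 255.
   Assumption: the compiler can derive that bound from the type; whether
   it does so is not guaranteed.  */
void
clear_bounded (char *buf, unsigned char n)
{
  memset (buf, 0, n);
}

/* Case 3: nothing is known about the size, so a call to the memcpy
   library function is the expected result.  */
void
copy_unknown (char *dst, const char *src, size_t n)
{
  memcpy (dst, src, n);
}
```

Compiling this with "gcc -O2 -S" under different -mtune values and
comparing the assembly should show which strategy was chosen for each
function.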
There is no significant performance impact on SPEC CPU 2017. There
are visible performance improvements on EEMBC benchmarks, with one
regression.
H.J. Lu (3):
x86: Update memcpy/memset inline strategies for Ice Lake
x86: Update memcpy/memset inline strategies for Skylake family CPUs
x86: Update memcpy/memset inline strategies for -mtune=generic
gcc/config/i386/i386-expand.c | 11 +-
gcc/config/i386/i386-options.c | 12 +-
gcc/config/i386/i386.h | 2 +
gcc/config/i386/x86-tune-costs.h | 185 ++++++++++++++++--
gcc/config/i386/x86-tune.def | 6 +
.../gcc.target/i386/memcpy-strategy-10.c | 11 ++
.../gcc.target/i386/memcpy-strategy-11.c | 18 ++
.../gcc.target/i386/memcpy-strategy-12.c | 9 +
.../gcc.target/i386/memcpy-strategy-13.c | 11 ++
.../gcc.target/i386/memcpy-strategy-5.c | 11 ++
.../gcc.target/i386/memcpy-strategy-6.c | 18 ++
.../gcc.target/i386/memcpy-strategy-7.c | 9 +
.../gcc.target/i386/memcpy-strategy-8.c | 18 ++
.../gcc.target/i386/memcpy-strategy-9.c | 9 +
.../gcc.target/i386/memset-strategy-10.c | 11 ++
.../gcc.target/i386/memset-strategy-11.c | 9 +
.../gcc.target/i386/memset-strategy-3.c | 17 ++
.../gcc.target/i386/memset-strategy-4.c | 17 ++
.../gcc.target/i386/memset-strategy-5.c | 11 ++
.../gcc.target/i386/memset-strategy-6.c | 9 +
.../gcc.target/i386/memset-strategy-7.c | 11 ++
.../gcc.target/i386/memset-strategy-8.c | 9 +
.../gcc.target/i386/memset-strategy-9.c | 17 ++
gcc/testsuite/gcc.target/i386/shrink_wrap_1.c | 2 +-
gcc/testsuite/gcc.target/i386/sw-1.c | 2 +-
25 files changed, 413 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-9.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-9.c
--
2.30.2