On Sun, Sep 14, 2025 at 11:00 PM Uros Bizjak <[email protected]> wrote:
>
> On Mon, Sep 15, 2025 at 7:57 AM Uros Bizjak <[email protected]> wrote:
> >
> > On Sun, Sep 14, 2025 at 9:14 PM H.J. Lu <[email protected]> wrote:
> > >
> > > If a single instruction can store or move the whole block of memory, use
> > > vector instruction and don't align destination.
> > >
> > > gcc/
> > >
> > > PR target/121934
> > > * config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): If a
> > > single instruction can store or move the whole block of memory,
> > > use vector instruction and don't align destination.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/121934
> > > * gcc.target/i386/pr121934-1a.c: New test.
> > > * gcc.target/i386/pr121934-1b.c: Likewise.
> > > * gcc.target/i386/pr121934-2a.c: Likewise.
> > > * gcc.target/i386/pr121934-2b.c: Likewise.
> > > * gcc.target/i386/pr121934-3a.c: Likewise.
> > > * gcc.target/i386/pr121934-3b.c: Likewise.
> > > * gcc.target/i386/pr121934-4a.c: Likewise.
> > > * gcc.target/i386/pr121934-4b.c: Likewise.
> > > * gcc.target/i386/pr121934-5a.c: Likewise.
> > > * gcc.target/i386/pr121934-5b.c: Likewise.
> >
> > OK.
> >
> > Thanks,
> > Uros.
> >
> > >
> > > Signed-off-by: H.J. Lu <[email protected]>
> > > ---
> > > gcc/config/i386/i386-expand.cc | 62 +++++++++++++--------
> > > gcc/testsuite/gcc.target/i386/pr121934-1a.c | 22 ++++++++
> > > gcc/testsuite/gcc.target/i386/pr121934-1b.c | 7 +++
> > > gcc/testsuite/gcc.target/i386/pr121934-2a.c | 23 ++++++++
> > > gcc/testsuite/gcc.target/i386/pr121934-2b.c | 7 +++
> > > gcc/testsuite/gcc.target/i386/pr121934-3a.c | 23 ++++++++
> > > gcc/testsuite/gcc.target/i386/pr121934-3b.c | 7 +++
> > > gcc/testsuite/gcc.target/i386/pr121934-4a.c | 23 ++++++++
> > > gcc/testsuite/gcc.target/i386/pr121934-4b.c | 7 +++
> > > gcc/testsuite/gcc.target/i386/pr121934-5a.c | 23 ++++++++
> > > gcc/testsuite/gcc.target/i386/pr121934-5b.c | 7 +++
> > > 11 files changed, 187 insertions(+), 24 deletions(-)
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-1a.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-1b.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-2a.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-2b.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-3a.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-3b.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-4a.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-4b.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-5a.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-5b.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > > index dc26b3452cb..b0b9e6da946 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -9552,9 +9552,20 @@ ix86_expand_set_or_cpymem (rtx dst, rtx src, rtx count_exp, rtx val_exp,
> > > if (!issetmem)
> > > srcreg = ix86_copy_addr_to_reg (XEXP (src, 0));
> > >
> > > + bool aligned_dstmem = false;
> > > + unsigned int nunits = issetmem ? STORE_MAX_PIECES : MOVE_MAX;
> > > + bool single_insn_p = count && count <= nunits;
>
> Should the above also consider X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL
> and/or X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL?
Already did:
#define MOVE_MAX \
  ((TARGET_AVX512F \
    && (ix86_move_max == PVW_AVX512 \
        || ix86_store_max == PVW_AVX512)) \
   ? 64 \
   : ((TARGET_AVX \
       && (ix86_move_max >= PVW_AVX256 \
           || ix86_store_max >= PVW_AVX256)) \
      ? 32 \
      : ((TARGET_SSE2 \
          && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
          && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
         ? 16 : UNITS_PER_WORD)))

#define STORE_MAX_PIECES \
  (TARGET_INTER_UNIT_MOVES_TO_VEC \
   ? ((TARGET_AVX512F && ix86_store_max == PVW_AVX512) \
      ? 64 \
      : ((TARGET_AVX \
          && ix86_store_max >= PVW_AVX256) \
         ? 32 \
         : ((TARGET_SSE2 \
             && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
            ? 16 : UNITS_PER_WORD))) \
   : UNITS_PER_WORD)
I am checking it in.
Thanks.
> Uros.
--
H.J.