On Sun, Sep 14, 2025 at 11:00 PM Uros Bizjak <[email protected]> wrote:
>
> On Mon, Sep 15, 2025 at 7:57 AM Uros Bizjak <[email protected]> wrote:
> >
> > On Sun, Sep 14, 2025 at 9:14 PM H.J. Lu <[email protected]> wrote:
> > >
> > > If a single instruction can store or move the whole block of memory, use
> > > vector instruction and don't align destination.
> > >
> > > gcc/
> > >
> > >         PR target/121934
> > >         * config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): If a
> > >         single instruction can store or move the whole block of memory,
> > >         use vector instruction and don't align destination.
> > >
> > > gcc/testsuite/
> > >
> > >         PR target/121934
> > >         * gcc.target/i386/pr121934-1a.c: New test.
> > >         * gcc.target/i386/pr121934-1b.c: Likewise.
> > >         * gcc.target/i386/pr121934-2a.c: Likewise.
> > >         * gcc.target/i386/pr121934-2b.c: Likewise.
> > >         * gcc.target/i386/pr121934-3a.c: Likewise.
> > >         * gcc.target/i386/pr121934-3b.c: Likewise.
> > >         * gcc.target/i386/pr121934-4a.c: Likewise.
> > >         * gcc.target/i386/pr121934-4b.c: Likewise.
> > >         * gcc.target/i386/pr121934-5a.c: Likewise.
> > >         * gcc.target/i386/pr121934-5b.c: Likewise.
> >
> > OK.
> >
> > Thanks,
> > Uros.
> >
> > >
> > > Signed-off-by: H.J. Lu <[email protected]>
> > > ---
> > >  gcc/config/i386/i386-expand.cc              | 62 +++++++++++++--------
> > >  gcc/testsuite/gcc.target/i386/pr121934-1a.c | 22 ++++++++
> > >  gcc/testsuite/gcc.target/i386/pr121934-1b.c |  7 +++
> > >  gcc/testsuite/gcc.target/i386/pr121934-2a.c | 23 ++++++++
> > >  gcc/testsuite/gcc.target/i386/pr121934-2b.c |  7 +++
> > >  gcc/testsuite/gcc.target/i386/pr121934-3a.c | 23 ++++++++
> > >  gcc/testsuite/gcc.target/i386/pr121934-3b.c |  7 +++
> > >  gcc/testsuite/gcc.target/i386/pr121934-4a.c | 23 ++++++++
> > >  gcc/testsuite/gcc.target/i386/pr121934-4b.c |  7 +++
> > >  gcc/testsuite/gcc.target/i386/pr121934-5a.c | 23 ++++++++
> > >  gcc/testsuite/gcc.target/i386/pr121934-5b.c |  7 +++
> > >  11 files changed, 187 insertions(+), 24 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-1a.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-1b.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-2a.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-2b.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-3a.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-3b.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-4a.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-4b.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-5a.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121934-5b.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > b/gcc/config/i386/i386-expand.cc
> > > index dc26b3452cb..b0b9e6da946 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -9552,9 +9552,20 @@ ix86_expand_set_or_cpymem (rtx dst, rtx src, rtx 
> > > count_exp, rtx val_exp,
> > >    if (!issetmem)
> > >      srcreg = ix86_copy_addr_to_reg (XEXP (src, 0));
> > >
> > > +  bool aligned_dstmem = false;
> > > +  unsigned int nunits = issetmem ? STORE_MAX_PIECES : MOVE_MAX;
> > > +  bool single_insn_p = count && count <= nunits;
>
> Should the above also consider X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL
> and/or X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL?

They are already taken into account through MOVE_MAX and STORE_MAX_PIECES:

#define MOVE_MAX \
  ((TARGET_AVX512F \
    && (ix86_move_max == PVW_AVX512 \
        || ix86_store_max == PVW_AVX512)) \
   ? 64 \
   : ((TARGET_AVX \
       && (ix86_move_max >= PVW_AVX256 \
           || ix86_store_max >= PVW_AVX256)) \
      ? 32 \
      : ((TARGET_SSE2 \
          && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
          && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
         ? 16 : UNITS_PER_WORD)))

#define STORE_MAX_PIECES \
  (TARGET_INTER_UNIT_MOVES_TO_VEC \
   ? ((TARGET_AVX512F && ix86_store_max == PVW_AVX512) \
      ? 64 \
      : ((TARGET_AVX \
          && ix86_store_max >= PVW_AVX256) \
          ? 32 \
          : ((TARGET_SSE2 \
              && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
              ? 16 : UNITS_PER_WORD))) \
   : UNITS_PER_WORD)
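
For illustration only (not part of the patch): a standalone C sketch of how the two macros above feed the new single_insn_p check. The struct target_model, the model_* function names and the ordering of the PVW_* enumerators are assumptions made for this sketch; the real values come from the target flags and tuning options inside GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the prefer-vector-width setting.  The
   ordering is assumed so that ">= PVW_AVX256" behaves as in the macros.  */
enum pvw { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

/* Hypothetical snapshot of the target flags the macros consult.  */
struct target_model
{
  bool avx512f, avx, sse2;
  bool unaligned_load_optimal, unaligned_store_optimal;
  bool inter_unit_moves_to_vec;
  enum pvw move_max, store_max;
  unsigned units_per_word;
};

/* Mirrors the MOVE_MAX macro quoted above.  */
static unsigned
model_move_max (const struct target_model *t)
{
  assert (t != NULL);
  if (t->avx512f
      && (t->move_max == PVW_AVX512 || t->store_max == PVW_AVX512))
    return 64;
  if (t->avx
      && (t->move_max >= PVW_AVX256 || t->store_max >= PVW_AVX256))
    return 32;
  /* The 16-byte case requires both unaligned tunings.  */
  if (t->sse2 && t->unaligned_load_optimal && t->unaligned_store_optimal)
    return 16;
  return t->units_per_word;
}

/* Mirrors the STORE_MAX_PIECES macro quoted above.  */
static unsigned
model_store_max_pieces (const struct target_model *t)
{
  assert (t != NULL);
  if (!t->inter_unit_moves_to_vec)
    return t->units_per_word;
  if (t->avx512f && t->store_max == PVW_AVX512)
    return 64;
  if (t->avx && t->store_max >= PVW_AVX256)
    return 32;
  /* Only the store tuning matters for a pure store.  */
  if (t->sse2 && t->unaligned_store_optimal)
    return 16;
  return t->units_per_word;
}

/* Mirrors the check added by the patch:
   nunits = issetmem ? STORE_MAX_PIECES : MOVE_MAX;
   single_insn_p = count && count <= nunits;  */
static bool
model_single_insn_p (const struct target_model *t, bool issetmem,
                     unsigned long count)
{
  unsigned nunits = issetmem ? model_store_max_pieces (t)
                             : model_move_max (t);
  return count != 0 && count <= nunits;
}
```

In this model, an SSE2 target without the unaligned-load tuning falls back to UNITS_PER_WORD for copies, so a 16-byte memcpy is not treated as a single instruction, while a 16-byte memset still is as long as the unaligned-store tuning is set.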

I am checking it in.

Thanks.

>  Uros.



-- 
H.J.
