On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > Extend the remove_redundant_vector pass to handle vector broadcasts from
> > > > constant and variable scalars.  When broadcasting from constants and
> > > > function arguments, we can place a single widest vector broadcast at
> > > > entry of the nearest common dominator for basic blocks with all uses
> > > > since constants and function arguments aren't changed.  For broadcast
> > > > from variables with a single definition, the single definition is
> > > > replaced with the widest broadcast.
> > > >
> > > > gcc/
> > > >
> > > >         PR target/92080
> > > >         * config/i386/i386-expand.cc (ix86_expand_call): Set
> > > >         recursive_function to true for recursive call.
> > > >         * config/i386/i386-features.cc (ix86_place_single_vector_set):
> > > >         Add an argument for inner scalar, default to nullptr.  Set the
> > > >         source from inner scalar if not nullptr.
> > > >         (ix86_get_vector_load_mode): Renamed to ...
> > > > >         (ix86_get_vector_cse_mode): This.  Add an argument for scalar mode
> > > > >         and handle integer and float scalar modes.
> > > >         (replace_vector_const): Add an argument for scalar mode and pass
> > > >         it to ix86_get_vector_load_mode.
> > > >         (x86_cse_kind): New.
> > > >         (redundant_load): Likewise.
> > > >         (ix86_broadcast_inner): Likewise.
> > > >         (remove_redundant_vector_load): Also support const0_rtx and
> > > >         constm1_rtx broadcasts.  Handle vector broadcasts from constant
> > > >         and variable scalars.
> > > >         * config/i386/i386.h (machine_function): Add recursive_function.
> > > >
> > > > gcc/testsuite/
> > > >
> > > > >         * gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to expect
> > > > >         movdqa instead of pxor.
> > > >         * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise.
> > > >         * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise.
> > > >         * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise.
> > > >         * gcc.target/i386/pr92080-4.c: New test.
> > > >         * gcc.target/i386/pr92080-5.c: Likewise.
> > > >         * gcc.target/i386/pr92080-6.c: Likewise.
> > > >         * gcc.target/i386/pr92080-7.c: Likewise.
> > > >         * gcc.target/i386/pr92080-8.c: Likewise.
> > > >         * gcc.target/i386/pr92080-9.c: Likewise.
> > > >         * gcc.target/i386/pr92080-10.c: Likewise.
> > > >         * gcc.target/i386/pr92080-11.c: Likewise.
> > > >         * gcc.target/i386/pr92080-12.c: Likewise.
> > > >         * gcc.target/i386/pr92080-13.c: Likewise.
> > > >         * gcc.target/i386/pr92080-14.c: Likewise.
> > > >         * gcc.target/i386/pr92080-15.c: Likewise.
> > > >         * gcc.target/i386/pr92080-16.c: Likewise.
> > > >
> > > > Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
> > > > ---
> > > >  gcc/config/i386/i386-expand.cc                |   3 +
> > > >  gcc/config/i386/i386-features.cc              | 410 ++++++++++++++----
> > > >  gcc/config/i386/i386.h                        |   3 +
> > > >  .../i386/keylocker-aesdecwide128kl.c          |  14 +-
> > > >  .../i386/keylocker-aesdecwide256kl.c          |  14 +-
> > > >  .../i386/keylocker-aesencwide128kl.c          |  14 +-
> > > >  .../i386/keylocker-aesencwide256kl.c          |  14 +-
> > > >  gcc/testsuite/gcc.target/i386/pr92080-10.c    |  13 +
> > > >  gcc/testsuite/gcc.target/i386/pr92080-11.c    |  33 ++
> > > >  gcc/testsuite/gcc.target/i386/pr92080-12.c    |  16 +
> > > >  gcc/testsuite/gcc.target/i386/pr92080-13.c    |  32 ++
> > > >  gcc/testsuite/gcc.target/i386/pr92080-14.c    |  31 ++
> > > >  gcc/testsuite/gcc.target/i386/pr92080-15.c    |  25 ++
> > > >  gcc/testsuite/gcc.target/i386/pr92080-16.c    |  26 ++
> > > >  gcc/testsuite/gcc.target/i386/pr92080-4.c     |  50 +++
> > > >  gcc/testsuite/gcc.target/i386/pr92080-5.c     | 109 +++++
> > > >  gcc/testsuite/gcc.target/i386/pr92080-6.c     |  19 +
> > > >  gcc/testsuite/gcc.target/i386/pr92080-7.c     |  20 +
> > > >  gcc/testsuite/gcc.target/i386/pr92080-8.c     |  16 +
> > > >  gcc/testsuite/gcc.target/i386/pr92080-9.c     |  81 ++++
> > > >  20 files changed, 823 insertions(+), 120 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-10.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-11.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-12.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-13.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-14.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-15.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-16.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-4.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-5.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-6.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-7.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-8.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-9.c
> > > >
>
> > > > +  else
> > > > +    {
> > > > +      while (SUBREG_P (dest))
> > > > +       dest = SUBREG_REG (dest);
> > > > +
> > > > +      /* Skip if the SET destination mode doesn't match.  */
> > > > +      if (GET_MODE (dest) != mode)
> > > > +       return nullptr;
> > >
> > > Can we just require (dest == reg || dest == op), otherwise we need to
> > > make sure GET_MODE of the original dest can cover mode of op(which is
> > > more complicated, need to make sure SUBREG_BYTE is also zero???)
> >
> > I will change it to
> >
> >       /* Skip if the SET destination isn't the broadcast source.  */
> >       if (dest != reg)
> >         return nullptr;
>
> Here is the v4 patch with:
>
>       /* The SET destination must be the broadcast source.  */
>       gcc_assert (dest == op);
I don't understand this; it looks like you posted the dump patch instead
of the original one.
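
For reference, a toy sketch of why the mode-only check discussed above is weaker than requiring the destination to be the broadcast source itself. This is not GCC code: `ToyRtx`, `ToyMode`, `strip_subregs`, `accept_by_mode` and `accept_by_identity` are hypothetical stand-ins for rtx, machine modes, the SUBREG_REG loop and the two candidate checks, and the model ignores everything else the pass does.

```cpp
#include <cassert>

// Hypothetical stand-ins for machine modes and rtx; not the real GCC API.
enum ToyMode { ToySImode, ToyDImode };

struct ToyRtx {
  ToyMode mode;
  const ToyRtx *inner;   // non-null means this node is a SUBREG of *inner
  unsigned byte_offset;  // models SUBREG_BYTE
};

// Models: while (SUBREG_P (dest)) dest = SUBREG_REG (dest);
static const ToyRtx *strip_subregs(const ToyRtx *x) {
  while (x->inner)
    x = x->inner;
  return x;
}

// Mode-only check: strip SUBREGs first, then compare modes.
// A partial write through a SUBREG still passes, because only the
// mode of the underlying register is inspected.
static bool accept_by_mode(const ToyRtx *dest, ToyMode mode) {
  return strip_subregs(dest)->mode == mode;
}

// Identity check: the SET destination must be the broadcast source.
// No SUBREG can pass, so SUBREG_BYTE never needs to be examined.
static bool accept_by_identity(const ToyRtx *dest, const ToyRtx *reg) {
  return dest == reg;
}
```

With a register `reg` in `ToyDImode` and a `ToySImode` SUBREG of it at byte offset 4, `accept_by_mode` accepts the partial write while `accept_by_identity` rejects it; only a SET of `reg` itself passes the identity check.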
>
> OK for master?
>
> Thanks.
>
>
> --
> H.J.



-- 
BR,
Hongtao
