On Wed, Jun 28, 2023 at 3:56 AM Hongyu Wang <[email protected]> wrote:
>
> > I don't think this is desirable. If we inline something with different
> > ISAs, we get some strange mix of ISAs when the function is inlined.
> > OTOH - we already inline with mismatched tune flags if the function is
> > marked with always_inline.
>
> Previously ix86_can_inline_p has
>
> if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> != callee_opts->x_ix86_isa_flags)
> || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> != callee_opts->x_ix86_isa_flags2))
> ret = false;
>
> It makes sure the caller's ISA is a superset of the callee's, and the
> inlined code should follow the caller's ISA specification.
>
> IMHO I cannot give a real example where the caller's performance is
> harmed after inlining. I added PVW since there might
> be some callee that wants to limit its vector size while the caller
> has a larger preferred vector size. At least with the current change
> we get more optimization opportunities for different target_clones.
>
> But I agree the tuning setting may be a factor that affects
> performance. One possible choice is that if the
> tune for the callee is unspecified or default, we just inline it into
> the caller with the specified arch and tune.

If the user specified a different arch for the callee than for the caller,
then the compiler will switch on different ISAs (-march is just a
shortcut for different ISA packs), and the programmer is aware that
inlining isn't intended here (we have -mtune, which is not as strong
as -march, but even functions with different -mtune are not inlined
without the always_inline attribute). This is documented as:
--q--
On the x86, the inliner does not inline a function that has different
target options than the caller, unless the callee has a subset of the
target options of the caller. For example a function declared with
target("sse3") can inline a function with target("sse2"), since -msse3
implies -msse2.
--/q--
I don't think arch=skylake can be considered as a subset of arch=icelake-server.
I agree that the compiler should reject functions with different PVW.
This is also in accordance with the documentation.
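
To illustrate the subset rule quoted above together with the PVW check, here
is a minimal standalone sketch. It is not the code in ix86_can_inline_p: the
ISA_* bit values, struct opts, and the helper names are hypothetical and only
model the two conditions discussed in this thread (the callee's ISA set must
be a subset of the caller's, and explicitly specified preferred vector widths
must not differ).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ISA bits for illustration only; these are NOT GCC's real
   OPTION_MASK_ISA_* values.  */
enum
{
  ISA_SSE2 = 1 << 0,
  ISA_SSE3 = 1 << 1,
  ISA_AVX2 = 1 << 2
};

/* Simplified stand-in for the preferred vector width setting.  */
enum pvw { PVW_NONE, PVW_128, PVW_256, PVW_512 };

/* Hypothetical, reduced model of per-function target options.  */
struct opts
{
  unsigned isa;
  enum pvw pvw;
};

/* Return true if every ISA bit enabled for CALLEE is also enabled for
   CALLER, i.e. the callee's ISA set is a subset of the caller's.  */
static bool
isa_subset_p (const struct opts *caller, const struct opts *callee)
{
  return (caller->isa & callee->isa) == callee->isa;
}

/* Return true if both sides specify a preferred vector width and the
   two widths differ.  */
static bool
pvw_mismatch_p (const struct opts *caller, const struct opts *callee)
{
  return (caller->pvw != PVW_NONE
	  && callee->pvw != PVW_NONE
	  && caller->pvw != callee->pvw);
}

/* Combined decision for this simplified model.  */
static bool
can_inline_p (const struct opts *caller, const struct opts *callee)
{
  return isa_subset_p (caller, callee) && !pvw_mismatch_p (caller, callee);
}

/* Small self-check mirroring the sse3/sse2 example from the docs.  */
static void
run_examples (void)
{
  struct opts sse2 = { ISA_SSE2, PVW_NONE };
  struct opts sse3 = { ISA_SSE2 | ISA_SSE3, PVW_NONE };

  assert (can_inline_p (&sse3, &sse2));	 /* sse3 caller, sse2 callee.  */
  assert (!can_inline_p (&sse2, &sse3)); /* callee needs more ISA.  */
}
```

In this model an sse3 caller can inline an sse2 callee but not the other way
around, and two functions with the same ISA bits but different explicit
preferred vector widths are rejected.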
Uros.
>
> On Tue, Jun 27, 2023 at 17:16, Uros Bizjak via Gcc-patches <[email protected]> wrote:
>
>
>
> >
> > On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > For functions with different target attributes, the current logic
> > > refuses to inline the callee when any arch or tune is mismatched. Relax
> > > the condition to honor just prefer_vector_width_type and other flags that
> > > may cause safety issues, so the caller can get more optimization opportunities.
> >
> > I don't think this is desirable. If we inline something with different
> > ISAs, we get some strange mix of ISAs when the function is inlined.
> > OTOH - we already inline with mismatched tune flags if the function is
> > marked with always_inline.
> >
> > Uros.
> >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> > >
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> > > tune directly, just check prefer_vector_width_type and make sure
> > > not to inline if they mismatch.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/inline-target-attr.c: New test.
> > > ---
> > > gcc/config/i386/i386.cc | 11 +++++----
> > > .../gcc.target/i386/inline-target-attr.c | 24 +++++++++++++++++++
> > > 2 files changed, 30 insertions(+), 5 deletions(-)
> > > create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 0761965344b..1d86384ac06 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
> > > != (callee_opts->x_target_flags &
> > > ~always_inline_safe_mask))
> > > ret = false;
> > >
> > > - /* See if arch, tune, etc. are the same. */
> > > - else if (caller_opts->arch != callee_opts->arch)
> > > - ret = false;
> > > -
> > > - else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > + /* Do not inline when the specified prefer-vector-width mismatches
> > > + between callee and caller. */
> > > + else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> > > + && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> > > + && callee_opts->x_prefer_vector_width_type
> > > + != caller_opts->x_prefer_vector_width_type)
> > > ret = false;
> > >
> > > else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > > diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > new file mode 100644
> > > index 00000000000..995502165f0
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2" } */
> > > +/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
> > > +
> > > +__attribute__((target("arch=skylake")))
> > > +int callee (int n)
> > > +{
> > > + int sum = 0;
> > > + for (int i = 0; i < n; i++)
> > > + {
> > > + if (i % 2 == 0)
> > > + sum += i;
> > > + else
> > > + sum += (i - 1);
> > > + }
> > > + return sum + n;
> > > +}
> > > +
> > > +__attribute__((target("arch=icelake-server")))
> > > +int caller (int n)
> > > +{
> > > + return callee (n) + n;
> > > +}
> > > +
> > > --
> > > 2.31.1
> > >