The original issue comes from a case like

float callee (float a, float b, float c, float d,
              float e, float f, float g, float h)
{
  return a * b + c * d + e * f + g + h + a * c + b * c
         + a * d + b * e + a * f + c * h
         + b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
}

__attribute__((target_clones("default","arch=icelake-server")))
void caller (int n, float *a, float c1, float c2, float c3,
             float c4, float c5, float c6, float c7)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
    }
}

With current GCC, the .icelake_server clone fails to inline callee because
of a target-specific option mismatch, while the .default clone inlines it
and the loop gets vectorized. I think it is not reasonable that the clone
built for the higher arch cannot produce better code.

So I think we can at least inline callees that have no arch/tune
specified, but for now they are rejected by the strict arch= and tune=
check.

Uros Bizjak <ubiz...@gmail.com> wrote on Wed, Jun 28, 2023 at 14:43:
>
> On Wed, Jun 28, 2023 at 3:56 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
> >
> > > I don't think this is desirable. If we inline something with different
> > > ISAs, we get some strange mix of ISAs when the function is inlined.
> > > OTOH - we already inline with mismatched tune flags if the function is
> > > marked with always_inline.
> >
> > Previously ix86_can_inline_p has
> >
> >   if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> >        != callee_opts->x_ix86_isa_flags)
> >       || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> >           != callee_opts->x_ix86_isa_flags2))
> >     ret = false;
> >
> > It make sure caller ISA is a super set of callee, and the inlined one
> > should follow caller's ISA specification.
> >
> > IMHO I cannot give a real example that after inline the caller's
> > performance get harmed, I added PVW since there might
> > be some callee want to limit its vector size and caller may have
> > larger preferred vector size.
> > At least with current change
> > we get more optimization opportunity for different target_clones.
> >
> > But I agree the tuning setting may be a factor that affect the
> > performance. One possible choice is that if the
> > tune for callee is unspecified or default, just inline it to the
> > caller with specified arch and tune.
>
> If the user specified a different arch for callee than the caller,
> then the compiler will switch on different ISAs (-march is just a
> shortcut for different ISA packs), and the programmer is aware that
> inlining isn't intended here (we have -mtune, which is not as strong
> as -march, but even functions with different -mtune are not inlined
> without always_inline attribute). This is documented as:
>
> --q--
> On the x86, the inliner does not inline a function that has different
> target options than the caller, unless the callee has a subset of the
> target options of the caller. For example a function declared with
> target("sse3") can inline a function with target("sse2"), since -msse3
> implies -msse2.
> --/q--
>
> I don't think arch=skylake can be considered as a subset of
> arch=icelake-server.
>
> I agree that the compiler should reject functions with different PVW.
> This is also in accordance with the documentation.
>
> Uros.
>
> >
> > Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Tue, Jun 27, 2023 at 17:16:
> > >
> > >
> > > On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang <hongyu.w...@intel.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > For function with different target attributes, current logic rejects to
> > > > inline the callee when any arch or tune is mismatched. Relax the
> > > > condition to honor just prefer_vecotr_width_type and other flags that
> > > > may cause safety issue so caller can get more optimization opportunity.
> > >
> > > I don't think this is desirable. If we inline something with different
> > > ISAs, we get some strange mix of ISAs when the function is inlined.
> > > OTOH - we already inline with mismatched tune flags if the function is
> > > marked with always_inline.
> > >
> > > Uros.
> > >
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> > > >
> > > > Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> > > >         tune directly, just check prefer_vector_width_type and make sure
> > > >         not to inline if they mismatch.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.target/i386/inline-target-attr.c: New test.
> > > > ---
> > > >  gcc/config/i386/i386.cc                       | 11 +++++----
> > > >  .../gcc.target/i386/inline-target-attr.c      | 24 +++++++++++++++++++
> > > >  2 files changed, 30 insertions(+), 5 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > >
> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > index 0761965344b..1d86384ac06 100644
> > > > --- a/gcc/config/i386/i386.cc
> > > > +++ b/gcc/config/i386/i386.cc
> > > > @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
> > > >            != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> > > >      ret = false;
> > > >
> > > > -  /* See if arch, tune, etc. are the same.  */
> > > > -  else if (caller_opts->arch != callee_opts->arch)
> > > > -    ret = false;
> > > > -
> > > > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > > +  /* Do not inline when specified perfer-vector-width mismatched between
> > > > +     callee and caller.  */
> > > > +  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> > > > +            && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> > > > +           && callee_opts->x_prefer_vector_width_type
> > > > +              != caller_opts->x_prefer_vector_width_type)
> > > >      ret = false;
> > > >
> > > >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > > > diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > > new file mode 100644
> > > > index 00000000000..995502165f0
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2" } */
> > > > +/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
> > > > +
> > > > +__attribute__((target("arch=skylake")))
> > > > +int callee (int n)
> > > > +{
> > > > +  int sum = 0;
> > > > +  for (int i = 0; i < n; i++)
> > > > +    {
> > > > +      if (i % 2 == 0)
> > > > +        sum +=i;
> > > > +      else
> > > > +        sum += (i - 1);
> > > > +    }
> > > > +  return sum + n;
> > > > +}
> > > > +
> > > > +__attribute__((target("arch=icelake-server")))
> > > > +int caller (int n)
> > > > +{
> > > > +  return callee (n) + n;
> > > > +}
> > > > +
> > > > --
> > > > 2.31.1
> > > >
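
P.S. To make the two rules discussed in this thread concrete, here is a
minimal standalone sketch of the ISA subset check combined with the
preferred-vector-width check from the patch. It is only an illustration
under stated assumptions: struct target_opts, the ISA_* bit values, the
PVW_128/256/512 enumerators and can_inline are hypothetical names made up
for this example and are not GCC's actual data structures. It does not
model arch= or tune= at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not GCC's real types.  */
enum pvw { PVW_NONE, PVW_128, PVW_256, PVW_512 };

struct target_opts
{
  uint64_t isa_flags;  /* One bit per enabled ISA extension.  */
  enum pvw pvw;        /* Explicitly requested preferred vector width.  */
};

/* Made-up bit assignments; ISA_SSE3 is always used together with
   ISA_SSE2 below to mimic "-msse3 implies -msse2".  */
#define ISA_SSE2    (UINT64_C (1) << 0)
#define ISA_SSE3    (UINT64_C (1) << 1)
#define ISA_AVX512F (UINT64_C (1) << 2)

/* Return true if CALLEE may be inlined into CALLER under the two rules.  */
static bool
can_inline (const struct target_opts *caller,
            const struct target_opts *callee)
{
  /* Rule 1: the caller's ISA set must be a superset of the callee's.  */
  if ((caller->isa_flags & callee->isa_flags) != callee->isa_flags)
    return false;

  /* Rule 2: reject only when both sides request a vector width
     explicitly and the two requests differ.  */
  if (callee->pvw != PVW_NONE && caller->pvw != PVW_NONE
      && callee->pvw != caller->pvw)
    return false;

  return true;
}

/* A few cases mirroring the sse3/sse2 example from the documentation.  */
static void
run_examples (void)
{
  struct target_opts sse2 = { ISA_SSE2, PVW_NONE };
  struct target_opts sse3 = { ISA_SSE2 | ISA_SSE3, PVW_NONE };

  assert (can_inline (&sse3, &sse2));   /* callee is a subset */
  assert (!can_inline (&sse2, &sse3));  /* callee needs more than caller */
}
```

Calling run_examples () from a main function exercises the subset case
from the documentation. Whether an arch= setting should ever count as a
subset of another one is exactly the open question above, so the sketch
deliberately leaves it out.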