The 08/08/2025 17:44, Richard Sandiford wrote:
> <alfie.richa...@arm.com> writes:
> > From: Alfie Richards <alfie.richa...@arm.com>
> >
> > Adds an optimisation in FMV to redirect to a specific target if possible.
> >
> > A call is redirected to a specific target if both:
> > - the caller can always call the callee version
> > - and, it is possible to rule out all higher priority versions of the callee
> >   fmv set. That is estabilished either by the callee being the highest 
> > priority
> >   version, or each higher priority version of the callee implying that, 
> > were it
> >   resolved, a higher priority version of the caller would have been 
> > selected.
> >
> > For this logic, introduces the new 
> > TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A
> > hook. Adds a full implementation for Aarch64, and a weaker default version
> > for other targets.
> >
> > This allows the target to replace the previous optimisation as the new one 
> > is
> > able to cover the same case where two function sets implement the same 
> > versions.
> 
> Thanks for the update.  The new semantics look good to me.
> 
> >
> > gcc/ChangeLog:
> >
> >     * config/aarch64/aarch64.cc (aarch64_functions_b_resolvable_from_a): New
> >     function.
> >     (TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A): New define.
> >     * doc/tm.texi: Regenerate.
> >     * doc/tm.texi.in: Add documentation for version_a_implies_version_b.
> >     * multiple_target.cc (redirect_to_specific_clone): Add new optimisation
> >     logic.
> >     (ipa_target_clone): Remove check for TARGET_HAS_FMV_TARGET_ATTRIBUTE.
> >     * target.def: Document new hook..
> >     * attribs.cc: (functions_b_resolvable_from_a) New function.
> >     * attribs.h: (functions_b_resolvable_from_a) New function.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * g++.target/aarch64/fmv-selection1.C: New test.
> >     * g++.target/aarch64/fmv-selection2.C: New test.
> >     * g++.target/aarch64/fmv-selection3.C: New test.
> >     * g++.target/aarch64/fmv-selection4.C: New test.
> >     * g++.target/aarch64/fmv-selection5.C: New test.
> >     * g++.target/aarch64/fmv-selection6.C: New test.
> >     * g++.target/aarch64/fmv-selection7.C: New test.
> > ---
> >  gcc/attribs.cc                                |  20 ++++
> >  gcc/attribs.h                                 |   1 +
> >  gcc/config/aarch64/aarch64.cc                 |  28 +++++
> >  gcc/doc/tm.texi                               |   8 ++
> >  gcc/doc/tm.texi.in                            |   2 +
> >  gcc/multiple_target.cc                        | 110 +++++++++++-------
> >  gcc/target.def                                |  11 ++
> >  .../g++.target/aarch64/fmv-selection1.C       |  40 +++++++
> >  .../g++.target/aarch64/fmv-selection2.C       |  40 +++++++
> >  .../g++.target/aarch64/fmv-selection3.C       |  25 ++++
> >  .../g++.target/aarch64/fmv-selection4.C       |  30 +++++
> >  .../g++.target/aarch64/fmv-selection5.C       |  28 +++++
> >  .../g++.target/aarch64/fmv-selection6.C       |  27 +++++
> >  .../g++.target/aarch64/fmv-selection7.C       |  65 +++++++++++
> >  14 files changed, 392 insertions(+), 43 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection1.C
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection2.C
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection3.C
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection4.C
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection5.C
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection6.C
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection7.C
> >
> > diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> > index 2ca82674f7c..fc1751a5f2e 100644
> > --- a/gcc/attribs.cc
> > +++ b/gcc/attribs.cc
> > @@ -1095,6 +1095,26 @@ common_function_versions (string_slice fn1 
> > ATTRIBUTE_UNUSED,
> >    gcc_unreachable ();
> >  }
> >  
> > +/* Default implementation of 
> > TARGET_OPTION_FUNCTION_B_RESOLVBLE_FROM_FUNCTION_A.
> 
> TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A
> 
> > +   Used to check very basically if FN2 is callable from FN1.
> > +   For now this checks if the version strings are the same.  */
> > +
> > +bool
> > +functions_b_resolvable_from_a (tree fn1, tree fn2, tree base 
> > ATTRIBUTE_UNUSED)
> 
> Calling these decl_a and decl_b might be better, as for the aarch64
> version.  I realise it doesn't matter here, since the function is
> symmetric, but it would make it clearer what "a" and "b" refer to.
> 
> > +{
> > +  const char *attr_name = TARGET_HAS_FMV_TARGET_ATTRIBUTE
> > +                     ? "target"
> > +                     : "target_version";
> > +
> > +  tree attr1 = lookup_attribute (attr_name, DECL_ATTRIBUTES (fn1));
> > +  tree attr2 = lookup_attribute (attr_name, DECL_ATTRIBUTES (fn2));
> 
> Motivated by the documentation suggestion below, I think we should
> assert attr2 (attr_b).  Same for the aarch64 version.
> 
> > +
> > +  if (!attr1 || !attr2)
> > +    return false;
> > +
> > +  return attribute_value_equal (attr1, attr2);
> > +}
> > +
> >  /* Comparator function to be used in qsort routine to sort attribute
> >     specification strings to "target".  */
> >  
> > [...]
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index 47755ad32eb..fddca8c83e0 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -10972,6 +10972,14 @@ This target hook returns @code{true} if the 
> > target/target-version strings
> >  @var{fn1} and @var{fn2} imply the same function version.
> >  @end deftypefn
> >  
> > +@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A 
> > (tree @var{v_a}, tree @var{v_b}, tree @var{base})
> > +Determines if a function with the arch implied by @var{v_b}'s
> 
> Here too, I think decl_a and decl_b would be easier to follow.
> 
> I think it would be worth expanding "arch" to "architecture".
> 
> > +FMV version attributes (either @code{target} or @code{target_version} for
> 
> function multi-versioning (FMV)
> 
> > +target FMV or target_version FMV semantics respectively) would always be
> > +resolvable from a function with @var{v_a}'s FMV version attributes.
> > +Uses a function decl @var{base} to establish the baseline architecture.
> 
> It would be good to expand on what the default is.
> 
> How about:
> 
> ------------------------------------------------------------------------------
> @var{decl_b} is a function declaration with a function multi-versioning (FMV)
> attribute; this attribute is either @code{target} or @code{target_version},
> depending on @code{TARGET_HAS_FMV_TARGET_ATTRIBUTE}.  @var{decl_a} is
> a function declaration that may or may not have an FMV attribute.
> 
> Return true if we have enough information to determine that the
> requirements of @var{decl_b}'s FMV attribute are met whenever @var{decl_a}
> is executed, given that the target supports all features required by
> function declaration @var{base}.
> 
> The default implementation just checks whether @var{decl_a} has the same
> FMV attribute as @var{decl_b}.  This is conservatively correct,
> but ports can do better by taking the relationships between architecture
> features into account.  For example, on AArch64, @code{sve} is present
> whenever @code{sve2} is present.
> ------------------------------------------------------------------------------
> 
> Feel free to reword.  It was just a way of listing the information
> that seemed useful to include.

Thanks for the feedback! This is much better.
> 
> (It looks like we're missing documentation for
> TARGET_HAS_FMV_TARGET_ATTRIBUTE though.)

Good spot.
I will submit some documentation for that separately.
> 
> > +@end deftypefn
> > +
> >  @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree 
> > @var{callee})
> >  This target hook returns @code{false} if the @var{caller} function
> >  cannot inline @var{callee}, based on target specific information.  By
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index eaab62652a7..99f47cb9d78 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -7132,6 +7132,8 @@ with the target specific attributes.  The default 
> > value is @code{','}.
> >  
> >  @hook TARGET_OPTION_FUNCTION_VERSIONS
> >  
> > +@hook TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A
> > +
> >  @hook TARGET_CAN_INLINE_P
> >  
> >  @hook TARGET_UPDATE_IPA_FN_TARGET_INFO
> > diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
> > index 7ab025d7d46..681b586e61c 100644
> > --- a/gcc/multiple_target.cc
> > +++ b/gcc/multiple_target.cc
> > @@ -423,61 +423,86 @@ expand_target_clones (struct cgraph_node *node, bool 
> > definition)
> >    return true;
> >  }
> >  
> > -/* When NODE is a target clone, consider all callees and redirect
> > -   to a clone with equal target attributes.  That prevents multiple
> > -   multi-versioning dispatches and a call-chain can be optimized.
> > -
> > -   This optimisation might pick the wrong version in some cases, since 
> > knowing
> > -   that we meet the target requirements for a matching callee version does 
> > not
> > -   tell us that we won't also meet the target requirements for a higher
> > -   priority callee version at runtime.  Since this is longstanding 
> > behaviour
> > -   for x86 and powerpc, we preserve it for those targets, but skip the 
> > optimisation
> > -   for targets that use the "target_version" attribute for 
> > multi-versioning.  */
> > +/* When NODE is part of an FMV function set, consider all callees and 
> > check if
> > +   any can provably always resolve a certain version and then call that 
> > version
> > +   directly.  */
> >  
> >  static void
> >  redirect_to_specific_clone (cgraph_node *node)
> >  {
> > -  cgraph_function_version_info *fv = node->function_version ();
> > -  if (fv == NULL)
> > +  if (!targetm.compare_version_priority
> > +      || !targetm.target_option.functions_b_resolvable_from_a
> 
> The condition on the last line should always be true.
> 
> > +      || !optimize)
> >      return;
> >  
> > -  gcc_assert (TARGET_HAS_FMV_TARGET_ATTRIBUTE);
> > -  tree attr_target = lookup_attribute ("target", DECL_ATTRIBUTES 
> > (node->decl));
> > -  if (attr_target == NULL_TREE)
> > -    return;
> > -
> > -  /* We need to remember NEXT_CALLER as it could be modified in the loop.  
> > */
> > +  /* We need to remember NEXT_CALLER as it could be modified in the
> > +     loop.  */
> 
> There's no need to reflow this comment.
> 
> >    for (cgraph_edge *e = node->callees; e ; e = e->next_callee)
> >      {
> > -      cgraph_function_version_info *fv2 = e->callee->function_version ();
> > -      if (!fv2)
> > +      /* Only if this is a call to a dispatched symbol.  */
> > +      if (!e->callee->dispatcher_function)
> >     continue;
> >  
> > -      tree attr_target2 = lookup_attribute ("target",
> > -                                       DECL_ATTRIBUTES (e->callee->decl));
> > +      cgraph_function_version_info *callee_v
> > +   = e->callee->function_version ();
> > +      cgraph_function_version_info *caller_v
> > +   = e->caller->function_version ();
> >  
> > -      /* Function is not calling proper target clone.  */
> > -      if (attr_target2 == NULL_TREE
> > -     || !attribute_value_equal (attr_target, attr_target2))
> > -   {
> > -     while (fv2->prev != NULL)
> > -       fv2 = fv2->prev;
> > +      /* If this is not the TU that contains the definition of the default
> > +    version we are not guaranteed to have visibility of all versions
> > +    so cannot reason about them.  */
> > +      if (!callee_v->next->this_node->binds_to_current_def_p ())
> > +   continue;
> >  
> > -     /* Try to find a clone with equal target attribute.  */
> > -     for (; fv2 != NULL; fv2 = fv2->next)
> > -       {
> > -         cgraph_node *callee = fv2->this_node;
> > -         attr_target2 = lookup_attribute ("target",
> > -                                          DECL_ATTRIBUTES (callee->decl));
> > -         if (attr_target2 != NULL_TREE
> > -             && attribute_value_equal (attr_target, attr_target2))
> > +      cgraph_function_version_info *highest_callable_fn = NULL;
> > +      for (cgraph_function_version_info *ver = callee_v->next;
> > +      ver;
> > +      ver = ver->next)
> > +   if (targetm.target_option.functions_b_resolvable_from_a
> > +         (node->decl, ver->this_node->decl, node->decl))
> > +     highest_callable_fn = ver;
> > +
> > +      if (!highest_callable_fn)
> > +   continue;
> > +
> > +      /* If there are higher priority versions of callee and caller has no
> > +    more version information, then not callable.  */
> > +      if (!caller_v && highest_callable_fn->next)
> > +   continue;
> 
> I think this should be:
> 
>     if (highest_callable_fn->next
>       && (!caller_v
>           || !caller_v->next->this_node->binds_to_current_def_p ()))
>       continue;

This logic isn't quite correct. My previous check that
`if (!callee_v->next->this_node->binds_to_current_def_p ())`
relies on the fact that callee_v is the dispatched symbol so the default node
must be in this TU and must be the lowest priority so must be next.

So I need slightly more complex logic to get the defauld caller node and to
check if its in this TU at all.

I will resubmit with that minor logic addition.
> 
> We don't need to check whether node->binds_to_current_def_p (),
> because if node is preempted/overridden, then it doesn't matter
> what node does.  But for highest_callable_fn->next we're assuming
> that all of caller's versions are visible in the current TU.

Agreed.
> 
> > +
> > +      bool inlinable = true;
> > +      /* If every higher priority version would imply a higher priority
> > +    version of caller would have been selected, then this is
> > +    callable.  */
> > +      if (caller_v)
> > +   for (cgraph_function_version_info *callee_ver
> > +          = highest_callable_fn->next;
> > +        callee_ver;
> > +        callee_ver = callee_ver->next)
> > +     {
> > +       bool is_possible = true;
> > +       for (cgraph_function_version_info *caller_ver = caller_v->next;
> > +            caller_ver;
> > +            caller_ver = caller_ver->next)
> > +         if (targetm.target_option.functions_b_resolvable_from_a
> > +              (callee_ver->this_node->decl,
> > +               caller_ver->this_node->decl,
> > +               node->decl))
> >             {
> > -             e->redirect_callee (callee);
> > -             cgraph_edge::redirect_call_stmt_to_callee (e);
> > +             is_possible = false;
> >               break;
> >             }
> > -       }
> > -   }
> > +       if (is_possible)
> > +         {
> > +           inlinable = false;
> > +           break;
> > +         }
> > +     }
> > +      if (!inlinable)
> > +   continue;
> 
> I suppose this is personal preference, but I think the flow would
> be easier to follow if the loop was conditional on highest_callable_fn->next.
> Putting that together:
> 
>     if (highest_callable_fn->next)
>       {
>         /* If there are higher priority versions of callee and caller has no
>            more version information, then not callable.  */
>       if (!caller_v
>           || !caller_v->next->this_node->binds_to_current_def_p ())
>           continue;
> 
>         /* If every higher priority version would imply a higher priority
>            version of caller would have been selected, then this is
>            callable.  */
>       for (cgraph_function_version_info *callee_ver
>              = highest_callable_fn->next;
>            callee_ver;
>            callee_ver = callee_ver->next)
> 
> LGTM with those changes.
> 
> Thanks,
> Richard
> 
> > +
> > +      e->redirect_callee (highest_callable_fn->this_node);
> > +      cgraph_edge::redirect_call_stmt_to_callee (e);
> >      }
> >  }
> >  

-- 
Alfie Richards

Reply via email to