<alfie.richa...@arm.com> writes: > From: Alfie Richards <alfie.richa...@arm.com> > > Adds an optimisation in FMV to redirect to a specific target if possible. > > A call is redirected to a specific target if both: > - the caller can always call the callee version > - and, it is possible to rule out all higher priority versions of the callee > fmv set. That is estabilished either by the callee being the highest > priority > version, or each higher priority version of the callee implying that, were > it > resolved, a higher priority version of the caller would have been selected. > > For this logic, introduces the new TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A > hook. Adds a full implementation for Aarch64, and a weaker default version > for other targets. > > This allows the target to replace the previous optimisation as the new one is > able to cover the same case where two function sets implement the same > versions.
Thanks for the update. The new semantics look good to me. > > gcc/ChangeLog: > > * config/aarch64/aarch64.cc (aarch64_functions_b_resolvable_from_a): New > function. > (TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A): New define. > * doc/tm.texi: Regenerate. > * doc/tm.texi.in: Add documentation for version_a_implies_version_b. > * multiple_target.cc (redirect_to_specific_clone): Add new optimisation > logic. > (ipa_target_clone): Remove check for TARGET_HAS_FMV_TARGET_ATTRIBUTE. > * target.def: Document new hook.. > * attribs.cc: (functions_b_resolvable_from_a) New function. > * attribs.h: (functions_b_resolvable_from_a) New function. > > gcc/testsuite/ChangeLog: > > * g++.target/aarch64/fmv-selection1.C: New test. > * g++.target/aarch64/fmv-selection2.C: New test. > * g++.target/aarch64/fmv-selection3.C: New test. > * g++.target/aarch64/fmv-selection4.C: New test. > * g++.target/aarch64/fmv-selection5.C: New test. > * g++.target/aarch64/fmv-selection6.C: New test. > * g++.target/aarch64/fmv-selection7.C: New test. > --- > gcc/attribs.cc | 20 ++++ > gcc/attribs.h | 1 + > gcc/config/aarch64/aarch64.cc | 28 +++++ > gcc/doc/tm.texi | 8 ++ > gcc/doc/tm.texi.in | 2 + > gcc/multiple_target.cc | 110 +++++++++++------- > gcc/target.def | 11 ++ > .../g++.target/aarch64/fmv-selection1.C | 40 +++++++ > .../g++.target/aarch64/fmv-selection2.C | 40 +++++++ > .../g++.target/aarch64/fmv-selection3.C | 25 ++++ > .../g++.target/aarch64/fmv-selection4.C | 30 +++++ > .../g++.target/aarch64/fmv-selection5.C | 28 +++++ > .../g++.target/aarch64/fmv-selection6.C | 27 +++++ > .../g++.target/aarch64/fmv-selection7.C | 65 +++++++++++ > 14 files changed, 392 insertions(+), 43 deletions(-) > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection1.C > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection2.C > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection3.C > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection4.C > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection5.C > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection6.C > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection7.C > > diff --git a/gcc/attribs.cc b/gcc/attribs.cc > index 2ca82674f7c..fc1751a5f2e 100644 > --- a/gcc/attribs.cc > +++ b/gcc/attribs.cc > @@ -1095,6 +1095,26 @@ common_function_versions (string_slice fn1 > ATTRIBUTE_UNUSED, > gcc_unreachable (); > } > > +/* Default implementation of > TARGET_OPTION_FUNCTION_B_RESOLVBLE_FROM_FUNCTION_A. TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A > + Used to check very basically if FN2 is callable from FN1. > + For now this checks if the version strings are the same. */ > + > +bool > +functions_b_resolvable_from_a (tree fn1, tree fn2, tree base > ATTRIBUTE_UNUSED) Calling these decl_a and decl_b might be better, as for the aarch64 version. I realise it doesn't matter here, since the function is symmetric, but it would make it clearer what "a" and "b" refer to. > +{ > + const char *attr_name = TARGET_HAS_FMV_TARGET_ATTRIBUTE > + ? "target" > + : "target_version"; > + > + tree attr1 = lookup_attribute (attr_name, DECL_ATTRIBUTES (fn1)); > + tree attr2 = lookup_attribute (attr_name, DECL_ATTRIBUTES (fn2)); Motivated by the documentation suggestion below, I think we should assert attr2 (attr_b). Same for the aarch64 version. > + > + if (!attr1 || !attr2) > + return false; > + > + return attribute_value_equal (attr1, attr2); > +} > + > /* Comparator function to be used in qsort routine to sort attribute > specification strings to "target". */ > > [...] > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index 47755ad32eb..fddca8c83e0 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -10972,6 +10972,14 @@ This target hook returns @code{true} if the > target/target-version strings > @var{fn1} and @var{fn2} imply the same function version. > @end deftypefn > > +@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A > (tree @var{v_a}, tree @var{v_b}, tree @var{base}) > +Determines if a function with the arch implied by @var{v_b}'s Here too, I think decl_a and decl_b would be easier to follow. I think it would be worth expanding "arch" to "architecture". > +FMV version attributes (either @code{target} or @code{target_version} for function multi-versioning (FMV) > +target FMV or target_version FMV semantics respectively) would always be > +resolvable from a function with @var{v_a}'s FMV version attributes. > +Uses a function decl @var{base} to establish the baseline architecture. It would be good to expand on what the default is. How about: ------------------------------------------------------------------------------ @var{decl_b} is a function declaration with a function multi-versioning (FMV) attribute; this attribute is either @code{target} or @code{target_version}, depending on @code{TARGET_HAS_FMV_TARGET_ATTRIBUTE}. @var{decl_a} is a function declaration that may or may not have an FMV attribute. Return true if we have enough information to determine that the requirements of @var{decl_b}'s FMV attribute are met whenever @var{decl_a} is executed, given that the target supports all features required by function declaration @var{base}. The default implementation just checks whether @var{decl_a} has the same FMV attribute as @var{decl_b}. This is conservatively correct, but ports can do better by taking the relationships between architecture features into account. For example, on AArch64, @code{sve} is present whenever @code{sve2} is present. ------------------------------------------------------------------------------ Feel free to reword. It was just a way of listing the information that seemed useful to include. (It looks like we're missing documentation for TARGET_HAS_FMV_TARGET_ATTRIBUTE though.) > +@end deftypefn > + > @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree > @var{callee}) > This target hook returns @code{false} if the @var{caller} function > cannot inline @var{callee}, based on target specific information. By > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > index eaab62652a7..99f47cb9d78 100644 > --- a/gcc/doc/tm.texi.in > +++ b/gcc/doc/tm.texi.in > @@ -7132,6 +7132,8 @@ with the target specific attributes. The default value > is @code{','}. > > @hook TARGET_OPTION_FUNCTION_VERSIONS > > +@hook TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A > + > @hook TARGET_CAN_INLINE_P > > @hook TARGET_UPDATE_IPA_FN_TARGET_INFO > diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc > index 7ab025d7d46..681b586e61c 100644 > --- a/gcc/multiple_target.cc > +++ b/gcc/multiple_target.cc > @@ -423,61 +423,86 @@ expand_target_clones (struct cgraph_node *node, bool > definition) > return true; > } > > -/* When NODE is a target clone, consider all callees and redirect > - to a clone with equal target attributes. That prevents multiple > - multi-versioning dispatches and a call-chain can be optimized. > - > - This optimisation might pick the wrong version in some cases, since > knowing > - that we meet the target requirements for a matching callee version does > not > - tell us that we won't also meet the target requirements for a higher > - priority callee version at runtime. Since this is longstanding behaviour > - for x86 and powerpc, we preserve it for those targets, but skip the > optimisation > - for targets that use the "target_version" attribute for multi-versioning. > */ > +/* When NODE is part of an FMV function set, consider all callees and check > if > + any can provably always resolve a certain version and then call that > version > + directly. */ > > static void > redirect_to_specific_clone (cgraph_node *node) > { > - cgraph_function_version_info *fv = node->function_version (); > - if (fv == NULL) > + if (!targetm.compare_version_priority > + || !targetm.target_option.functions_b_resolvable_from_a The condition on the last line should always be true. > + || !optimize) > return; > > - gcc_assert (TARGET_HAS_FMV_TARGET_ATTRIBUTE); > - tree attr_target = lookup_attribute ("target", DECL_ATTRIBUTES > (node->decl)); > - if (attr_target == NULL_TREE) > - return; > - > - /* We need to remember NEXT_CALLER as it could be modified in the loop. */ > + /* We need to remember NEXT_CALLER as it could be modified in the > + loop. */ There's no need to reflow this comment. > for (cgraph_edge *e = node->callees; e ; e = e->next_callee) > { > - cgraph_function_version_info *fv2 = e->callee->function_version (); > - if (!fv2) > + /* Only if this is a call to a dispatched symbol. */ > + if (!e->callee->dispatcher_function) > continue; > > - tree attr_target2 = lookup_attribute ("target", > - DECL_ATTRIBUTES (e->callee->decl)); > + cgraph_function_version_info *callee_v > + = e->callee->function_version (); > + cgraph_function_version_info *caller_v > + = e->caller->function_version (); > > - /* Function is not calling proper target clone. */ > - if (attr_target2 == NULL_TREE > - || !attribute_value_equal (attr_target, attr_target2)) > - { > - while (fv2->prev != NULL) > - fv2 = fv2->prev; > + /* If this is not the TU that contains the definition of the default > + version we are not guaranteed to have visibility of all versions > + so cannot reason about them. */ > + if (!callee_v->next->this_node->binds_to_current_def_p ()) > + continue; > > - /* Try to find a clone with equal target attribute. */ > - for (; fv2 != NULL; fv2 = fv2->next) > - { > - cgraph_node *callee = fv2->this_node; > - attr_target2 = lookup_attribute ("target", > - DECL_ATTRIBUTES (callee->decl)); > - if (attr_target2 != NULL_TREE > - && attribute_value_equal (attr_target, attr_target2)) > + cgraph_function_version_info *highest_callable_fn = NULL; > + for (cgraph_function_version_info *ver = callee_v->next; > + ver; > + ver = ver->next) > + if (targetm.target_option.functions_b_resolvable_from_a > + (node->decl, ver->this_node->decl, node->decl)) > + highest_callable_fn = ver; > + > + if (!highest_callable_fn) > + continue; > + > + /* If there are higher priority versions of callee and caller has no > + more version information, then not callable. */ > + if (!caller_v && highest_callable_fn->next) > + continue; I think this should be: if (highest_callable_fn->next && (!caller_v || !caller_v->next->this_node->binds_to_current_def_p ())) continue; We don't need to check whether node->binds_to_current_def_p (), because if node is preempted/overridden, then it doesn't matter what node does. But for highest_callable_fn->next we're assuming that all of caller's versions are visible in the current TU. > + > + bool inlinable = true; > + /* If every higher priority version would imply a higher priority > + version of caller would have been selected, then this is > + callable. */ > + if (caller_v) > + for (cgraph_function_version_info *callee_ver > + = highest_callable_fn->next; > + callee_ver; > + callee_ver = callee_ver->next) > + { > + bool is_possible = true; > + for (cgraph_function_version_info *caller_ver = caller_v->next; > + caller_ver; > + caller_ver = caller_ver->next) > + if (targetm.target_option.functions_b_resolvable_from_a > + (callee_ver->this_node->decl, > + caller_ver->this_node->decl, > + node->decl)) > { > - e->redirect_callee (callee); > - cgraph_edge::redirect_call_stmt_to_callee (e); > + is_possible = false; > break; > } > - } > - } > + if (is_possible) > + { > + inlinable = false; > + break; > + } > + } > + if (!inlinable) > + continue; I suppose this is personal preference, but I think the flow would be easier to follow if the loop was conditional on highest_callable_fn->next. Putting that together: if (highest_callable_fn->next) { /* If there are higher priority versions of callee and caller has no more version information, then not callable. */ if (!caller_v || !caller_v->next->this_node->binds_to_current_def_p ()) continue; /* If every higher priority version would imply a higher priority version of caller would have been selected, then this is callable. */ for (cgraph_function_version_info *callee_ver = highest_callable_fn->next; callee_ver; callee_ver = callee_ver->next) LGTM with those changes. Thanks, Richard > + > + e->redirect_callee (highest_callable_fn->this_node); > + cgraph_edge::redirect_call_stmt_to_callee (e); > } > } >