On Thu, Sep 2, 2021 at 3:11 PM Kewen.Lin <li...@linux.ibm.com> wrote: > > on 2021/9/2 下午7:51, Richard Biener wrote: > > On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <li...@linux.ibm.com> wrote: > >> > >> Hi Richi, > >> > >> Thanks for the comments! > >> > >> on 2021/9/2 下午5:25, Richard Biener wrote: > >>> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <li...@linux.ibm.com> wrote: > >>>> > >>>> Hi! > >>>> > >>>> Power ISA 2.07 (Power8) introduces transactional memory feature > >>>> but ISA3.1 (Power10) removes it. It exposes one troublesome > >>>> issue as PR102059 shows. Users define some function with > >>>> target pragma cpu=power10 then it calls one function with > >>>> attribute always_inline which inherits command line option > >>>> cpu=power8 which enables HTM implicitly. The current isa_flags > >>>> check doesn't allow this inlining due to "target specific > >>>> option mismatch" and error mesasge is emitted. > >>>> > >>>> Normally, the callee function isn't intended to exploit HTM > >>>> feature, but the default flag setting make it look it has. > >>>> As Richi raised in the PR, we have fp_expressions flag in > >>>> function summary, and allow us to check the function actually > >>>> contains any floating point expressions to avoid overkill. > >>>> So this patch follows the similar idea but is more target > >>>> specific, for this rs6000 port specific requirement on HTM > >>>> feature check, we would like to check rs6000 specific HTM > >>>> built-in functions and inline assembly, it allows targets > >>>> to do their own customized checks and updates. > >>>> > >>>> It introduces two target hooks need_ipa_fn_target_info and > >>>> update_ipa_fn_target_info. The former allows target to do > >>>> some previous check and decides to collect target specific > >>>> information for this function or not. For some special case, > >>>> it can predict the analysis result and push it early without > >>>> any scannings. The latter allows the analyze_function_body > >>>> to pass gimple stmts down just like fp_expressions handlings, > >>>> target can do its own tricks. I put them as one hook initially > >>>> with one boolean to indicates whether it's initial time, but > >>>> the code looks a bit ugly, to separate them seems to have > >>>> better readability. > >>>> > >>>> To make it simple, this patch uses HOST_WIDE_INT to record the > >>>> flags just like what we use for isa_flags. For rs6000's HTM > >>>> need, one HOST_WIDE_INT variable is quite enough, but it seems > >>>> good to have one auto_vec for scalability as I noticed some > >>>> targets have more than one HOST_WIDE_INT flag. For now, this > >>>> target information collection is only for always_inline function, > >>>> function ipa_merge_fn_summary_after_inlining deals with target > >>>> information merging. > >>>> > >>>> The patch has been bootstrapped and regress-tested on > >>>> powerpc64le-linux-gnu Power9. > >>>> > >>>> Is it on the right track? > >>> > >>> + if (always_inline) > >>> + { > >>> + cgraph_node *callee_node = cgraph_node::get (callee); > >>> + if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != > >>> NULL) > >>> + { > >>> + if (dump_file) > >>> + ipa_dump_fn_summary (dump_file, callee_node); > >>> + const vec<HOST_WIDE_INT> &info = > >>> + ipa_fn_summaries->get (callee_node)->target_info; > >>> + if (!info.is_empty ()) > >>> + always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM; > >>> + } > >>> + > >>> + caller_isa &= ~always_inline_safe_mask; > >>> + callee_isa &= ~always_inline_safe_mask; > >>> + } > >>> > >>> that's a bit convoluted but obviously the IPA info can be used for > >>> non-always_inline cases as well. > >>> > >>> As said above the info can be produced for not always-inline functions > >>> as well, the usual case would be for LTO inlining across TUs compiled > >>> with different target options. In your case the special -mcpu=power10 > >>> TU would otherwise not be able to inline from a general -mcpu=power8 TU. > >>> > >> > >> Agree it can be extended to non-always_inline cases. Since always_inline > >> is kind of user "forced" requirement and compiler emits error if it fails > >> to inline, while non-always_inline will have warning instead. Considering > >> the scanning might be considered as costly for some big functions, I > >> guessed it might be good to start from always_inline as the first step. > >> But if different target options among LTO TUs is a common user case, I > >> think it's worth to extending it now. > > > > I was merely looking at this from the perspective of test coverage - with > > restricting it to always-inline we're not going to notice issues very > > reliably I think. > > > > Got it, will extend it to support all inlinable functions in next version. > > >>> On the streaming side we possibly have to take care about the > >>> GPU offloading path where we likely want to avoid pushing host target > >>> bits to the GPU target in some way. > >>> > >> > >> I guess this comment is about lto_stream_offload_p, I just did some quick > >> checks, this flag seems to guard things into section offload_lto, while > >> the function summary has its own section, it seems fine? > > > > Yes, but the data, since its target specific, is interpreted different by > > the host target than by the offload target so IMHO we should drop this > > to a conservative state when offloading? > > > > OK, I can guard the read/write under !lto_stream_offload_p and also test > with offload cases to see whether it takes effect as expected. There are > explicitly set/unset code before write_summary, but seems none for summary > reading, I might miss sth., need to check more. > > >>> Your case is specifically looking for HTM target builtins - for more > >>> general > >>> cases, like for example deciding whether we can inline across, say, > >>> -mlzcnt on x86 the scanning would have to include at least direct > >>> internal-fns > >>> mapping to optabs that could change. With such inlining we might also > >>> work against heuristic tuning decisions based on the ISA availability > >>> making code (much) more expensive to expand without such ISA availability, > >>> but that wouldn't be a correctness issue then. > >> > >> OK,I would assume the hook function parameter gimple* will be also enough > >> for > >> this example. :) > > > > Yes, I think so. > > > >> IMHO, even with this target information collection, we are unable to check > >> all > >> ISA features, it can only work for some "dull" ISA features, like HTM on > >> Power which can only be exploited by builtin (or inline asm), the > >> downstream > >> passes don't try to exploit it in other ways. For some features like VSX, > >> vectorization pass can produce vector code after we generating fn summary > >> and > >> think it doesn't use, it seems no way to get it right in early stage of > >> pass > >> pipeline. > > > > I don't think later passes are an issue - they would operate under the > > constraints of the caller flag and thus still generate valid code. > > Yeah, it makes sense. > > > Yes, if, say, somebody > > disabled VSX on a function in the attempt to not vectorize an unprofitable > > case > > and that function gets inlined it might end up using VSX (as now active in > > the > > caller) to vectorize the unprofitable case. But in general it should work > > The example seems to be what we want to avoid in PR70010 if non-VSX set for > callee > explicitly. :) btw, in the current implementation, I simply mark the function > non-HTM if it has explicit option -mno-htm, it doesn't need any scannings. > > One opposite example, one callee with VSX flag (implicitly or explicitly), but > after scanning it's marked as non-VSX, it's a question that if we want it to > be > inlined to one caller with non-VSX or not. Like your example, if it can be > vectorized before (meant without inlining), then to inline it is bad. While > if > it can not be vectorized before, then to inline it is good. We can't predict > this function will have VSX code or not (vectorized or not, feature exploited > or not) when setting the target information VSX feature bit.
Yeah - I guess this is where any heuristic decision a target might want to make will be different for the always-inline case. > Anyway, target can do its own decision in that hook, it's fine. :) > > > for any ISA feature (and some ISA features might just change what we > > expand common GIMPLE to - differences in those ISA features do not need > > to prevent inlining from a correctness perspective). > > > > Yeah, it would not be a correctness issue. > > BR, > Kewen > > > Richard. > > > >>> > >>> Otherwise the overall bits look OK but I'll leave the details to our IPA > >>> folks. > >>> > >> > >> Thanks again! > >> > >> BR, > >> Kewen > >> > >>> Thanks, > >>> Richard. > >>> > >>>> > >>>> Any comments are highly appreciated! > >>>> > >>>> BR, > >>>> Kewen > >>>> ------ > >>>> gcc/ChangeLog: > >>>> > >>>> PR ipa/102059 > >>>> * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New > >>>> function. > >>>> * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): > >>>> New > >>>> declare. > >>>> * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New > >>>> macro. > >>>> (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise. > >>>> (rs6000_need_ipa_fn_target_info): New function. > >>>> (rs6000_update_ipa_fn_target_info): Likewise. > >>>> (rs6000_can_inline_p): Adjust for ipa function summary target > >>>> info. > >>>> * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function > >>>> summary target info. > >>>> (analyze_function_body): Adjust for ipa function summary target > >>>> info and call hook rs6000_need_ipa_fn_target_info and > >>>> rs6000_update_ipa_fn_target_info. > >>>> (ipa_merge_fn_summary_after_inlining): Adjust for ipa function > >>>> summary target info. > >>>> (inline_read_section): Likewise. > >>>> (ipa_fn_summary_write): Likewise. > >>>> * ipa-fnsummary.h (ipa_fn_summary::target_info): New member. > >>>> * doc/tm.texi: Regenerate. > >>>> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new > >>>> hook. > >>>> (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise. > >>>> * target.def (update_ipa_fn_target_info): New hook. > >>>> (need_ipa_fn_target_info): Likewise. > >>>> * targhooks.c (default_need_ipa_fn_target_info): New function. > >>>> (default_update_ipa_fn_target_info): Likewise. > >>>> * targhooks.h (default_update_ipa_fn_target_info): New declare. > >>>> (default_need_ipa_fn_target_info): Likewise. > >>>> > >>>> gcc/testsuite/ChangeLog: > >>>> > >>>> PR ipa/102059 > >>>> * gcc.dg/lto/pr102059_0.c: New test. > >>>> * gcc.dg/lto/pr102059_1.c: New test. > >>>> * gcc.dg/lto/pr102059_2.c: New test. > >>>> * gcc.target/powerpc/pr102059-5.c: New test. > >>>> * gcc.target/powerpc/pr102059-6.c: New test. > >>>> > > > BR, > Kewen