On Thu, Sep 2, 2021 at 3:11 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> on 2021/9/2 下午7:51, Richard Biener wrote:
> > On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> Hi Richi,
> >>
> >> Thanks for the comments!
> >>
> >> on 2021/9/2 下午5:25, Richard Biener wrote:
> >>> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>>>
> >>>> Hi!
> >>>>
> >>>> Power ISA 2.07 (Power8) introduces transactional memory feature
> >>>> but ISA3.1 (Power10) removes it.  It exposes one troublesome
> >>>> issue as PR102059 shows.  Users define some function with
> >>>> target pragma cpu=power10 then it calls one function with
> >>>> attribute always_inline which inherits command line option
> >>>> cpu=power8 which enables HTM implicitly.  The current isa_flags
> >>>> check doesn't allow this inlining due to "target specific
> >>>> option mismatch" and error mesasge is emitted.
> >>>>
> >>>> Normally, the callee function isn't intended to exploit HTM
> >>>> feature, but the default flag setting make it look it has.
> >>>> As Richi raised in the PR, we have fp_expressions flag in
> >>>> function summary, and allow us to check the function actually
> >>>> contains any floating point expressions to avoid overkill.
> >>>> So this patch follows the similar idea but is more target
> >>>> specific, for this rs6000 port specific requirement on HTM
> >>>> feature check, we would like to check rs6000 specific HTM
> >>>> built-in functions and inline assembly, it allows targets
> >>>> to do their own customized checks and updates.
> >>>>
> >>>> It introduces two target hooks need_ipa_fn_target_info and
> >>>> update_ipa_fn_target_info.  The former allows target to do
> >>>> some previous check and decides to collect target specific
> >>>> information for this function or not.  For some special case,
> >>>> it can predict the analysis result and push it early without
> >>>> any scannings.  The latter allows the analyze_function_body
> >>>> to pass gimple stmts down just like fp_expressions handlings,
> >>>> target can do its own tricks.  I put them as one hook initially
> >>>> with one boolean to indicates whether it's initial time, but
> >>>> the code looks a bit ugly, to separate them seems to have
> >>>> better readability.
> >>>>
> >>>> To make it simple, this patch uses HOST_WIDE_INT to record the
> >>>> flags just like what we use for isa_flags.  For rs6000's HTM
> >>>> need, one HOST_WIDE_INT variable is quite enough, but it seems
> >>>> good to have one auto_vec for scalability as I noticed some
> >>>> targets have more than one HOST_WIDE_INT flag.  For now, this
> >>>> target information collection is only for always_inline function,
> >>>> function ipa_merge_fn_summary_after_inlining deals with target
> >>>> information merging.
> >>>>
> >>>> The patch has been bootstrapped and regress-tested on
> >>>> powerpc64le-linux-gnu Power9.
> >>>>
> >>>> Is it on the right track?
> >>>
> >>> +  if (always_inline)
> >>> +    {
> >>> +      cgraph_node *callee_node = cgraph_node::get (callee);
> >>> +      if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != 
> >>> NULL)
> >>> +       {
> >>> +         if (dump_file)
> >>> +           ipa_dump_fn_summary (dump_file, callee_node);
> >>> +         const vec<HOST_WIDE_INT> &info =
> >>> +           ipa_fn_summaries->get (callee_node)->target_info;
> >>> +         if (!info.is_empty ())
> >>> +           always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
> >>> +       }
> >>> +
> >>> +      caller_isa &= ~always_inline_safe_mask;
> >>> +      callee_isa &= ~always_inline_safe_mask;
> >>> +    }
> >>>
> >>> that's a bit convoluted but obviously the IPA info can be used for
> >>> non-always_inline cases as well.
> >>>
> >>> As said above the info can be produced for not always-inline functions
> >>> as well, the usual case would be for LTO inlining across TUs compiled
> >>> with different target options.  In your case the special -mcpu=power10
> >>> TU would otherwise not be able to inline from a general -mcpu=power8 TU.
> >>>
> >>
> >> Agree it can be extended to non-always_inline cases.  Since always_inline
> >> is kind of user "forced" requirement and compiler emits error if it fails
> >> to inline, while non-always_inline will have warning instead.  Considering
> >> the scanning might be considered as costly for some big functions, I
> >> guessed it might be good to start from always_inline as the first step.
> >> But if different target options among LTO TUs is a common user case, I
> >> think it's worth to extending it now.
> >
> > I was merely looking at this from the perspective of test coverage - with
> > restricting it to always-inline we're not going to notice issues very
> > reliably I think.
> >
>
> Got it, will extend it to support all inlinable functions in next version.
>
> >>> On the streaming side we possibly have to take care about the
> >>> GPU offloading path where we likely want to avoid pushing host target
> >>> bits to the GPU target in some way.
> >>>
> >>
> >> I guess this comment is about lto_stream_offload_p, I just did some quick
> >> checks, this flag seems to guard things into section offload_lto, while
> >> the function summary has its own section, it seems fine?
> >
> > Yes, but the data, since its target specific, is interpreted different by
> > the host target than by the offload target so IMHO we should drop this
> > to a conservative state when offloading?
> >
>
> OK, I can guard the read/write under !lto_stream_offload_p and also test
> with offload cases to see whether it takes effect as expected.  There are
> explicitly set/unset code before write_summary, but seems none for summary
> reading, I might miss sth., need to check more.
>
> >>> Your case is specifically looking for HTM target builtins - for more 
> >>> general
> >>> cases, like for example deciding whether we can inline across, say,
> >>> -mlzcnt on x86 the scanning would have to include at least direct 
> >>> internal-fns
> >>> mapping to optabs that could change.  With such inlining we might also
> >>> work against heuristic tuning decisions based on the ISA availability
> >>> making code (much) more expensive to expand without such ISA availability,
> >>> but that wouldn't be a correctness issue then.
> >>
> >> OK,I would assume the hook function parameter gimple* will be also enough 
> >> for
> >> this example.  :)
> >
> > Yes, I think so.
> >
> >> IMHO, even with this target information collection, we are unable to check 
> >> all
> >> ISA features, it can only work for some "dull" ISA features, like HTM on
> >> Power which can only be exploited by builtin (or inline asm), the 
> >> downstream
> >> passes don't try to exploit it in other ways.  For some features like VSX,
> >> vectorization pass can produce vector code after we generating fn summary 
> >> and
> >> think it doesn't use, it seems no way to get it right in early stage of 
> >> pass
> >> pipeline.
> >
> > I don't think later passes are an issue - they would operate under the
> > constraints of the caller flag and thus still generate valid code.
>
> Yeah, it makes sense.
>
> > Yes, if, say, somebody
> > disabled VSX on a function in the attempt to not vectorize an unprofitable 
> > case
> > and that function gets inlined it might end up using VSX (as now active in 
> > the
> > caller) to vectorize the unprofitable case.  But in general it should work
>
> The example seems to be what we want to avoid in PR70010 if non-VSX set for 
> callee
> explicitly. :)  btw, in the current implementation, I simply mark the function
> non-HTM if it has explicit option -mno-htm, it doesn't need any scannings.
>
> One opposite example, one callee with VSX flag (implicitly or explicitly), but
> after scanning it's marked as non-VSX, it's a question that if we want it to 
> be
> inlined to one caller with non-VSX or not.  Like your example, if it can be
> vectorized before (meant without inlining), then to inline it is bad.  While 
> if
> it can not be vectorized before, then to inline it is good.  We can't predict
> this function will have VSX code or not (vectorized or not, feature exploited
> or not) when setting the target information VSX feature bit.

Yeah - I guess this is where any heuristic decision a target might want to
make will be different for the always-inline case.

> Anyway, target can do its own decision in that hook, it's fine.  :)
>
> > for any ISA feature (and some ISA features might just change what we
> > expand common GIMPLE to - differences in those ISA features do not need
> > to prevent inlining from a correctness perspective).
> >
>
> Yeah, it would not be a correctness issue.
>
> BR,
> Kewen
>
> > Richard.
> >
> >>>
> >>> Otherwise the overall bits look OK but I'll leave the details to our IPA 
> >>> folks.
> >>>
> >>
> >> Thanks again!
> >>
> >> BR,
> >> Kewen
> >>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>>
> >>>> Any comments are highly appreciated!
> >>>>
> >>>> BR,
> >>>> Kewen
> >>>> ------
> >>>> gcc/ChangeLog:
> >>>>
> >>>>         PR ipa/102059
> >>>>         * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
> >>>>         function.
> >>>>         * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): 
> >>>> New
> >>>>         declare.
> >>>>         * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New 
> >>>> macro.
> >>>>         (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
> >>>>         (rs6000_need_ipa_fn_target_info): New function.
> >>>>         (rs6000_update_ipa_fn_target_info): Likewise.
> >>>>         (rs6000_can_inline_p): Adjust for ipa function summary target 
> >>>> info.
> >>>>         * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
> >>>>         summary target info.
> >>>>         (analyze_function_body): Adjust for ipa function summary target
> >>>>         info and call hook rs6000_need_ipa_fn_target_info and
> >>>>         rs6000_update_ipa_fn_target_info.
> >>>>         (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
> >>>>         summary target info.
> >>>>         (inline_read_section): Likewise.
> >>>>         (ipa_fn_summary_write): Likewise.
> >>>>         * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
> >>>>         * doc/tm.texi: Regenerate.
> >>>>         * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
> >>>>         hook.
> >>>>         (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
> >>>>         * target.def (update_ipa_fn_target_info): New hook.
> >>>>         (need_ipa_fn_target_info): Likewise.
> >>>>         * targhooks.c (default_need_ipa_fn_target_info): New function.
> >>>>         (default_update_ipa_fn_target_info): Likewise.
> >>>>         * targhooks.h (default_update_ipa_fn_target_info): New declare.
> >>>>         (default_need_ipa_fn_target_info): Likewise.
> >>>>
> >>>> gcc/testsuite/ChangeLog:
> >>>>
> >>>>         PR ipa/102059
> >>>>         * gcc.dg/lto/pr102059_0.c: New test.
> >>>>         * gcc.dg/lto/pr102059_1.c: New test.
> >>>>         * gcc.dg/lto/pr102059_2.c: New test.
> >>>>         * gcc.target/powerpc/pr102059-5.c: New test.
> >>>>         * gcc.target/powerpc/pr102059-6.c: New test.
> >>>>
>
>
> BR,
> Kewen

Reply via email to