On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/9/2 下午5:25, Richard Biener wrote:
> > On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> Hi!
> >>
> >> Power ISA 2.07 (Power8) introduces transactional memory feature
> >> but ISA3.1 (Power10) removes it.  It exposes one troublesome
> >> issue as PR102059 shows.  Users define some function with
> >> target pragma cpu=power10 then it calls one function with
> >> attribute always_inline which inherits command line option
> >> cpu=power8 which enables HTM implicitly.  The current isa_flags
> >> check doesn't allow this inlining due to "target specific
> >> option mismatch" and error mesasge is emitted.
> >>
> >> Normally, the callee function isn't intended to exploit HTM
> >> feature, but the default flag setting make it look it has.
> >> As Richi raised in the PR, we have fp_expressions flag in
> >> function summary, and allow us to check the function actually
> >> contains any floating point expressions to avoid overkill.
> >> So this patch follows the similar idea but is more target
> >> specific, for this rs6000 port specific requirement on HTM
> >> feature check, we would like to check rs6000 specific HTM
> >> built-in functions and inline assembly, it allows targets
> >> to do their own customized checks and updates.
> >>
> >> It introduces two target hooks need_ipa_fn_target_info and
> >> update_ipa_fn_target_info.  The former allows target to do
> >> some previous check and decides to collect target specific
> >> information for this function or not.  For some special case,
> >> it can predict the analysis result and push it early without
> >> any scannings.  The latter allows the analyze_function_body
> >> to pass gimple stmts down just like fp_expressions handlings,
> >> target can do its own tricks.  I put them as one hook initially
> >> with one boolean to indicates whether it's initial time, but
> >> the code looks a bit ugly, to separate them seems to have
> >> better readability.
> >>
> >> To make it simple, this patch uses HOST_WIDE_INT to record the
> >> flags just like what we use for isa_flags.  For rs6000's HTM
> >> need, one HOST_WIDE_INT variable is quite enough, but it seems
> >> good to have one auto_vec for scalability as I noticed some
> >> targets have more than one HOST_WIDE_INT flag.  For now, this
> >> target information collection is only for always_inline function,
> >> function ipa_merge_fn_summary_after_inlining deals with target
> >> information merging.
> >>
> >> The patch has been bootstrapped and regress-tested on
> >> powerpc64le-linux-gnu Power9.
> >>
> >> Is it on the right track?
> >
> > +  if (always_inline)
> > +    {
> > +      cgraph_node *callee_node = cgraph_node::get (callee);
> > +      if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
> > +       {
> > +         if (dump_file)
> > +           ipa_dump_fn_summary (dump_file, callee_node);
> > +         const vec<HOST_WIDE_INT> &info =
> > +           ipa_fn_summaries->get (callee_node)->target_info;
> > +         if (!info.is_empty ())
> > +           always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
> > +       }
> > +
> > +      caller_isa &= ~always_inline_safe_mask;
> > +      callee_isa &= ~always_inline_safe_mask;
> > +    }
> >
> > that's a bit convoluted but obviously the IPA info can be used for
> > non-always_inline cases as well.
> >
> > As said above the info can be produced for not always-inline functions
> > as well, the usual case would be for LTO inlining across TUs compiled
> > with different target options.  In your case the special -mcpu=power10
> > TU would otherwise not be able to inline from a general -mcpu=power8 TU.
> >
>
> Agree it can be extended to non-always_inline cases.  Since always_inline
> is kind of user "forced" requirement and compiler emits error if it fails
> to inline, while non-always_inline will have warning instead.  Considering
> the scanning might be considered as costly for some big functions, I
> guessed it might be good to start from always_inline as the first step.
> But if different target options among LTO TUs is a common user case, I
> think it's worth to extending it now.

I was merely looking at this from the perspective of test coverage - with
restricting it to always-inline we're not going to notice issues very
reliably I think.

> > On the streaming side we possibly have to take care about the
> > GPU offloading path where we likely want to avoid pushing host target
> > bits to the GPU target in some way.
> >
>
> I guess this comment is about lto_stream_offload_p, I just did some quick
> checks, this flag seems to guard things into section offload_lto, while
> the function summary has its own section, it seems fine?

Yes, but the data, since its target specific, is interpreted different by
the host target than by the offload target so IMHO we should drop this
to a conservative state when offloading?

> > Your case is specifically looking for HTM target builtins - for more general
> > cases, like for example deciding whether we can inline across, say,
> > -mlzcnt on x86 the scanning would have to include at least direct 
> > internal-fns
> > mapping to optabs that could change.  With such inlining we might also
> > work against heuristic tuning decisions based on the ISA availability
> > making code (much) more expensive to expand without such ISA availability,
> > but that wouldn't be a correctness issue then.
>
> OK,I would assume the hook function parameter gimple* will be also enough for
> this example.  :)

Yes, I think so.

> IMHO, even with this target information collection, we are unable to check all
> ISA features, it can only work for some "dull" ISA features, like HTM on
> Power which can only be exploited by builtin (or inline asm), the downstream
> passes don't try to exploit it in other ways.  For some features like VSX,
> vectorization pass can produce vector code after we generating fn summary and
> think it doesn't use, it seems no way to get it right in early stage of pass
> pipeline.

I don't think later passes are an issue - they would operate under the
constraints
of the caller flag and thus still generate valid code.  Yes, if, say, somebody
disabled VSX on a function in the attempt to not vectorize an unprofitable case
and that function gets inlined it might end up using VSX (as now active in the
caller) to vectorize the unprofitable case.  But in general it should work
for any ISA feature (and some ISA features might just change what we
expand common GIMPLE to - differences in those ISA features do not need
to prevent inlining from a correctness perspective).

Richard.

> >
> > Otherwise the overall bits look OK but I'll leave the details to our IPA 
> > folks.
> >
>
> Thanks again!
>
> BR,
> Kewen
>
> > Thanks,
> > Richard.
> >
> >>
> >> Any comments are highly appreciated!
> >>
> >> BR,
> >> Kewen
> >> ------
> >> gcc/ChangeLog:
> >>
> >>         PR ipa/102059
> >>         * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
> >>         function.
> >>         * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New
> >>         declare.
> >>         * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New 
> >> macro.
> >>         (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
> >>         (rs6000_need_ipa_fn_target_info): New function.
> >>         (rs6000_update_ipa_fn_target_info): Likewise.
> >>         (rs6000_can_inline_p): Adjust for ipa function summary target info.
> >>         * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
> >>         summary target info.
> >>         (analyze_function_body): Adjust for ipa function summary target
> >>         info and call hook rs6000_need_ipa_fn_target_info and
> >>         rs6000_update_ipa_fn_target_info.
> >>         (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
> >>         summary target info.
> >>         (inline_read_section): Likewise.
> >>         (ipa_fn_summary_write): Likewise.
> >>         * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
> >>         * doc/tm.texi: Regenerate.
> >>         * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
> >>         hook.
> >>         (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
> >>         * target.def (update_ipa_fn_target_info): New hook.
> >>         (need_ipa_fn_target_info): Likewise.
> >>         * targhooks.c (default_need_ipa_fn_target_info): New function.
> >>         (default_update_ipa_fn_target_info): Likewise.
> >>         * targhooks.h (default_update_ipa_fn_target_info): New declare.
> >>         (default_need_ipa_fn_target_info): Likewise.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         PR ipa/102059
> >>         * gcc.dg/lto/pr102059_0.c: New test.
> >>         * gcc.dg/lto/pr102059_1.c: New test.
> >>         * gcc.dg/lto/pr102059_2.c: New test.
> >>         * gcc.target/powerpc/pr102059-5.c: New test.
> >>         * gcc.target/powerpc/pr102059-6.c: New test.
> >>

Reply via email to