https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80791

--- Comment #24 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to bin cheng from comment #23)
> (In reply to Kewen Lin from comment #22)
> > As the discussion above, on Power any IV should have an extend (sign/zero)
> > if its width is less than the GPR width (POINTER_SIZE equivalent here).
> > Although we don't model this precisely on Power, in most cases it's trivial
> > since we have TYPE_PRECISION check for those inconsistent width uses, extra
> > cost will be added, the IV is still the one we prefer even with extension
> > consideration in cost modeling, that is, considering extension or not
> > doesn't affect the result.
> I wonder if a pre-Loop-Opt pass promotes (maybe only induction) variables to
> GPR size for usual/simple cases could be helpful? That is, lift burden by
> canonicalizing input to IVOPTs as much as possible.  As a mono pass, the
> extensibility is not good enough to cover every case, especially based on
> possible following optimization.

Canonicalizing in a prepass is a very good idea, but for this case it seems
we can't guarantee that promotion is safe. Without LTO the compiler knows
nothing about the range of argument m of function f, so m may hold INT_MAX
(or some other value that wraps easily). If we promote aggressively, the
semantics without the extend will differ. I agree that range analysis could
let us drop the extend in some cases, or that we could version the loop on
overflow conditions to get a specialized copy that cannot overflow, but that
sounds too complicated.
> 
> > 
> > In this case, we have one GENERIC use and one CMP use. Cand 4 was chosen as
> > the best, but later RTL opts can eliminate CMP use by using counter register
> > for the loop closing, it means we don't need to consider CMP use in ivopts
> > phase. Since there are no uses of the 32-bit IV, the extension cost can
> > NOT be covered (ignored) as usual.
> > 
> > The cost for GENERIC use looks like [cand $index ($iv_cost + $comp_cost)]:
> >   1) without considering extension cost, cand 4 (4 + 4) vs. cand 6 (5 + 0) 
> >   2) with considering extension cost, cand 4 (8 + 4) vs. cand 6 (9 + 0)
> > Since cand 6 can't be used for the CMP use, cand 4 is always selected as
> > long as we still need to consider CMP; introducing cand 6 would only raise
> > the cost. But if we can predict that CMP is useless and model extension,
> > the cost would be: cand 4 (8 + 4) vs. cand 6 (9 + 0). Cand 6 is better.
> Elimination of CMP with doloop is not modeled in IVOPTs now, would be great
> to have that.  With that support, cand 6 would be chosen regardless of
> extension here, right?  Of course, computing cost based on possibly
> eliminable cand/use is totally another issue and is even harder.

Yes! The key point here is that the eliminable CMP isn't modeled; the
extension modeling isn't important (that's why it's optional in step 5
below) :)
>
>
> > 
> > I did some hacks locally and it works. But the most tricky and hardest part
> > would be how to predict CMP will be optimized away with CTR eventually, 
> > the correctness of predict is more important. It looks better to think about
> > the simplest case first. The proposed idea is that:
> >   1) one target specific hook/flag to enable this adjust
> >   2) one target specific predict function to determine the loop can benefit
> > do_loop CTR transformation (like innermost loop, no calls, niter determined
> > etc.)
> >   3) check only one CMP with biv after find_interesting_uses, biv's width <
> > POINTER_SIZE, remove the group
> >   4) mark the biv preserved to avoid to be removed in remove_unused_ivs
> >   5) adjust determine_iv_cost for those IVs which require extension
> > (optional, for this case, it's not necessary but probably good as more
> > precise modeling)
> > 
> > Hi All,
> > 
> > I'm a newcomer to gcc and not sure the above idea is practical enough,
> > could you kindly give me some comments and suggestion?
> 
> Gibberish, I thought IVOPTs has been improved, but if we continue getting
> this kind of sub-optimal issues, maybe we should try to re-implement it, as
> a group of small passes each dedicating to specific transformation, rather
> than a big mono pass.

I could be wrong, but I think IVOPTs has become pretty good thanks to your
and others' efforts (much appreciated!). I'm not sure whether there are more
issues like this one, but some limitations and sub-optimal scenarios seem
understandable, since sometimes we don't have enough or sufficiently precise
input. In this case it's hard to predict precisely whether the CMP can be
eliminated, so we have to rely on heuristics/checks to achieve a best-effort
result.

Thanks!

> 
> Thanks
