https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80791

--- Comment #23 from bin cheng <amker at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #22)
> As the discussion above, on Power any IV should have an extend (sign/zero)
> if its width is less than the GPR width (POINTER_SIZE equivalent here).
> Although we don't model this precisely on Power, in most cases it's trivial
> since we have TYPE_PRECISION check for those inconsistent width uses, extra
> cost will be added, the IV is still the one we prefer even with extension
> consideration in cost modeling, that is, considering extension or not
> doesn't affect the result.
I wonder whether a pass run before the loop optimizers that promotes
variables (maybe only induction variables) to GPR size for the usual/simple
cases would help?  That is, lift the burden from IVOPTs by canonicalizing
its input as much as possible.  As a monolithic pass, IVOPTs is not
extensible enough to cover every case, especially cases that depend on
optimizations that may run later.
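
To make the promotion idea concrete, here is a hand-written, source-level
sketch of what such a canonicalization would amount to.  It is only an
illustration: the function names are made up, and a real pass would work on
GIMPLE and would have to prove the narrow IV cannot wrap (trivial here only
because signed overflow is undefined in C).

```c
#include <stddef.h>

/* Hypothetical input: a 32-bit signed IV indexes the array.  On a 64-bit
   target each a[i] needs i extended to pointer width unless the compiler
   can prove the extension away.  */
long sum_narrow (const int *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* The same loop after the assumed promotion: the IV and the bound are
   pointer-width, so no per-iteration extension is required.  */
long sum_promoted (const int *a, int n)
{
  long s = 0;
  ptrdiff_t wide_n = n;   /* one extension, hoisted out of the loop */
  for (ptrdiff_t i = 0; i < wide_n; i++)
    s += a[i];
  return s;
}
```

Both functions compute the same result; only the width of the IV that
IVOPTs would see differs.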

> 
> In this case, we have one GENERIC use and one CMP use. Cand 4 was chosen as
> the best, but later RTL opts can eliminate CMP use by using counter register
> for the loop closing, it means we don't need to consider CMP use in ivopts
> phase. Since there are no uses for the 32-bit IV, the extension cost can NOT
> be covered (ignored) as usual.
> 
> The cost for GENERIC use looks like [cand $index ($iv_cost + $comp_cost)]:
>   1) without considering extension cost, cand 4 (4 + 4) vs. cand 6 (5 + 0) 
>   2) with considering extension cost, cand 4 (8 + 4) vs. cand 6 (9 + 0)
> Since cand 6 can't be used for CMP use, so if we still need to consider CMP,
> cand 4 is always selected, the cost would be larger introducing cand 6. 
> But if can predict CMP is useless and model extension, the cost would be:
> cand 4 (8 + 4) vs. cand 6 (9 + 0). Cand 6 is better.
Elimination of the CMP by doloop is not modeled in IVOPTs now; it would be
great to have that.  With that support, cand 6 would be chosen regardless of
extension here, right?  (Once the CMP use is ignored, cand 6 wins both
comparisons above: 5 vs. 8 without extension cost, 9 vs. 12 with it.)  Of
course, computing cost based on a possibly eliminable cand/use is a
different issue altogether and is even harder.
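
The arithmetic can be checked with a small sketch.  It uses only the numbers
quoted above for the GENERIC use; the struct and function names are invented
for illustration and are not IVOPTs' own data structures.

```c
/* Illustrative only: not GCC's cost representation.  */
struct cand_cost
{
  int cand;       /* candidate index as printed in the dump */
  int iv_cost;    /* cost of maintaining the IV itself */
  int comp_cost;  /* cost of expressing the GENERIC use with this IV */
};

/* Total cost of a candidate when only the GENERIC use is considered.  */
static int total_cost (struct cand_cost c)
{
  return c.iv_cost + c.comp_cost;
}

/* Return the index of the cheaper candidate for the GENERIC use alone,
   i.e. assuming the CMP use has already been dropped from the problem.  */
int pick_for_generic_use (struct cand_cost a, struct cand_cost b)
{
  return total_cost (a) <= total_cost (b) ? a.cand : b.cand;
}
```

Feeding in {4, 4, 4} vs. {6, 5, 0} (no extension cost) and {4, 8, 4} vs.
{6, 9, 0} (with extension cost) selects cand 6 both times.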


> 
> I did some hacks locally and it works. But the most tricky and hardest part
> would be how to predict CMP will be optimized away with CTR eventually, 
> the correctness of predict is more important. It looks better to think about
> the simplest case first. The proposed idea is that:
>   1) one target specific hook/flag to enable this adjust
>   2) one target specific predict function to determine the loop can benefit
> do_loop CTR transformation (like innermost loop, no calls, niter determined
> etc.)
>   3) check only one CMP with biv after find_interesting_uses, biv's width <
> POINTER_SIZE, remove the group
>   4) mark the biv preserved to avoid to be removed in remove_unused_ivs
>   5) adjust determine_iv_cost for those IVs which require extension
> (optional; for this case, it's not necessary but probably good as more
> precise modeling)
> 
> Hi All,
> 
> I'm a newcomer to gcc and not sure the above idea is practical enough,
> could you kindly give me some comments and suggestions?

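Steps 2 and 3 of the quoted proposal could be sketched roughly as below.
This is a guess at the shape only: struct loop_summary and both function
names are hypothetical, none of it is existing GCC API, and a real
predicate would query the loop and niter information directly.

```c
#include <stdbool.h>

/* Hypothetical per-loop summary; these fields are NOT GCC's struct loop.  */
struct loop_summary
{
  bool innermost;          /* loop has no inner loops */
  bool has_calls;          /* loop body contains a call */
  bool niter_known;        /* iteration count can be determined */
  unsigned biv_precision;  /* bit width of the biv used by the exit CMP */
  unsigned pointer_size;   /* GPR width in bits */
  unsigned n_cmp_uses;     /* number of CMP uses of that biv */
};

/* Step 2: conservative prediction that the loop will end up as a
   counter-register (CTR) loop.  */
bool predict_doloop_p (const struct loop_summary *l)
{
  return l->innermost && !l->has_calls && l->niter_known;
}

/* Step 3: the CMP group may be dropped only when the prediction holds,
   there is exactly one CMP use, and the biv is narrower than a GPR.  */
bool can_drop_cmp_group_p (const struct loop_summary *l)
{
  return predict_doloop_p (l)
         && l->n_cmp_uses == 1
         && l->biv_precision < l->pointer_size;
}
```

Keeping the prediction conservative matters more than covering many loops,
since a wrong "yes" leaves the loop with a CMP that was never costed.
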
Just rambling: I thought IVOPTs had been improved, but if we keep running
into this kind of sub-optimal result, maybe we should try to re-implement it
as a group of small passes, each dedicated to a specific transformation,
rather than one big monolithic pass.

Thanks
