https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80791
--- Comment #23 from bin cheng <amker at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #22)
> As discussed above, on Power any IV should have an extension (sign/zero)
> if its width is less than the GPR width (equivalent to POINTER_SIZE here).
> Although we don't model this precisely on Power, in most cases it doesn't
> matter: the TYPE_PRECISION check adds extra cost for uses with inconsistent
> widths, and the IV is still the one we prefer even when the cost model
> considers the extension. That is, considering the extension or not doesn't
> affect the result.

I wonder whether a pre-loop-optimization pass that promotes (maybe only
induction) variables to GPR size in usual/simple cases could help? That is,
lift the burden by canonicalizing the input to IVOPTs as much as possible.
As a monolithic pass, IVOPTs is not extensible enough to cover every case,
especially cases that depend on subsequent optimizations.

> In this case, we have one GENERIC use and one CMP use. Cand 4 was chosen as
> the best, but later RTL optimizations can eliminate the CMP use by using the
> counter register for the loop closing, which means we don't need to consider
> the CMP use in the IVOPTs phase. Since there are no uses of the 32-bit IV,
> the extension cost can NOT be covered (ignored) as usual.
>
> The cost for the GENERIC use looks like [cand $index ($iv_cost + $comp_cost)]:
> 1) without considering extension cost, cand 4 (4 + 4) vs. cand 6 (5 + 0)
> 2) with considering extension cost, cand 4 (8 + 4) vs. cand 6 (9 + 0)
> Since cand 6 can't be used for the CMP use, if we still need to consider
> CMP, cand 4 is always selected, because introducing cand 6 would make the
> cost larger. But if we can predict that CMP is useless and model the
> extension, the cost would be: cand 4 (8 + 4) vs. cand 6 (9 + 0). Cand 6 is
> better.

Elimination of the CMP use via doloop is not modeled in IVOPTs now; it would
be great to have that. With that support, cand 6 would be chosen regardless
of extension here, right?
Of course, computing cost based on possibly eliminable candidates/uses is a
totally different issue, and an even harder one.

> I did some hacks locally and it works. But the trickiest and hardest part
> would be how to predict that CMP will eventually be optimized away with CTR;
> the correctness of the prediction is more important. It seems better to
> think about the simplest case first. The proposed idea is:
> 1) one target-specific hook/flag to enable this adjustment
> 2) one target-specific prediction function to determine whether the loop
>    can benefit from the do_loop CTR transformation (like innermost loop, no
>    calls, niter determined, etc.)
> 3) after find_interesting_uses, check that there is only one CMP use, based
>    on a biv whose width < POINTER_SIZE, and remove that group
> 4) mark the biv as preserved so that it is not removed in remove_unused_ivs
> 5) adjust determine_iv_cost for those IVs that require extension
>    (optional; for this case it's not necessary, but probably good as more
>    precise modeling)
>
> Hi All,
>
> I'm a newcomer to GCC and not sure whether the above idea is practical
> enough. Could you kindly give me some comments and suggestions?

Gibberish: I thought IVOPTs had been improved, but if we keep getting
sub-optimal issues like this, maybe we should try to re-implement it as a
group of small passes, each dedicated to a specific transformation, rather
than one big monolithic pass.

Thanks