https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80791

--- Comment #22 from Kewen Lin <linkw at gcc dot gnu.org> ---
As discussed above, on Power any IV needs an extension (sign/zero) if its
width is less than the GPR width (POINTER_SIZE equivalent here). We don't
model this precisely on Power, but in most cases that doesn't matter: the
TYPE_PRECISION check already adds extra cost for uses whose width is
inconsistent with the IV's, and the preferred IV stays the same even when the
extension is included in the cost model. That is, considering the extension
or not doesn't affect the result.

In this case, we have one GENERIC use and one CMP use. Cand 4 was chosen as the
best, but later RTL opts can eliminate the CMP use by using the counter
register for the loop closing, which means we don't need to consider the CMP
use in the ivopts phase. Since there are then no uses of the 32-bit IV, the
extension cost can NOT be covered (ignored) as usual.

The cost for the GENERIC use looks like [cand $index ($iv_cost + $comp_cost)]:
  1) without considering extension cost, cand 4 (4 + 4) vs. cand 6 (5 + 0)
  2) with considering extension cost, cand 4 (8 + 4) vs. cand 6 (9 + 0)
Cand 6 can't be used for the CMP use, so if we still need to consider CMP,
cand 4 is always selected: introducing cand 6 as well would make the total
cost larger. But if we can predict that CMP is useless and we model the
extension, the cost would be: cand 4 (8 + 4) vs. cand 6 (9 + 0). Cand 6 is
better.
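
To make the arithmetic above easy to check, here is a small Python sketch that
adds up the two components for each candidate. Only the numbers come from the
comparison above; CandCost and cheapest are hypothetical names for
illustration and do not correspond to GCC's ivopts data structures.

```python
from typing import NamedTuple


class CandCost(NamedTuple):
    """Hypothetical record; only the numbers below come from the comment."""
    cand: int        # candidate index
    iv_cost: int     # cost of keeping the IV itself
    comp_cost: int   # cost of computing the GENERIC use from the IV

    @property
    def total(self) -> int:
        return self.iv_cost + self.comp_cost


def cheapest(cands):
    """Return the candidate with the lowest total cost."""
    return min(cands, key=lambda c: c.total)


# GENERIC use only, i.e. assuming the CMP use no longer has to be served.
without_ext = [CandCost(4, 4, 4), CandCost(6, 5, 0)]
with_ext = [CandCost(4, 8, 4), CandCost(6, 9, 0)]

for label, cands in (("without extension", without_ext),
                     ("with extension", with_ext)):
    best = cheapest(cands)
    print(f"{label}: cand {best.cand} has the lowest total ({best.total})")
```

The sketch only covers the GENERIC use; it does not model the CMP use, which
is what forces cand 4 to be kept today.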

I did some hacks locally and it works. But the trickiest and hardest part
would be how to predict that CMP will eventually be optimized away with CTR;
the correctness of the prediction matters most. It looks better to think about
the simplest case first. The proposed idea is:
  1) one target-specific hook/flag to enable this adjustment
  2) one target-specific predict function to determine whether the loop can
benefit from the do_loop CTR transformation (e.g. innermost loop, no calls,
niter determined, etc.)
  3) after find_interesting_uses, check that there is only one CMP use, that it
is on a biv, and that the biv's width < POINTER_SIZE; if so, remove the group
  4) mark the biv as preserved so that remove_unused_ivs doesn't remove it
  5) adjust determine_iv_cost for those IVs which require extension (optional;
for this case it's not necessary, but probably good for more precise modeling)
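
As a rough illustration of steps 1) to 3), the decision logic could be modeled
like this in Python. This is only a sketch of the control flow, not GCC code:
LoopInfo, UseGroup, predict_doloop, drop_dead_cmp_group and the 64-bit
POINTER_SIZE are all assumptions made up for the example, and steps 4) and 5)
are left out.

```python
from dataclasses import dataclass

POINTER_SIZE = 64  # assumption: 64-bit GPRs


@dataclass
class LoopInfo:
    """Hypothetical summary of the loop properties named in step 2."""
    is_innermost: bool
    has_calls: bool
    niter_known: bool


@dataclass
class UseGroup:
    """Hypothetical stand-in for one group of IV uses."""
    kind: str          # "GENERIC" or "CMP"
    iv_is_biv: bool    # the use is on a basic induction variable
    iv_precision: int  # width of that IV in bits


def predict_doloop(loop):
    """Step 2: a conservative guess that the loop will use CTR."""
    return loop.is_innermost and not loop.has_calls and loop.niter_known


def drop_dead_cmp_group(loop, groups, enabled=True):
    """Steps 1 and 3: return the groups that ivopts should still consider.

    The single CMP group is dropped only if the adjustment is enabled, the
    loop is predicted to use CTR, and the compared biv is narrower than
    POINTER_SIZE. In every other case the groups are returned unchanged.
    """
    if not (enabled and predict_doloop(loop)):
        return list(groups)
    cmp_groups = [g for g in groups if g.kind == "CMP"]
    if len(cmp_groups) != 1:
        return list(groups)
    cmp_group = cmp_groups[0]
    if not (cmp_group.iv_is_biv and cmp_group.iv_precision < POINTER_SIZE):
        return list(groups)
    return [g for g in groups if g is not cmp_group]
```

For the case discussed here (one GENERIC use plus one CMP use on a 32-bit
biv), the model leaves only the GENERIC group once the loop is predicted to
use CTR, and leaves both groups alone otherwise.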

Hi All,

I'm a newcomer to GCC and not sure whether the above idea is practical enough.
Could you kindly give me some comments and suggestions?
