Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

Kewen.Lin Sat, 24 Aug 2019 07:51:00 -0700

Hi Bin,

on 2019/8/23 下午5:43, Bin.Cheng wrote:
> On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin <[email protected]> wrote:
>>
>> Hi Bin
>>
>> on 2019/8/23 上午10:19, Bin.Cheng wrote:
>>> On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin <[email protected]> wrote:
>>>>
>>>> Hi Bin,
>>>>
>>>> on 2019/8/22 下午1:46, Bin.Cheng wrote:
>>>>> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin <[email protected]> wrote:
>>>>>>
>>>>>> Hi Bin,
>>>>>>
>>>>>> Thanks for your time!
>>>>>>
>>>>>> on 2019/8/21 下午8:32, Bin.Cheng wrote:
>>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Comparing to the previous versions of implementation mainly based on 
>>>>>>>> the
>>>>>>>> existing IV cands but zeroing the related group/use cost, this new one 
>>>>>>>> is based
>>>>>>>> on Richard and Segher's suggestion introducing one doloop dedicated IV 
>>>>>>>> cand.
>>>>>>>>
>>>>>>>> Some key points are listed below:
>>>>>>>>   1) New field doloop_p in struct iv_cand to indicate doloop dedicated 
>>>>>>>> IV cand.
>>>>>>>>   2) Special name "doloop" assigned.
>>>>>>>>   3) Doloop IV cand with form (niter+1, +, -1)
>>>>>>>>   4) For doloop IV cand, no extra one cost like BIV, assign zero cost 
>>>>>>>> for step.
>>>>>>>>   5) Support may_be_zero (regressed PR is in this case), the base of 
>>>>>>>> doloop IV
>>>>>>>>      can be COND_EXPR, add handlings in cand_value_at and 
>>>>>>>> may_eliminate_iv.
>>>>>>>>   6) Add more expr support in force_expr_to_var_cost for reasonable 
>>>>>>>> cost
>>>>>>>>      calculation on the IV base with may_be_zero (like COND_EXPR).
>>>>>>>>   7) Set zero cost when using doloop IV cand for doloop use.
>>>>>>>>   8) Add three hooks (should we merge _generic and _address?).
>>>>>>>>     *) have_count_reg_decr_p, is to indicate the target has special 
>>>>>>>> hardware
>>>>>>>>        count register, we shouldn't consider the impact of doloop IV 
>>>>>>>> when
>>>>>>>>        calculating register pressures.
>>>>>>>>     *) doloop_cost_for_generic, is the extra cost when using doloop IV 
>>>>>>>> cand for
>>>>>>>>        generic type IV use.
>>>>>>>>     *) doloop_cost_for_address, is the extra cost when using doloop IV 
>>>>>>>> cand for
>>>>>>>>        address type IV use.
>>
>>>> The new patch addressing the comments is attached.
>>>> Could you please have a look again?  Thanks in advance!
>>> Thanks for working on this.  A bit more nit-pickings.
>>>
>>> -    add_candidate_1 (data, base, step, important,
>>> -                    IP_NORMAL, use, NULL, orig_iv);
>>> -  if (ip_end_pos (data->current_loop)
>>> +    add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
>>> doloop,
>>> +                    orig_iv);
>>> +  if (!doloop && ip_end_pos (data->current_loop)
>>> Could you add some comments elaborating why ip_end_pos candidate
>>> shouldn't be added for doloop case?  Because the increment position is
>>> wrong.
>>>
>>> Also if you make doloop the last default parameter of add_candidate_1,
>>> you can save more unnecessary changes to calls to add_candidate?
>>>
>>> -    cost = get_computation_cost (data, use, cand, false,
>>> -                                &inv_vars, NULL, &inv_expr);
>>> +    {
>>> +      cost = get_computation_cost (data, use, cand, false, &inv_vars, NULL,
>>> +                                  &inv_expr);
>>> +      if (cand->doloop_p)
>>> +       cost += targetm.doloop_cost_for_generic;
>>> +    }
>>> This adjustment
>>>
>>>    cost = get_computation_cost (data, use, cand, true,
>>>                                &inv_vars, &can_autoinc, &inv_expr);
>>>
>>> +  if (cand->doloop_p)
>>> +    cost += targetm.doloop_cost_for_address;
>>> +
>>> and this adjustment can be moved into get_computation_cost where all
>>> cost adjustments are done.
>>>
>>> +      /* For doloop candidate/use pair, adjust to zero cost.  */
>>> +      if (group->doloop_p && cand->doloop_p)
>>> +       cost = no_cost;
>>> Note above code handles comparing against zero case and decreases the
>>> cost by one (which prefers the same kind candidate as doloop one),
>>> it's very possible to have -1 cost for doloop cand here.  how about
>>> just set to no_cost if it's positive?  Your call.
>>>
>>> +/* For the targets which support doloop, to predict whether later RTL 
>>> doloop
>>> +   transformation will perform on this loop, further detect the doloop use 
>>> and
>>> +   mark the flag doloop_use_p if predicted.  */
>>> +
>>> +void
>>> +predict_and_process_doloop (struct ivopts_data *data)
>>> A better name here? Sorry I don't have another candidate in mind...
>>>
>>> +  data->doloop_use_p = false;
>>> This can be moved to the beginning of above
>>> 'predict_and_process_doloop' function.
>>>
>>> Lastly, could you please add some brief description/comment about
>>> doloop handling as a subsection in the file head comment?
>>>
>>> Otherwise, the ivopt changes look good to me.
>>>
>>> Thanks,
>>> bin
>>>
>>
>> Thanks for your prompt reply!  I've updated the code as your comments,
>> the updated version is attached.  Looking forward to your review again.
> 
> Sorry to bother.
> 
> -      return get_scaled_computation_cost_at (data, at, cost);
> +      cost = get_scaled_computation_cost_at (data, at, cost);
> +      /* For doloop IV cand, add on the extra cost.  */
> +      cost += cand->doloop_p ? targetm.doloop_cost_for_address : 0;
> +      return cost;
> Here the cost is adjusted after scaling, while:
> 
> +  /* For doloop IV cand, add on the extra cost.  */
> +  if (cand->doloop_p && use->type == USE_NONLINEAR_EXPR)
> +    cost += targetm.doloop_cost_for_generic;
> +
>    return get_scaled_computation_cost_at (data, at, cost);
> is adjusted before scaling.  Please work consistently.
>


Thanks for catching!

> +      /* Simply use 1.5 * add cost for now, FIXME if there is some
> more accurate
> +        cost evaluation way.  */
> +      cost = comp_cost (1.5 * add_cost (speed, mode), 0);
> +      break;
> Is 1.5 important for some test cases?  Can we simply use 1 instead?
> Or at least use xxx * 2 / 3 in order to avoid floating number.
> 

No, I was thinking they may deserve a bit more than the add since
the cost was a high value before this patch, two was too much for some
cases in my initial prototype, then I just chose 1.5.
I think it should be fine to use 1 here.


The appended diff:

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 88e7890..31ab858 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4353,9 +4353,9 @@ force_expr_to_var_cost (tree expr, bool speed)
     case LTGT_EXPR:
     case MAX_EXPR:
     case MIN_EXPR:
-      /* Simply use 1.5 * add cost for now, FIXME if there is some more 
accurate
-        cost evaluation way.  */
-      cost = comp_cost (1.5 * add_cost (speed, mode), 0);
+      /* Simply use add cost for now, FIXME if there is some more accurate cost
+        evaluation way.  */
+      cost = comp_cost (add_cost (speed, mode), 0);
       break;

     default:
@@ -4786,11 +4786,13 @@ get_computation_cost (struct ivopts_data *data, struct 
iv_use *use,
   if (comp_inv && !integer_zerop (comp_inv))
     cost += add_cost (speed, TYPE_MODE (utype));

+  cost = get_scaled_computation_cost_at (data, at, cost);
+
   /* For doloop IV cand, add on the extra cost.  */
   if (cand->doloop_p && use->type == USE_NONLINEAR_EXPR)
     cost += targetm.doloop_cost_for_generic;

-  return get_scaled_computation_cost_at (data, at, cost);
+  return cost;
 }


> Not sure if non-ivopts parts are already approved?  If so, the patch
> is okay with above issues addressed.
> 
> Thanks very much for your time!
> 

Thanks a lot for your time and helpful comments as well!!!


Thanks,
Kewen

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

Reply via email to