Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

Bin.Cheng Mon, 12 Jun 2017 01:33:07 -0700

On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> "Bin.Cheng" <amker.ch...@gmail.com> writes:
>> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
>> <richard.sandif...@linaro.org> wrote:
>>> Sorry to return this old patch, but:
>>>
>>> Bin Cheng <bin.ch...@arm.com> writes:
>>>> -/* Calculate the number of iterations under which scalar loop will be
>>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
>>>> -   iterations of prolog loop.  If it's integer const, the integer
>>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
>>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
>>>> -   true.  This function also store upper bound of the result in BOUND.  */
>>>> +/* Calculate the number of iterations above which vectorized loop will be
>>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
>>>> +   of prolog loop.  If it's integer const, the integer number is also 
>>>> passed
>>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
>>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
>>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
>>>> +   (rather than vectorized) loop will be executed.  This function stores
>>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
>>>>
>>>>  static tree
>>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
>>>> -                          int bound_prolog, int vf, int th, int *bound,
>>>> -                          bool check_profitability)
>>>> +                          int bound_prolog, int vfm1, int th,
>>>> +                          int *bound_scalar, bool check_profitability)
>>>>  {
>>>>    tree type = TREE_TYPE (niters_prolog);
>>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
>>>> -                          build_int_cst (type, vf));
>>>> +                          build_int_cst (type, vfm1));
>>>>
>>>> -  *bound = vf + bound_prolog;
>>>> +  *bound_scalar = vfm1 + bound_prolog;
>>>>    if (check_profitability)
>>>>      {
>>>> -      th++;
>>>> +      /* TH indicates the minimum niters of vectorized loop, while we
>>>> +      compute the maximum niters of scalar loop.  */
>>>> +      th--;
>>>
>>> Are you sure about this last change?  It looks like it should be dropping
>> Hi Richard,
>> Thanks for spotting this.  I vaguely remember I got this from the way
>> how niter/th was checked in previous peeling code, but did't double
>> check it now.  I tend to believe there is inconsistence about th,
>> especially with comment like:
>>
>>   /* Threshold of number of iterations below which vectorzation will not be
>>      performed. It is calculated from MIN_PROFITABLE_ITERS and
>>      PARAM_MIN_VECT_LOOP_BOUND. */
>>   unsigned int th;
>>
>> I also tend to believe the inconsistence was introduced partly because
>> niters in vectorizer stands for latch_niters + 1, while latch_niters
>> in rest of the compiler.
>>
>> and...,
>>
>>> the increment rather than replacing it with a decrement.
>>>
>>> It looks like the threshold is already the maximum niters for the scalar
>>> loop.  It's set by:
>>>
>>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
>>>                             * vectorization_factor) - 1);
>>>
>>>   /* Use the cost model only if it is more conservative than user specified
>>>      threshold.  */
>>>   th = (unsigned) min_scalar_loop_bound;
>>>   if (min_profitable_iters
>>>       && (!min_scalar_loop_bound
>>>           || min_profitable_iters > min_scalar_loop_bound))
>>>     th = (unsigned) min_profitable_iters;
>>>
>>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>>>
>>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
>>> is the minimum niters for the vector loop.)
>> To be honest, min_scalar_loop_bound is more likely for something's
>> lower bound which is the niters for the vector loop.  If it refers to
>> the niters scalar loop, it is in actuality the "max" value we should
>> use.  I am not quite sure here, partly because I am not a native
>> speaker.
>>
>>>
>>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
>>> which the vector loop is used, but vect_estimate_min_profitable_iters
>>> instead returns the largest niters for which the scalar loop should be
>>> preferred:
>>>
>>>   /* Cost model disabled.  */
>>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>>>     {
>>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
>>>       *ret_min_profitable_niters = 0;
>>>       *ret_min_profitable_estimate = 0;
>>>       return;
>>>     }
>>>   [...]
>>>   /* Because the condition we create is:
>>>      if (niters <= min_profitable_iters)
>>>        then skip the vectorized loop.  */
>>>   min_profitable_iters--;
>>>   [...]
>>>   min_profitable_estimate --;
>>>
>>> Also, when deciding whether the threshold needs to be used, we have:
>>>
>>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>>     {
>>>       if (dump_enabled_p ())
>>>         dump_printf_loc (MSG_NOTE, vect_location,
>>>                          "Profitability threshold is %d loop iterations.\n",
>>>                          th);
>>>       check_profitability = true;
>>>     }
>> Yes, this indeed indicates that th refers to the maximum niters of the
>> scalar loop.  Anyway, we should resolve the inconsistence and make it
>> more clear in variable names etc..
>
> Yeah, agree the variable names are really confusing.  Given the choice
> between changing the names to match the current meaning and changing
> the meaning to match the current names, the latter looks like it would
> give cleaner code.  I'm happy to do that it that sounds like the way
> to go.
I am all for this direction, but Richard B may have some to say here.
BTW, will you also fix the other inconsistence you found?


Thanks,
bin

>
> FWIW, I came across this while trying make sense of the "+ 1" in:
>
>   /* Decide whether we need to create an epilogue loop to handle
>      remaining scalar iterations.  */
>   th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
>         / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
>        * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>
> Thanks,
> Richard

Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

Reply via email to