On Wed, Jul 20, 2016 at 6:23 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Jul 20, 2016 at 11:01 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Tue, Jul 19, 2016 at 6:15 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Tue, Jul 19, 2016 at 1:10 PM, Richard Biener
>>> <richard.guent...@gmail.com> wrote:
>>>> On Mon, Jul 18, 2016 at 6:27 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>> Hi,
>>>>> Scalar evolution needs to prove no-overflow for the source variable
>>>>> when handling a type conversion.  This is important because otherwise
>>>>> we would fail to recognize the result of the conversion as a SCEV,
>>>>> resulting in missed loop optimizations.  Take the case added by this
>>>>> patch as an example: the loop can't be distributed into a memset call
>>>>> because the address of the memory reference is not recognized.  At the
>>>>> moment we rely on type overflow semantics and loop niter info for the
>>>>> no-overflow check, but unfortunately that's not enough.  This patch
>>>>> introduces a new method that checks no-overflow using value range
>>>>> information.  As commented in the patch, the value range can only be
>>>>> used when the source operand is evaluated on every loop iteration,
>>>>> rather than being guarded by some condition.
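>>>>>
>>>>> To illustrate the shape of the problem (a sketch only, not the actual
>>>>> testcase added by the patch), think of a simple loop whose store
>>>>> address depends on a converted induction variable:
>>>>>
>>>>>   void
>>>>>   clear (char *p, unsigned int len)
>>>>>   {
>>>>>     /* The offset is computed in a wider type, so SCEV must prove the
>>>>>        conversion of i does not wrap before loop distribution can turn
>>>>>        the loop into a memset call.  */
>>>>>     for (unsigned int i = 0; i < len; i++)
>>>>>       p[i] = 0;
>>>>>   }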
>>>>>
>>>>> This, together with the patch improving loop niter analysis
>>>>> (https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00736.html), can help
>>>>> various loop passes, like vectorization.
>>>>> Bootstrapped and tested on x86_64 and AArch64.  Is it OK?
>>>>
>>>> @@ -3187,7 +3187,8 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
>>>>    /* If access is not executed on every iteration, we must ensure that overlow
>>>>       may not make the access valid later.  */
>>>>    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>>>> -      && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
>>>> +      && scev_probably_wraps_p (NULL,
>>>>
>>>> use NULL_TREE for the null pointer constant of tree.
>>>>
>>>> +  /* Check if VAR evaluates in every loop iteration.  */
>>>> +  gimple *def;
>>>> +  if ((def = SSA_NAME_DEF_STMT (var)) != NULL
>>>>
>>>> def is never NULL but it might be a GIMPLE_NOP which has a NULL gimple_bb.
>>>> Better check for ! SSA_DEFAULT_DEF_P (var)
>>>>
>>>> +  if (TREE_CODE (step) != INTEGER_CST || !INTEGRAL_TYPE_P (TREE_TYPE (var)))
>>>> +    return false;
>>>>
>>>> this looks like a cheaper test so please do that first.
>>>>
>>>> +  step_wi = step;
>>>> +  type = TREE_TYPE (var);
>>>> +  if (tree_int_cst_sign_bit (step))
>>>> +    {
>>>> +      diff = lower_bound_in_type (type, type);
>>>> +      diff = minv - diff;
>>>> +      step_wi = - step_wi;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      diff = upper_bound_in_type (type, type);
>>>> +      diff = diff - maxv;
>>>> +    }
>>>>
>>>> this lacks a comment - it's not obvious to me what the gymnastics
>>>> with lower/upper_bound_in_type are supposed to achieve.
>>>
>>> Thanks for reviewing, I will prepare another version of patch.
>>>>
>>>> As VRP uses niter analysis itself I wonder how this fires back-to-back 
>>>> between
>>> I am not sure if I misunderstood the question.  If the VRP
>>> information comes from loop niter, I don't think it will change loop
>>> niter or VRP2 in turn, because that's the best information niter had
>>> in the first place.  If the VRP information comes from other places
>>> (guard conditions?), then SCEV and loop niter after VRP1 might be
>>> improved, and thus VRP2 as well.  There should be no problem in
>>> either case, as long as GCC breaks the recursive chain among
>>> niter/scev/vrp correctly.
>>
>> Ok.
>>
>>>> VRP1 and VRP2?  If the def of var dominates the latch isn't it enough to do
>>>> a + 1 to check whether VRP bumped the range up to INT_MAX/MIN?  That is,
>>>> why do we need to add step if not for the TYPE_OVERFLOW_UNDEFINED case
>>>> of VRP handling the ranges optimistically?
>>> Again, please correct me if I misunderstood.  Consider a variable
>>> whose type is unsigned int and whose scev is {0, 4}_loop: its value
>>> range could be computed as [0, 0xfffffffc], so MAX + 1 is smaller
>>> than type_MAX, yet the scev could still overflow.
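>>>
>>> Concretely (illustrative numbers only): with maxv == 0xfffffffc and
>>> step == 4, maxv + 1 == 0xfffffffd still fits in unsigned int, but
>>> maxv + step wraps around to zero.  That is why the check compares the
>>> distance between the range bound and the type bound against the step
>>> itself rather than against 1:
>>>
>>>   /* Illustration only: the "+ 1" check passes, the "+ step" one fails.  */
>>>   unsigned int
>>>   next_value (void)
>>>   {
>>>     unsigned int maxv = 0xfffffffcu;
>>>     return maxv + 4;   /* wraps around to 0  */
>>>   }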
>>
>> Yes.  I was wondering about the case where VRP bumps the range to +INF
>> because it gave up during iteration or because overflow behavior is
>> undefined.  Do I understand correctly that the code is mostly there to
>> improve the case where overflow is not undefined?
> Hi Richard,
>
> I think we resolved these on IRC; here is a summary for the record.
> The motivating case is an unsigned loop counter, though the patch
> should work for signed types in theory.  Consider a loop with a
> signed char counter i that is used in array_ref[i + 10]: since the
> front end converts the signed char addition into an unsigned
> operation, we may need the range information to prove that
> (unsigned char)i + 10 doesn't overflow, so that the address of the
> array reference is a scev.  I am not sure whether the signed case can
> be handled by the current code, or whether there are other fallouts
> preventing this patch from working there.
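>
> A sketch of that shape (hypothetical code, not taken from the patch or
> its testcase):
>
>   void
>   set_elems (int *a)
>   {
>     signed char i;
>     /* The index arithmetic may be carried out in an unsigned type
>        internally, so range information is needed to show that it
>        does not wrap.  */
>     for (i = 0; i < 100; i++)
>       a[i + 10] = 0;
>   }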
>
>>
>> Also I was wondering: if the range DEF dominates the latch, why do we
>> necessarily need to add step to verify overflow?  Can't we do better
>> if we see, for example, that the DEF is the loop header PHI?
> I don't think we can, and there is nothing special about the loop
> header PHI in this case, right?

Yes.

>
> Attached is the updated patch with your comments addressed.
> Bootstrapped and tested on x86_64 and AArch64.  Is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
