On Wed, Jul 20, 2016 at 6:23 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Wed, Jul 20, 2016 at 11:01 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Tue, Jul 19, 2016 at 6:15 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Tue, Jul 19, 2016 at 1:10 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Mon, Jul 18, 2016 at 6:27 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>> Hi, >>>>> Scalar evolution needs to prove no-overflow for source variable when >>>>> handling type conversion. This is important because otherwise we would >>>>> fail to recognize result of the conversion as SCEV, resulting in missing >>>>> loop optimizations. Take case added by this patch as an example, the >>>>> loop can't be distributed as memset call because address of memory >>>>> reference is not recognized. At the moment, we rely on type overflow >>>>> semantics and loop niter info for no-overflow checking, unfortunately >>>>> that's not enough. This patch introduces new method checking no-overflow >>>>> using value range information. As commented in the patch, value range >>>>> can only be used when source operand variable evaluates on every loop >>>>> iteration, rather than guarded by some conditions. >>>>> >>>>> This together with patch improving loop niter analysis >>>>> (https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00736.html) can help >>>>> various loop passes like vectorization. >>>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>>> >>>> @@ -3187,7 +3187,8 @@ idx_infer_loop_bounds (tree base, tree *idx, void >>>> *dta) >>>> /* If access is not executed on every iteration, we must ensure that >>>> overlow >>>> may not make the access valid later. */ >>>> if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb >>>> (data->stmt)) >>>> - && scev_probably_wraps_p (initial_condition_in_loop_num (ev, >>>> loop->num), >>>> + && scev_probably_wraps_p (NULL, >>>> >>>> use NULL_TREE for the null pointer constant of tree. >>>> >>>> + /* Check if VAR evaluates in every loop iteration. */ >>>> + gimple *def; >>>> + if ((def = SSA_NAME_DEF_STMT (var)) != NULL >>>> >>>> def is never NULL but it might be a GIMPLE_NOP which has a NULL gimple_bb. >>>> Better check for ! SSA_DEFAULT_DEF_P (var) >>>> >>>> + if (TREE_CODE (step) != INTEGER_CST || !INTEGRAL_TYPE_P (TREE_TYPE >>>> (var))) >>>> + return false; >>>> >>>> this looks like a cheaper test so please do that first. >>>> >>>> + step_wi = step; >>>> + type = TREE_TYPE (var); >>>> + if (tree_int_cst_sign_bit (step)) >>>> + { >>>> + diff = lower_bound_in_type (type, type); >>>> + diff = minv - diff; >>>> + step_wi = - step_wi; >>>> + } >>>> + else >>>> + { >>>> + diff = upper_bound_in_type (type, type); >>>> + diff = diff - maxv; >>>> + } >>>> >>>> this lacks a comment - it's not obvious to me what the gymnastics >>>> with lower/upper_bound_in_type are supposed to achieve. >>> >>> Thanks for reviewing, I will prepare another version of patch. >>>> >>>> As VRP uses niter analysis itself I wonder how this fires back-to-back >>>> between >>> I am not sure if I mis-understood the question. If the VRP >>> information comes from loop niter, I think it will not change loop >>> niter or VRP2 in back because that's the best information we got in >>> the first place in niter. If the VRP information comes from other >>> places (guard conditions?) SCEV and loop niter after vrp1 might be >>> improved and thus VRP2. There should be no problems in either case, >>> as long as GCC breaks the recursive chain among niter/scev/vrp >>> correctly. >> >> Ok. >> >>>> VRP1 and VRP2? If the def of var dominates the latch isn't it enough to do >>>> a + 1 to check whether VRP bumped the range up to INT_MAX/MIN? That is, >>>> why do we need to add step if not for the TYPE_OVERFLOW_UNDEFINED case >>>> of VRP handling the ranges optimistically? >>> Again, please correct me if I mis-understood. Considering a variable >>> whose type is unsigned int and scev is {0, 4}_loop, the value range >>> could be computed as [0, 0xfffffffc], thus MAX + 1 is smaller than >>> type_MAX, but the scev could be overflow. >> >> Yes. I was wondering about the case where VRP bumps the range to +INF >> because it gave up during iteration or because overflow behavior is >> undefined. >> Do I understand correctly that the code is mostly to improve the not >> undefined-overflow case? > Hi Richard, > > I think we resolved these on IRC, here are words for the record. > The motivation case is for unsigned type loop counter, while the patch > should work for signed type in theory. Considering a loop has signed > char counter i and it's used in array_ref[i + 10], since front-end > converts signed char addition into unsigned operation, we may need the > range information to prove (unsigned char)i + 10 doesn't overflow, > thus address of array reference is a scev. I am not sure if the > signed case can be handled by current code, or there are other > fallouts preventing this patch from working. > >> >> Also I was wondering if the range DEF dominates the latch then why >> do we necessarily need to add step to verify overflow? Can't we do better >> if we for example see that the DEF is the loop header PHI? > I don't think we can, and there is nothing special for loop header PHI > in this case, right?
Yes. > > Attachment is the updated patch with your comments resolved. > Bootstrap and test on x86_64 and AArch64, is it OK? Ok. Thanks, Richard. > Thanks, > bin