On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > "Bin.Cheng" <amker.ch...@gmail.com> writes: >> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford >> <richard.sandif...@linaro.org> wrote: >>> The first loop in the testcase regressed after my recent changes to >>> dr_analyze_innermost. Previously we would treat "i" as an iv even >>> for bb analysis and end up with: >>> >>> DR_BASE_ADDRESS: p or q >>> DR_OFFSET: 0 >>> DR_INIT: 0 or 4 >>> DR_STEP: 16 >>> >>> We now always keep the step as 0 instead, so for an int "i" we'd have: >>> >>> DR_BASE_ADDRESS: p or q >>> DR_OFFSET: (intptr_t) i >>> DR_INIT: 0 or 4 >>> DR_STEP: 0 >>> >>> This is also what we'd like to have for the unsigned "i", but the >>> problem is that strip_constant_offset thinks that the "i + 1" in >>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >>> longer seem to be related to the [i] ones. >> >> Didn't read the change in detail, so sorry if I mis-understood the issue. >> I made changes in scev to better fold type conversion by various sources >> of information, for example, vrp, niters, undefined overflow behavior etc. >> In theory these information should be available for other optimizers without >> querying scev. For the mentioned test, vrp should compute accurate range >> information for "i" so that we can fold (intptr_t) (i + 1) it without >> worrying >> overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1 >> could be more expensive (especially in case of (T)(i + j)), or because the >> CST part is in bigger precision after conversion. >> But such folding is wanted in several places, e.g, IVOPTs. To provide such >> an interface, we changed tree-affine and made it do aggressive fold. I am >> curious if it's possible to use aff_tree to implement strip_constant_offset >> here since aggressive folding is wanted. After all, using additional chrec >> looks like a little heavy wrto the simple test. > > Yeah, using aff_tree does work here when the bounds are constant. > It doesn't look like it works for things like: > > double p[1000]; > double q[1000]; > > void > f4 (unsigned int n) > { > for (unsigned int i = 0; i < n; i += 4) > { > double a = q[i] + p[i]; > double b = q[i + 1] + p[i + 1]; > q[i] = a; > q[i + 1] = b; > } > } > > though, where the bounds on the global arrays guarantee that [i + 1] can't > overflow, even though "n" is unconstrained. The patch as posted handles > this case too. BTW is this a missed optimization in value range analysis? The range information for i should flow in a way like: array boundary -> niters -> scev/vrp. I think that's what niters/scev do in analysis.
Thanks, bin > > Thanks, > Richard > >> >> Thanks, >> bin >>> >>> There seem to be two main uses of DR_OFFSET and DR_INIT. One type >>> expects DR_OFFSET and DR_INIT to be generic expressions whose sum >>> gives the initial offset from DR_BASE_ADDRESS. The other type uses >>> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data >>> references, with the distance between their start addresses being >>> the difference between their DR_INITs. For this second type, the >>> exact form of DR_OFFSET doesn't really matter, and the more it is >>> able to canonicalise the non-constant offset, the better. >>> >>> The patch fixes the PR by adding another offset/init pair to the >>> innermost loop behaviour for this second group. The new pair use chrecs >>> rather than generic exprs for the offset part, with the chrec being >>> analysed in the innermost loop for which the offset isn't invariant. >>> This allows us to vectorise the second function in the testcase >>> as well, which wasn't possible before the earlier patch. >>> >>> Tested on x86_64-linux-gnu and aarch64-linux-gnu. OK to install? >>> >>> Richard >>> >>> >>> gcc/ >>> PR tree-optimization/81635 >>> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and >>> chrec_init. >>> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros. >>> (dr_equal_offsets_p): Delete. >>> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare. >>> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h. >>> (split_constant_offset): Handle POLYNOMIAL_CHREC. >>> (dr_analyze_innermost): Initialize dr_chrec_offset and >>> dr_chrec_init. >>> (operator ==): Use dr_chrec_offsets_equal_p and compare the >>> DR_CHREC_INITs. >>> (prune_runtime_alias_test_list): Likewise. >>> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and >>> compare >>> the DR_CHREC_INITs. >>> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete. >>> (analyze_offset_scev): New function. >>> (compute_offset_chrecs): Likewise. >>> (dr_chrec_offsets_equal_p): Likewise. >>> (dr_chrec_offsets_compare): Likewise. >>> * tree-loop-distribution.c (compute_alias_check_pairs): Use >>> dr_chrec_offsets_compare. >>> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use >>> dr_chrec_offsets_compare and compare the DR_CHREC_INITs. >>> (dr_group_sort_cmp): Likewise. >>> (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT. >>> (vect_no_alias_p): Likewise. >>> (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and >>> compare the DR_CHREC_INITs. >>> (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare. >>> * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead >>> of DR_INIT. >>> >>> gcc/testsuite/ >>> PR tree-optimization/81635 >>> * gcc.dg/vect/bb-slp-pr81635.c: New test. >>>