"Bin.Cheng" <amker.ch...@gmail.com> writes: > On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> The first loop in the testcase regressed after my recent changes to >> dr_analyze_innermost. Previously we would treat "i" as an iv even >> for bb analysis and end up with: >> >> DR_BASE_ADDRESS: p or q >> DR_OFFSET: 0 >> DR_INIT: 0 or 4 >> DR_STEP: 16 >> >> We now always keep the step as 0 instead, so for an int "i" we'd have: >> >> DR_BASE_ADDRESS: p or q >> DR_OFFSET: (intptr_t) i >> DR_INIT: 0 or 4 >> DR_STEP: 0 >> >> This is also what we'd like to have for the unsigned "i", but the >> problem is that strip_constant_offset thinks that the "i + 1" in >> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >> longer seem to be related to the [i] ones. > > Didn't read the change in detail, so sorry if I mis-understood the issue. > I made changes in scev to better fold type conversion by various sources > of information, for example, vrp, niters, undefined overflow behavior etc. > In theory these information should be available for other optimizers without > querying scev. For the mentioned test, vrp should compute accurate range > information for "i" so that we can fold (intptr_t) (i + 1) it without worrying > overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1 > could be more expensive (especially in case of (T)(i + j)), or because the > CST part is in bigger precision after conversion. > But such folding is wanted in several places, e.g, IVOPTs. To provide such > an interface, we changed tree-affine and made it do aggressive fold. I am > curious if it's possible to use aff_tree to implement strip_constant_offset > here since aggressive folding is wanted. After all, using additional chrec > looks like a little heavy wrto the simple test.
Yeah, using aff_tree does work here when the bounds are constant. It doesn't look like it works for things like: double p[1000]; double q[1000]; void f4 (unsigned int n) { for (unsigned int i = 0; i < n; i += 4) { double a = q[i] + p[i]; double b = q[i + 1] + p[i + 1]; q[i] = a; q[i + 1] = b; } } though, where the bounds on the global arrays guarantee that [i + 1] can't overflow, even though "n" is unconstrained. The patch as posted handles this case too. Thanks, Richard > > Thanks, > bin >> >> There seem to be two main uses of DR_OFFSET and DR_INIT. One type >> expects DR_OFFSET and DR_INIT to be generic expressions whose sum >> gives the initial offset from DR_BASE_ADDRESS. The other type uses >> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data >> references, with the distance between their start addresses being >> the difference between their DR_INITs. For this second type, the >> exact form of DR_OFFSET doesn't really matter, and the more it is >> able to canonicalise the non-constant offset, the better. >> >> The patch fixes the PR by adding another offset/init pair to the >> innermost loop behaviour for this second group. The new pair use chrecs >> rather than generic exprs for the offset part, with the chrec being >> analysed in the innermost loop for which the offset isn't invariant. >> This allows us to vectorise the second function in the testcase >> as well, which wasn't possible before the earlier patch. >> >> Tested on x86_64-linux-gnu and aarch64-linux-gnu. OK to install? >> >> Richard >> >> >> gcc/ >> PR tree-optimization/81635 >> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and >> chrec_init. >> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros. >> (dr_equal_offsets_p): Delete. >> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare. >> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h. >> (split_constant_offset): Handle POLYNOMIAL_CHREC. >> (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init. >> (operator ==): Use dr_chrec_offsets_equal_p and compare the >> DR_CHREC_INITs. >> (prune_runtime_alias_test_list): Likewise. >> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare >> the DR_CHREC_INITs. >> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete. >> (analyze_offset_scev): New function. >> (compute_offset_chrecs): Likewise. >> (dr_chrec_offsets_equal_p): Likewise. >> (dr_chrec_offsets_compare): Likewise. >> * tree-loop-distribution.c (compute_alias_check_pairs): Use >> dr_chrec_offsets_compare. >> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use >> dr_chrec_offsets_compare and compare the DR_CHREC_INITs. >> (dr_group_sort_cmp): Likewise. >> (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT. >> (vect_no_alias_p): Likewise. >> (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and >> compare the DR_CHREC_INITs. >> (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare. >> * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead >> of DR_INIT. >> >> gcc/testsuite/ >> PR tree-optimization/81635 >> * gcc.dg/vect/bb-slp-pr81635.c: New test. >>