On Fri, Feb 24, 2017 at 11:53 AM, Bin Cheng <bin.ch...@arm.com> wrote: > Hi, > As analyzed in PR69564, inefficient code for runtime alias check is generated > in benchmark > scimark2. It is suspected vectorized loop doesn't run enough iterations to > cover the loss > of the inefficient code. This patch improves runtime alias check for > (unsigned) wrapping > types. > > Originally, vectorizer checks if below two ranges overlap with each other > range a: [min_a, max_a] ; abs(length_a) = max_a - min_a > range_b: [min_b, max_b] ; abs(length_b) = max_b - min_b > with condition like: > max_a <= min_b || max_b <= min_a > > With knowledge of length of the two ranges, this patch checks overlap with > condition like: > (min_b - min_a) >= abs(len_a) && (min_b - min_a) <= (- abs(len_b)) > It's better because common sub expressions in b/a can be folded. > > Note there is an implicit condition for above statements that length of range > needs to be > no larger than middle value of the unsigned type. This is always true since > object never > spans over half address space in GCC. > > As mentioned, this is only done for case with wrap type and also constant > segment length > for both a and b, which is the most common case. It is possible to improve > signed cases > or compilation time unknown segment length cases too, but may not worth the > effort. > > Unfortunately, test case is hard to create for such code optimization. > Bootstrap and test on x86)64 and AArch64. Is it OK?
The patch mixes the above together with better folding (cancelling equal offset, decomposing pointer-plus). Please separate those. Note that seg_len is always positive and the abs(len_N) in the comments doesn't make much sense to me. Meanwhile the simplification (ignoring overflow issues) can go, assuming equal lenghts mina + len <= minb || minb + len <= mina -> mina - minb <= -len || minb - mina <= -len -> minb - mina >= len || mina - minb >= len and as we know len >= 0 this is the same as abs (minb - mina) >= len and thus even simpler than your variant. But maybe I'm missing something? As for the overflow issues, you do all computation in sizetype and thus guarantee unsigned compares, but I think to do these kind of "simplifications" you have to use signed quantities which opens up the possibility of objects crossing the signed-address-space wrapping point and the simplification to break apart. That is, your + tree type = TREE_TYPE (addr_base_a); + if (POINTER_TYPE_P (type)) + type = sizetype; ... + if (!TYPE_UNSIGNED (type) + || !TYPE_OVERFLOW_WRAPS (type) is useless (always false, you always get type == sizetype). Btw, the same overflow issues plague "removal" of common DR_OFFSET and friends iff the original condition, max_a <= min_b || max_b >= min_a is correct - we are using relational compares on pointers to possibly different objects here... (and all pointers are TYPE_UNSIGNED) - and you later interpret those as signed comparisons to be able to shift offsets from one side to the other. So for the "real" check you have to use abs ((ssizetype)(minb - mina)) >= len Thanks, Richard. > Thanks, > bin > > 2017-02-23 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/69564 > * tree-vect-loop-manip.c (tree-affine.h): Include. > (create_intersect_range_checks): For types with wrap behavior, check > range overlap for [min_a, max_a]/[min_b, max_b] with condition like: > "(min_b - min_a) >= abs(len_a) && (min_b - min_a) <= (-abs(len_b))". >