Hi, As analyzed in PR69564, inefficient code for runtime alias check is generated in benchmark scimark2. It is suspected vectorized loop doesn't run enough iterations to cover the loss of the inefficient code. This patch improves runtime alias check for (unsigned) wrapping types.
Originally, vectorizer checks if below two ranges overlap with each other range a: [min_a, max_a] ; abs(length_a) = max_a - min_a range_b: [min_b, max_b] ; abs(length_b) = max_b - min_b with condition like: max_a <= min_b || max_b <= min_a With knowledge of length of the two ranges, this patch checks overlap with condition like: (min_b - min_a) >= abs(len_a) && (min_b - min_a) <= (- abs(len_b)) It's better because common sub expressions in b/a can be folded. Note there is an implicit condition for above statements that length of range needs to be no larger than middle value of the unsigned type. This is always true since object never spans over half address space in GCC. As mentioned, this is only done for case with wrap type and also constant segment length for both a and b, which is the most common case. It is possible to improve signed cases or compilation time unknown segment length cases too, but may not worth the effort. Unfortunately, test case is hard to create for such code optimization. Bootstrap and test on x86)64 and AArch64. Is it OK? Thanks, bin 2017-02-23 Bin Cheng <bin.ch...@arm.com> PR tree-optimization/69564 * tree-vect-loop-manip.c (tree-affine.h): Include. (create_intersect_range_checks): For types with wrap behavior, check range overlap for [min_a, max_a]/[min_b, max_b] with condition like: "(min_b - min_a) >= abs(len_a) && (min_b - min_a) <= (-abs(len_b))".
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 5ee2c38..c787bbb 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-into-ssa.h" #include "tree-ssa.h" #include "cfgloop.h" +#include "tree-affine.h" #include "tree-scalar-evolution.h" #include "tree-vectorizer.h" #include "tree-ssa-loop-ivopts.h" @@ -2181,43 +2182,137 @@ create_intersect_range_checks (loop_vec_info loop_vinfo, tree *cond_expr, if (create_intersect_range_checks_index (loop_vinfo, cond_expr, dr_a, dr_b)) return; - tree segment_length_a = dr_a.seg_len; - tree segment_length_b = dr_b.seg_len; + tree seg_len_a = dr_a.seg_len; + tree seg_len_b = dr_b.seg_len; tree addr_base_a = DR_BASE_ADDRESS (dr_a.dr); tree addr_base_b = DR_BASE_ADDRESS (dr_b.dr); tree offset_a = DR_OFFSET (dr_a.dr), offset_b = DR_OFFSET (dr_b.dr); + tree init_a = DR_INIT (dr_a.dr), init_b = DR_INIT (dr_b.dr); + + tree type = TREE_TYPE (addr_base_a); + if (POINTER_TYPE_P (type)) + type = sizetype; + + seg_len_a = fold_convert (type, seg_len_a); + seg_len_b = fold_convert (type, seg_len_b); + offset_a = fold_convert (type, offset_a); + offset_b = fold_convert (type, offset_b); + init_a = fold_convert (type, init_a); + init_b = fold_convert (type, init_b); + /* Check if offset and init can be cancelled. */ + if (operand_equal_p (offset_a, offset_b, 0)) + { + offset_a = fold_convert (type, integer_zero_node); + offset_b = fold_convert (type, integer_zero_node); + } + if (operand_equal_p (init_a, init_b, 0)) + { + init_a = fold_convert (type, integer_zero_node); + init_b = fold_convert (type, integer_zero_node); + } + + offset_a = fold_build2 (PLUS_EXPR, type, offset_a, init_a); + offset_b = fold_build2 (PLUS_EXPR, type, offset_b, init_b); - offset_a = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_a), - offset_a, DR_INIT (dr_a.dr)); - offset_b = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_b), - offset_b, DR_INIT (dr_b.dr)); - addr_base_a = fold_build_pointer_plus (addr_base_a, offset_a); - addr_base_b = fold_build_pointer_plus (addr_base_b, offset_b); + /* Further decompose pointer plus expression. */ + if (TREE_CODE (addr_base_a) == POINTER_PLUS_EXPR) + { + tree offset = fold_convert (type, TREE_OPERAND (addr_base_a, 1)); + offset_a = fold_build2 (PLUS_EXPR, type, offset_a, offset); + addr_base_a = TREE_OPERAND (addr_base_a, 0); + } + if (TREE_CODE (addr_base_b) == POINTER_PLUS_EXPR) + { + tree offset = fold_convert (type, TREE_OPERAND (addr_base_b, 1)); + offset_b = fold_build2 (PLUS_EXPR, type, offset_b, offset); + addr_base_b = TREE_OPERAND (addr_base_b, 0); + } + addr_base_a = fold_convert (type, addr_base_a); + addr_base_a = fold_build2 (PLUS_EXPR, type, addr_base_a, offset_a); + addr_base_b = fold_convert (type, addr_base_b); + addr_base_b = fold_build2 (PLUS_EXPR, type, addr_base_b, offset_b); - tree seg_a_min = addr_base_a; - tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); + tree min_a = addr_base_a; + tree max_a = fold_build2 (PLUS_EXPR, type, min_a, seg_len_a); /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT bytes, e.g., int a[3] -> a[1] range is [a+4, a+16) instead of [a, a+12) */ if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0) { tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a.dr))); - seg_a_min = fold_build_pointer_plus (seg_a_max, unit_size); - seg_a_max = fold_build_pointer_plus (addr_base_a, unit_size); + unit_size = fold_convert (type, unit_size); + min_a = fold_build2 (PLUS_EXPR, type, max_a, unit_size); + max_a = fold_build2 (PLUS_EXPR, type, addr_base_a, unit_size); } - tree seg_b_min = addr_base_b; - tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); + tree min_b = addr_base_b; + tree max_b = fold_build2 (PLUS_EXPR, type, min_b, seg_len_b); if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0) { tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b.dr))); - seg_b_min = fold_build_pointer_plus (seg_b_max, unit_size); - seg_b_max = fold_build_pointer_plus (addr_base_b, unit_size); + unit_size = fold_convert (type, unit_size); + min_b = fold_build2 (PLUS_EXPR, type, max_b, unit_size); + max_b = fold_build2 (PLUS_EXPR, type, addr_base_b, unit_size); } + + if (!TYPE_UNSIGNED (type) + || !TYPE_OVERFLOW_WRAPS (type) + || TREE_CODE (seg_len_a) != INTEGER_CST + || TREE_CODE (seg_len_b) != INTEGER_CST) + { + *cond_expr + = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, + fold_build2 (LE_EXPR, boolean_type_node, max_a, min_b), + fold_build2 (LE_EXPR, boolean_type_node, max_b, min_a)); + return; + } + + /* Ranges [min_a, max_a] and [min_b, max_b] don't overlap if below + condition are satisfied: + + max_a <= min_b || max_b <= min_a + Or + (min_a + abs(len_a)) <= min_b || (min_b + abs(len_b)) <= min_a + + For type which wraps on overflow, it equals to: + + (min_b - min_a) >= abs(len_a) && (min_b - min_a) <= (- abs(len_b)) + + Proof: + 1) If (min_b >= max_a) + 1.1) (min_b - min_a) >= abs(len_a) is true. + 1.2) (min_b - min_a) <= (- abs(len_b)) must be true when 1.1) is + satisfied because (- abs(len_b)) wraps to a large number. + + 2) If (min_a >= max_b) + 2.1) (min_b - min_a) >= abs(len) must be true because the left + part wraps to a large number. + 2.2) (min_b - min_a) <= (- abs(len_b)) equals to condition + (min_a - min_b) >= abs(len_b) because both arms do wrap, + as a result, it is true. + + 3) For all overlap cases, the condition computes false. */ + + /* Compute abs(seg_len_a). */ + if (tree_int_cst_compare (seg_len_a, size_zero_node) < 0) + seg_len_a = fold_build1 (NEGATE_EXPR, type, seg_len_a); + /* Compute (0 - abs(seg_len_b)). */ + if (tree_int_cst_compare (seg_len_b, size_zero_node) > 0) + seg_len_b = fold_build1 (NEGATE_EXPR, type, seg_len_b); + + /* Employ affine tree for better fold. */ + aff_tree aff_a, aff_b; + tree_to_aff_combination (min_a, type, &aff_a); + tree_to_aff_combination (min_b, type, &aff_b); + aff_combination_scale (&aff_a, -1); + aff_combination_add (&aff_b, &aff_a); + + /* Compute (min_b - min_a). */ + tree b_minus_a = fold_convert (type, aff_combination_to_tree (&aff_b)); *cond_expr - = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, - fold_build2 (LE_EXPR, boolean_type_node, seg_a_max, seg_b_min), - fold_build2 (LE_EXPR, boolean_type_node, seg_b_max, seg_a_min)); + = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, + fold_build2 (GE_EXPR, boolean_type_node, b_minus_a, seg_len_a), + fold_build2 (LE_EXPR, boolean_type_node, b_minus_a, seg_len_b)); } /* Function vect_create_cond_for_alias_checks.