On Fri, Sep 30, 2011 at 1:11 PM, Jakub Jelinek <ja...@redhat.com> wrote: > On Fri, Sep 30, 2011 at 12:33:07PM +0200, Richard Guenther wrote: >> > + low = build_int_cst (TREE_TYPE (exp), 0); >> > + high = low; >> > + in_p = 0; >> > + strict_overflow_p = false; >> > + is_bool = TREE_CODE (TREE_TYPE (exp)) == BOOLEAN_TYPE; >> >> Effective boolean are also TYPE_PRECISION () == 1 types. Remember >> we don't preserve conversions from BOOLEAN_TYPE to such types. > > I can replace these with TYPE_PRECISION (TREE_TYPE (exp)) == 1; > checks if you prefer, though maybe it would need to also do > && TYPE_UNSIGNED (TREE_TYPE (exp)), at least if different operands of > the | have different inner 1-bit signedness we could not merge them.
The canonical test for boolean-kind types is now TREE_CODE (TREE_TYPE (exp)) == BOOLEAN_TYPE || TYPE_PRECISION (TREE_TYPE (exp)) == 1 Ada for example has non-1-precision BOOLEAN_TYPEs. But, see very below. >> > + CASE_CONVERT: >> > + is_bool |= TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE; >> >> Likewise. Though I wonder why !=? Does it matter whether the extension >> sign- or zero-extends? > > I think the |= is needed, this loop follows through not just casts, but > then also through the comparisons and again through casts. > It doesn't matter if the types or casts inside of the comparison argument > are bool or not (and likely they will not be), yet we don't want to stop > iterating because of that. > It wants to reconstruct ranges from what e.g. fold in fold_range_test > already created before, so stuff like > (int) (((unsigned) x - 64U) <= 31U) > for + [64, 95] range. > >> > + if (integer_onep (range_binop (LT_EXPR, >> > integer_type_node, >> > + p->low, 0, q->low, 0))) >> >> that's just >> >> tem = fold_binary (LT_EXPR, boolean_type_node, p->low, q->low); >> if (tem && integer_onep (tem)) >> return -1; > > Ok, can change that. > >> which avoids building the LT_EXPR tree if it doesn't fold. Similar below. >> (ISTR some integer_onep () variant that handles a NULL argument ...) > > Couldn't find any. integer_onep needs non-NULL, compare_tree_int too. > >> > + /* Try to merge ranges. */ >> > + for (first = i; i < length; i++) >> > + { >> > + tree low = ranges[i].low; >> > + tree high = ranges[i].high; >> > + int in_p = ranges[i].in_p; >> > + bool strict_overflow_p = ranges[i].strict_overflow_p; >> > + >> > + for (j = i + 1; j < length; j++) >> > + { >> >> That looks quadratic - do we want to limit this with a --param, simply >> partitioning the array into quadratic chunks? > > This isn't quadratic (except in the hypothetical case > where all merge_ranges calls would succeed, but then > build_range_test would fail). In the likely case where if range merges > succeed then update_range_test succeeds too this is just linear (plus the > qsort before that which isn't linear though). > So perhaps: > + if (j > i + 1 > + && update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode, > + ops, ranges[i].exp, in_p, low, high, > + strict_overflow_p)) > + { > + i = j - 1; > + any_changes = true; > + } > could be > + if (j > i + 1) > + { > + if (update_range_test (ranges + i, ranges + i + 1, j - i - 1, > opcode, > + ops, ranges[i].exp, in_p, low, high, > + strict_overflow_p)) > + any_changes = true; > + i = j - 1; > + } > (then it isn't quadratic), or could try a few times before giving up: > + if (j > i + 1) > + { > + if (update_range_test (ranges + i, ranges + i + 1, j - i - 1, > opcode, > + ops, ranges[i].exp, in_p, low, high, > + strict_overflow_p)) > + { > + any_changes = true; > + i = j - 1; > + } > + else if (update_fail_count == 64) > + i = j - 1; > + else > + update_fail_count = 0; > + } > where int update_fail_count = 0; would be after the > bool strict_overflow_p = ranges[i].strict_overflow_p; > line in the outer loop. > >> > + if (ranges[i].exp != ranges[j].exp) >> > + break; >> >> Or isn't it too bad because of this check? > > The above limits the chunks to a particular SSA_NAME. Within each > chunk for the same SSA_NAME, the ranges are sorted in a way that > merge_ranges will likely succeed, so it isn't quadratic unless > update_range_test fails (see above). > > BTW, the second loop (the one that attempts to optimize > x == 1 || x == 3 into (x & ~2) == 1 etc. is quadratic, which is why > there is > for (j = i + 1; j < length && j < i + 64; j++) > don't think it is a limit people will often run into and thus I don't > think it is worth adding a --param= for that. Ok. >> > + if (!merge_ranges (&in_p, &low, &high, in_p, low, high, >> > + ranges[j].in_p, ranges[j].low, >> > ranges[j].high)) >> > + break; >> >> And this early out? I suppose some comment on why together with >> the sorting this is of limited complexity would help. >> >> > @@ -2447,6 +2864,9 @@ reassociate_bb (basic_block bb) >> > optimize_ops_list (rhs_code, &ops); >> > } >> > >> > + if (rhs_code == BIT_IOR_EXPR || rhs_code == BIT_AND_EXPR) >> >> I think we want to check that the type is boolean here, so we don't setup >> the machinery needlessly? > > It is boolean only in some testcases, the is_bool stuff discussed at the > beginning above was originally just an early return > if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) > return; > before the loop, but it turned out that often the type of the | operands > is integer, with either bool casted to integer, or with the type of EQ_EXPR > etc. being integer instead of bool. Really? The type of EQ_EXPR should be always either BOOLEAN_TYPE or INTEGRAL_TYPE_P with TYPE_PRECISION == 1. That's what the gimple verifier checks. Or do you mean that fold introduces these kind of types during range-test simplification? Richard. > > Jakub >