Richard Biener <richard.guent...@gmail.com> writes: > On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> [Sorry, somehow missed this till now] >> >> Richard Biener <richard.guent...@gmail.com> writes: >> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford >> > <richard.sandif...@arm.com> wrote: >> >> >> >> Marc Glisse <marc.gli...@inria.fr> writes: >> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote: >> >> > >> >> >> --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 >> >> >> +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 >> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> >> >> (if (inverse_conditions_p (@0, @2) >> >> >> && element_precision (type) == element_precision (op_type)) >> >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) >> >> >> + >> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for >> >> >> + expressions like: >> >> >> + >> >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) >> >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) >> >> >> + >> >> >> + If pointers are known not to wrap, B checks whether @1 bytes >> >> >> starting at >> >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 >> >> >> bytes. >> >> >> + A is more efficiently tested as: >> >> >> + >> >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) >> >> >> + >> >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced >> >> >> + with @1 - 1. */ >> >> >> +(for ior (truth_orif truth_or bit_ior) >> >> >> + (for cmp (le lt) >> >> >> + (simplify >> >> >> + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) >> >> >> + (cmp (pointer_plus:s @2 @1) @0)) >> >> > >> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may >> >> > need some >> >> > care with "cmp == LE_EXPR" below) >> >> > Do you want :s on cmp as well? >> >> > >> >> >> + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) >> >> > >> >> > Don't you want TYPE_OVERFLOW_UNDEFINED? >> >> >> >> Thanks, fixed below. Think the cmp == LE_EXPR stuff is still ok with :c, >> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR. >> >> >> >> Tested as before. OK to install? >> >> >> >> Richard >> >> >> >> >> >> 2018-07-23 Richard Sandiford <richard.sandif...@arm.com> >> >> >> >> gcc/ >> >> * match.pd: Optimise pointer range checks. >> >> >> >> gcc/testsuite/ >> >> * gcc.dg/pointer-range-check-1.c: New test. >> >> * gcc.dg/pointer-range-check-2.c: Likewise. >> >> >> >> Index: gcc/match.pd >> >> =================================================================== >> >> --- gcc/match.pd 2018-07-23 15:56:47.000000000 +0100 >> >> +++ gcc/match.pd 2018-07-23 15:58:33.480269844 +0100 >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> >> (if (inverse_conditions_p (@0, @2) >> >> && element_precision (type) == element_precision (op_type)) >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) >> >> + >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for >> >> + expressions like: >> >> + >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) >> >> + >> >> + If pointers are known not to wrap, B checks whether @1 bytes starting >> >> at >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 >> >> bytes. >> >> + A is more efficiently tested as: >> >> + >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) >> >> + >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced >> >> + with @1 - 1. */ >> >> +(for ior (truth_orif truth_or bit_ior) >> >> + (for cmp (le lt) >> >> + (simplify >> >> + (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2) >> >> + (cmp:cs (pointer_plus:s @2 @1) @0)) >> >> + (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) >> > >> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv. >> >> This was there because we're converting pointer arithmetic into integer >> arithmetic, and -ftrapv could cause that new integer code to trap. >> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined, >> but it seemed bad form to make it actually trap, especially when the >> above does some reassociation too. > > Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type > you are doing the arithmetic on (sizetype) instead?
OK, done below. >> >> + /* Convert the B form to the A form. */ >> >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : >> >> 0); } >> >> + /* Always fails for negative values. */ >> >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION >> >> (sizetype)) >> >> + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let >> >> + tree_swap_operands_p pick a canonical order. */ >> > >> > gimple_resimplify takes care of that - well, it doesn't since minus isn't >> > commutative... I guess you get better CSE later when doing this thus ok, >> > but it does look a bit off here ;) >> > >> > I think you shouldn't use 'sizetype' without guarding this whole thing >> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)). >> >> OK, hadn't realised that could be false. Would building the appropriate >> unsigned type be OK without the condition, or does it need to be sizetype? > > That would probably work but it might be dead slow on those targets that > choose > to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit > sizetype because there's no arithmetic for 24bit but in offsetted addressing). > So the transform is likely not a win for those. Ah, yeah. The version below adds the == test. >> > Since the original IL performs an ordered compare of two eventually >> > unrelated >> > pointers (or pointers adjusted in a way that crosses object >> > boundaries) (uh... ;)) >> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype >> > conversion? Since POINTER_PLUS_EXPR offsets are supposed to be >> > interpreted "signed" that should also handle negative offsets >> > correctly then? >> >> The transform isn't valid when @1 is negative though, so I think we >> need that check either way. > > + /* Always fails for negative values. */ > + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype)) > > shouldn't that be wi::min_precision (off * 2, UNSIGNED) <= > TYPE_PRECISION (sizetype)? Gah, yes :-( >> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that >> seems better. That might also allow us to CSE the retained >> POINTER_PLUS_EXPRs. > > Yeah, that sounds better then. And the CSEing did happen, so we now get the "ideal" sequence for the original alias checks: add x0, x0, 15 sub x6, x0, x1 sub x5, x0, x2 cmp x6, 30 ccmp x5, 30, 0, hi whereas the original patch had an extra add. >> > Also, when @2 == @0 + (@1+1) then the original condition is true but >> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not? >> > (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2 >> > -> -1 > @1 * 2 >> > >> > which is false. So I can't really see how you can apply this transform in >> > general (it would be fine for generating alias checks but a bit more >> > pessimistic). >> > >> > But maybe I am missing something? >> >> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true. > > Hmm, so mathematically this is > > (@0 - @2) % modreduce + @1 > @1 * 2 Probably not % if you operating over Z, since if @0 - @2 = -@1 * 2 (and -@1 * 2 % modreduce == -@1 * 2), the condition above will be false while we want it to be true. But... > then, but I don't want to think too much about this since Marc didn't > object here ;) ...we start out converting the range check for A to: !(-@1 <= @0 - @2 <= @1) i.e. @1 + 1 bytes at @0 and @2 are free from overlap. Then we just apply the usual trick: A <= X <= B -> (unsigned) (X - A) <= (unsigned) (B - A) when A <= B Substituting @0 - @2 for X, -@1 for A and @1 for B gives: !((unsigned) (@0 - @2 + @1) <= (unsigned) (@1 * 2)) -> (unsigned) (@0 - @2 + @1) > (unsigned) (@1 * 2) Then the B case is just an off-by-one variant. Patch tested as before. Thanks, Richard 2018-08-01 Richard Sandiford <richard.sandif...@arm.com> gcc/ * match.pd: Optimise pointer range checks. gcc/testsuite/ * gcc.dg/pointer-range-check-1.c: New test. * gcc.dg/pointer-range-check-2.c: Likewise. Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-07-24 19:04:14.985363724 +0100 +++ gcc/match.pd 2018-08-01 10:55:10.695766411 +0100 @@ -4937,3 +4937,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (inverse_conditions_p (@0, @2) && element_precision (type) == element_precision (op_type)) (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) + +/* For pointers @0 and @2 and nonnegative constant offset @1, look for + expressions like: + + A: (@0 + @1 < @2) | (@2 + @1 < @0) + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) + + If pointers are known not to wrap, B checks whether @1 bytes starting + at @0 and @2 do not overlap, while A tests the same thing for @1 + 1 + bytes. A is more efficiently tested as: + + A: (sizetype) (@0 + @1 - @2) > @1 * 2 + + The equivalent expression for B is given by replacing @1 with @1 - 1: + + B: (sizetype) (@0 + (@1 - 1) - @2) > (@1 - 1) * 2 + + @0 and @2 can be swapped in both expressions without changing the result. + + The folds rely on sizetype's being unsigned (which is always true) + and on its being the same width as the pointer (which we have to check). + + The fold replaces two pointer_plus expressions, two comparisons and + an IOR with a pointer_plus, a pointer_diff, and a comparison, so in + the best case it's a saving of two operations. The A fold retains one + of the original pointer_pluses, so is a win even if both pointer_pluses + are used elsewhere. The B fold is a wash if both pointer_pluses are + used elsewhere, since all we end up doing is replacing a comparison with + a pointer_plus. We do still apply the fold under those circumstances + though, in case applying it to other conditions eventually makes one of the + pointer_pluses dead. */ +(for ior (truth_orif truth_or bit_ior) + (for cmp (le lt) + (simplify + (ior (cmp:cs (pointer_plus@3 @0 INTEGER_CST@1) @2) + (cmp:cs (pointer_plus@4 @2 @1) @0)) + (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) + && TYPE_OVERFLOW_WRAPS (sizetype) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (sizetype)) + /* Calculate the rhs constant. */ + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); + offset_int rhs = off * 2; } + /* Always fails for negative values. */ + (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype)) + /* Since the order of @0 and @2 doesn't matter, let tree_swap_operands_p + pick a canonical order. This increases the chances of using the + same pointer_plus in multiple checks. */ + (with { bool swap_p = tree_swap_operands_p (@0, @2); + tree rhs_tree = wide_int_to_tree (sizetype, rhs); } + (if (cmp == LT_EXPR) + (gt (convert:sizetype + (pointer_diff:ssizetype { swap_p ? @4 : @3; } + { swap_p ? @0 : @2; })) + { rhs_tree; }) + (gt (convert:sizetype + (pointer_diff:ssizetype + (pointer_plus { swap_p ? @2 : @0; } + { wide_int_to_tree (sizetype, off); }) + { swap_p ? @0 : @2; })) + { rhs_tree; }))))))))) Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c =================================================================== --- /dev/null 2018-07-26 10:26:13.137955424 +0100 +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-08-01 10:55:10.695766411 +0100 @@ -0,0 +1,37 @@ +/* { dg-do compile { target { ilp32 || lp64 } } } */ +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */ + +/* All four functions should be folded to: + + (sizetype) (a + 15 - b) < 30. */ + +_Bool +f1 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f2 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +_Bool +f3 (char *a, char *b) +{ + return (a + 16 <= b) | (b + 16 <= a); +} + +_Bool +f4 (char *a, char *b) +{ + return (a + 15 < b) | (b + 15 < a); +} + +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */ Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c =================================================================== --- /dev/null 2018-07-26 10:26:13.137955424 +0100 +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-08-01 10:55:10.695766411 +0100 @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */ + +_Bool +f1 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f2 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +_Bool +f3 (char *a, char *b) +{ + return (a + 16 <= b) | (b + 16 <= a); +} + +_Bool +f4 (char *a, char *b) +{ + return (a + 15 < b) | (b + 15 < a); +} + +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */