Richard Biener <[email protected]> writes:
> On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford
> <[email protected]> wrote:
>>
>> [Sorry, somehow missed this till now]
>>
>> Richard Biener <[email protected]> writes:
>> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
>> > <[email protected]> wrote:
>> >>
>> >> Marc Glisse <[email protected]> writes:
>> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
>> >> >
>> >> >> --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100
>> >> >> +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100
>> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >> >> (if (inverse_conditions_p (@0, @2)
>> >> >> && element_precision (type) == element_precision (op_type))
>> >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> >> >> +
>> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> >> >> + expressions like:
>> >> >> +
>> >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> >> >> +
>> >> >> + If pointers are known not to wrap, B checks whether @1 bytes
>> >> >> starting at
>> >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1
>> >> >> bytes.
>> >> >> + A is more efficiently tested as:
>> >> >> +
>> >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> >> >> +
>> >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced
>> >> >> + with @1 - 1. */
>> >> >> +(for ior (truth_orif truth_or bit_ior)
>> >> >> + (for cmp (le lt)
>> >> >> + (simplify
>> >> >> + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
>> >> >> + (cmp (pointer_plus:s @2 @1) @0))
>> >> >
>> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may
>> >> > need some
>> >> > care with "cmp == LE_EXPR" below)
>> >> > Do you want :s on cmp as well?
>> >> >
>> >> >> + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>> >> >
>> >> > Don't you want TYPE_OVERFLOW_UNDEFINED?
>> >>
>> >> Thanks, fixed below. Think the cmp == LE_EXPR stuff is still ok with :c,
>> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
>> >>
>> >> Tested as before. OK to install?
>> >>
>> >> Richard
>> >>
>> >>
>> >> 2018-07-23 Richard Sandiford <[email protected]>
>> >>
>> >> gcc/
>> >> * match.pd: Optimise pointer range checks.
>> >>
>> >> gcc/testsuite/
>> >> * gcc.dg/pointer-range-check-1.c: New test.
>> >> * gcc.dg/pointer-range-check-2.c: Likewise.
>> >>
>> >> Index: gcc/match.pd
>> >> ===================================================================
>> >> --- gcc/match.pd 2018-07-23 15:56:47.000000000 +0100
>> >> +++ gcc/match.pd 2018-07-23 15:58:33.480269844 +0100
>> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >> (if (inverse_conditions_p (@0, @2)
>> >> && element_precision (type) == element_precision (op_type))
>> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> >> +
>> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> >> + expressions like:
>> >> +
>> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> >> +
>> >> + If pointers are known not to wrap, B checks whether @1 bytes starting
>> >> at
>> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1
>> >> bytes.
>> >> + A is more efficiently tested as:
>> >> +
>> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> >> +
>> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced
>> >> + with @1 - 1. */
>> >> +(for ior (truth_orif truth_or bit_ior)
>> >> + (for cmp (le lt)
>> >> + (simplify
>> >> + (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
>> >> + (cmp:cs (pointer_plus:s @2 @1) @0))
>> >> + (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
>> >
>> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv.
>>
>> This was there because we're converting pointer arithmetic into integer
>> arithmetic, and -ftrapv could cause that new integer code to trap.
>> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined,
>> but it seemed bad form to make it actually trap, especially when the
>> above does some reassociation too.
>
> Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type
> you are doing the arithmetic on (sizetype) instead?
OK, done below.
>> >> + /* Convert the B form to the A form. */
>> >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 :
>> >> 0); }
>> >> + /* Always fails for negative values. */
>> >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION
>> >> (sizetype))
>> >> + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
>> >> + tree_swap_operands_p pick a canonical order. */
>> >
>> > gimple_resimplify takes care of that - well, it doesn't since minus isn't
>> > commutative... I guess you get better CSE later when doing this thus ok,
>> > but it does look a bit off here ;)
>> >
>> > I think you shouldn't use 'sizetype' without guarding this whole thing
>> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).
>>
>> OK, hadn't realised that could be false. Would building the appropriate
>> unsigned type be OK without the condition, or does it need to be sizetype?
>
> That would probably work but it might be dead slow on those targets that
> choose
> to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit
> sizetype because there's no arithmetic for 24bit but in offsetted addressing).
> So the transform is likely not a win for those.
Ah, yeah. The version below adds the == test.
>> > Since the original IL performs an ordered compare of two eventually
>> > unrelated
>> > pointers (or pointers adjusted in a way that crosses object
>> > boundaries) (uh... ;))
>> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
>> > conversion? Since POINTER_PLUS_EXPR offsets are supposed to be
>> > interpreted "signed" that should also handle negative offsets
>> > correctly then?
>>
>> The transform isn't valid when @1 is negative though, so I think we
>> need that check either way.
>
> + /* Always fails for negative values. */
> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
>
> shouldn't that be wi::min_precision (off * 2, UNSIGNED) <=
> TYPE_PRECISION (sizetype)?
Gah, yes :-(
>> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that
>> seems better. That might also allow us to CSE the retained
>> POINTER_PLUS_EXPRs.
>
> Yeah, that sounds better then.
And the CSEing did happen, so we now get the "ideal" sequence for the
original alias checks:
add x0, x0, 15
sub x6, x0, x1
sub x5, x0, x2
cmp x6, 30
ccmp x5, 30, 0, hi
whereas the original patch had an extra add.
>> > Also, when @2 == @0 + (@1+1) then the original condition is true but
>> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
>> > (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
>> > -> -1 > @1 * 2
>> >
>> > which is false. So I can't really see how you can apply this transform in
>> > general (it would be fine for generating alias checks but a bit more
>> > pessimistic).
>> >
>> > But maybe I am missing something?
>>
>> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.
>
> Hmm, so mathematically this is
>
> (@0 - @2) % modreduce + @1 > @1 * 2
Probably not % if you operating over Z, since if @0 - @2 = -@1 * 2
(and -@1 * 2 % modreduce == -@1 * 2), the condition above will be false
while we want it to be true. But...
> then, but I don't want to think too much about this since Marc didn't
> object here ;)
...we start out converting the range check for A to:
!(-@1 <= @0 - @2 <= @1)
i.e. @1 + 1 bytes at @0 and @2 are free from overlap. Then we just
apply the usual trick:
A <= X <= B -> (unsigned) (X - A) <= (unsigned) (B - A) when A <= B
Substituting @0 - @2 for X, -@1 for A and @1 for B gives:
!((unsigned) (@0 - @2 + @1) <= (unsigned) (@1 * 2))
-> (unsigned) (@0 - @2 + @1) > (unsigned) (@1 * 2)
Then the B case is just an off-by-one variant.
Patch tested as before.
Thanks,
Richard
2018-08-01 Richard Sandiford <[email protected]>
gcc/
* match.pd: Optimise pointer range checks.
gcc/testsuite/
* gcc.dg/pointer-range-check-1.c: New test.
* gcc.dg/pointer-range-check-2.c: Likewise.
Index: gcc/match.pd
===================================================================
--- gcc/match.pd 2018-07-24 19:04:14.985363724 +0100
+++ gcc/match.pd 2018-08-01 10:55:10.695766411 +0100
@@ -4937,3 +4937,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (inverse_conditions_p (@0, @2)
&& element_precision (type) == element_precision (op_type))
(view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
+
+/* For pointers @0 and @2 and nonnegative constant offset @1, look for
+ expressions like:
+
+ A: (@0 + @1 < @2) | (@2 + @1 < @0)
+ B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
+
+ If pointers are known not to wrap, B checks whether @1 bytes starting
+ at @0 and @2 do not overlap, while A tests the same thing for @1 + 1
+ bytes. A is more efficiently tested as:
+
+ A: (sizetype) (@0 + @1 - @2) > @1 * 2
+
+ The equivalent expression for B is given by replacing @1 with @1 - 1:
+
+ B: (sizetype) (@0 + (@1 - 1) - @2) > (@1 - 1) * 2
+
+ @0 and @2 can be swapped in both expressions without changing the result.
+
+ The folds rely on sizetype's being unsigned (which is always true)
+ and on its being the same width as the pointer (which we have to check).
+
+ The fold replaces two pointer_plus expressions, two comparisons and
+ an IOR with a pointer_plus, a pointer_diff, and a comparison, so in
+ the best case it's a saving of two operations. The A fold retains one
+ of the original pointer_pluses, so is a win even if both pointer_pluses
+ are used elsewhere. The B fold is a wash if both pointer_pluses are
+ used elsewhere, since all we end up doing is replacing a comparison with
+ a pointer_plus. We do still apply the fold under those circumstances
+ though, in case applying it to other conditions eventually makes one of the
+ pointer_pluses dead. */
+(for ior (truth_orif truth_or bit_ior)
+ (for cmp (le lt)
+ (simplify
+ (ior (cmp:cs (pointer_plus@3 @0 INTEGER_CST@1) @2)
+ (cmp:cs (pointer_plus@4 @2 @1) @0))
+ (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+ && TYPE_OVERFLOW_WRAPS (sizetype)
+ && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (sizetype))
+ /* Calculate the rhs constant. */
+ (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0);
+ offset_int rhs = off * 2; }
+ /* Always fails for negative values. */
+ (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype))
+ /* Since the order of @0 and @2 doesn't matter, let tree_swap_operands_p
+ pick a canonical order. This increases the chances of using the
+ same pointer_plus in multiple checks. */
+ (with { bool swap_p = tree_swap_operands_p (@0, @2);
+ tree rhs_tree = wide_int_to_tree (sizetype, rhs); }
+ (if (cmp == LT_EXPR)
+ (gt (convert:sizetype
+ (pointer_diff:ssizetype { swap_p ? @4 : @3; }
+ { swap_p ? @0 : @2; }))
+ { rhs_tree; })
+ (gt (convert:sizetype
+ (pointer_diff:ssizetype
+ (pointer_plus { swap_p ? @2 : @0; }
+ { wide_int_to_tree (sizetype, off); })
+ { swap_p ? @0 : @2; }))
+ { rhs_tree; })))))))))
Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
===================================================================
--- /dev/null 2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-08-01
10:55:10.695766411 +0100
@@ -0,0 +1,37 @@
+/* { dg-do compile { target { ilp32 || lp64 } } } */
+/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
+
+/* All four functions should be folded to:
+
+ (sizetype) (a + 15 - b) < 30. */
+
+_Bool
+f1 (char *a, char *b)
+{
+ return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+ return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+ return (a + 16 <= b) | (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+ return (a + 15 < b) | (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } }
*/
+/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } }
*/
+/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
===================================================================
--- /dev/null 2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-08-01
10:55:10.695766411 +0100
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */
+
+_Bool
+f1 (char *a, char *b)
+{
+ return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+ return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+ return (a + 16 <= b) | (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+ return (a + 15 < b) | (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } }
*/
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */