Richard Biener <richard.guent...@gmail.com> writes:
> On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> [Sorry, somehow missed this till now]
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >>
>> >> Marc Glisse <marc.gli...@inria.fr> writes:
>> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
>> >> >
>> >> >> --- gcc/match.pd     2018-07-18 18:44:22.565914281 +0100
>> >> >> +++ gcc/match.pd     2018-07-20 11:24:33.692045585 +0100
>> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >> >>    (if (inverse_conditions_p (@0, @2)
>> >> >>         && element_precision (type) == element_precision (op_type))
>> >> >>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> >> >> +
>> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> >> >> +   expressions like:
>> >> >> +
>> >> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> >> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> >> >> +
>> >> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
>> >> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
>> >> >> +   A is more efficiently tested as:
>> >> >> +
>> >> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> >> >> +
>> >> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
>> >> >> +   with @1 - 1.  */
>> >> >> +(for ior (truth_orif truth_or bit_ior)
>> >> >> + (for cmp (le lt)
>> >> >> +  (simplify
>> >> >> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
>> >> >> +    (cmp (pointer_plus:s @2 @1) @0))
>> >> >
>> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some
>> >> > care with "cmp == LE_EXPR" below)
>> >> > Do you want :s on cmp as well?
>> >> >
>> >> >> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>> >> >
>> >> > Don't you want TYPE_OVERFLOW_UNDEFINED?
>> >>
>> >> Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
>> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
>> >>
>> >> Tested as before.  OK to install?
>> >>
>> >> Richard
>> >>
>> >>
>> >> 2018-07-23  Richard Sandiford  <richard.sandif...@arm.com>
>> >>
>> >> gcc/
>> >>         * match.pd: Optimise pointer range checks.
>> >>
>> >> gcc/testsuite/
>> >>         * gcc.dg/pointer-range-check-1.c: New test.
>> >>         * gcc.dg/pointer-range-check-2.c: Likewise.
>> >>
>> >> Index: gcc/match.pd
>> >> ===================================================================
>> >> --- gcc/match.pd        2018-07-23 15:56:47.000000000 +0100
>> >> +++ gcc/match.pd        2018-07-23 15:58:33.480269844 +0100
>> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >>     (if (inverse_conditions_p (@0, @2)
>> >>          && element_precision (type) == element_precision (op_type))
>> >>      (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> >> +
>> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> >> +   expressions like:
>> >> +
>> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> >> +
>> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
>> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
>> >> +   A is more efficiently tested as:
>> >> +
>> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> >> +
>> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
>> >> +   with @1 - 1.  */
>> >> +(for ior (truth_orif truth_or bit_ior)
>> >> + (for cmp (le lt)
>> >> +  (simplify
>> >> +   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
>> >> +       (cmp:cs (pointer_plus:s @2 @1) @0))
>> >> +   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
>> >
>> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv.
>>
>> This was there because we're converting pointer arithmetic into integer
>> arithmetic, and -ftrapv could cause that new integer code to trap.
>> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined,
>> but it seemed bad form to make it actually trap, especially when the
>> above does some reassociation too.
>
> Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type
> you are doing the arithmetic on (sizetype) instead?

OK, done below.

>> >> +    /* Convert the B form to the A form.  */
>> >> +    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); }
>> >> +     /* Always fails for negative values.  */
>> >> +     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
>> >> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
>> >> +        tree_swap_operands_p pick a canonical order.  */
>> >
>> > gimple_resimplify takes care of that - well, it doesn't since minus isn't
>> > commutative...  I guess you get better CSE later when doing this thus ok,
>> > but it does look a bit off here ;)
>> >
>> > I think you shouldn't use 'sizetype' without guarding this whole thing
>> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).
>>
>> OK, hadn't realised that could be false.  Would building the appropriate
>> unsigned type be OK without the condition, or does it need to be sizetype?
>
> That would probably work but it might be dead slow on those targets that choose
> to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit
> sizetype because there's no arithmetic for 24bit but in offsetted addressing).
> So the transform is likely not a win for those.

Ah, yeah.  The version below adds the == test.

>> > Since the original IL performs an ordered compare of two eventually
>> > unrelated pointers (or pointers adjusted in a way that crosses object
>> > boundaries) (uh... ;))
>> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
>> > conversion?  Since POINTER_PLUS_EXPR offsets are supposed to be
>> > interpreted "signed" that should also handle negative offsets
>> > correctly then?
>>
>> The transform isn't valid when @1 is negative though, so I think we
>> need that check either way.
>
> +     /* Always fails for negative values.  */
> +     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
>
> shouldn't that be wi::min_precision (off * 2, UNSIGNED) <= TYPE_PRECISION (sizetype)?

Gah, yes :-(
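
To spell out the difference with a throwaway example (not using the real
wide-int API): with a 64-bit sizetype and an offset that needs 41 bits, the
old form rejected the fold even though the doubled offset still fits.

/* Throwaway illustration only; min_prec_u is a crude stand-in for
   wi::min_precision (x, UNSIGNED) on nonnegative 64-bit values.  */
#include <assert.h>
#include <stdint.h>

static unsigned
min_prec_u (uint64_t x)
{
  return x ? 64 - __builtin_clzll (x) : 1;
}

int
main (void)
{
  const unsigned sizetype_prec = 64;    /* assuming a 64-bit sizetype */
  uint64_t off = (uint64_t) 1 << 40;    /* needs 41 bits */

  /* Old condition: 41 * 2 = 82 > 64, so the fold would be rejected.  */
  assert (!(min_prec_u (off) * 2 <= sizetype_prec));

  /* Corrected condition: off * 2 needs 42 bits, which fits.  */
  assert (min_prec_u (off * 2) <= sizetype_prec);
  return 0;
}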

>> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that
>> seems better.  That might also allow us to CSE the retained
>> POINTER_PLUS_EXPRs.
>
> Yeah, that sounds better then.

And the CSEing did happen, so we now get the "ideal" sequence for the
original alias checks:

        add     x0, x0, 15
        sub     x6, x0, x1
        sub     x5, x0, x2
        cmp     x6, 30
        ccmp    x5, 30, 0, hi

whereas the original patch had an extra add.
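
(For the record, here's a guess at the kind of input that produces that
sequence; the names are made up, and in practice the checks come from the
vectoriser's runtime alias tests rather than source like this:

/* Illustrative only: two overlap checks against the same pointer.  After
   the fold, both checks can share the single "dst + 15" pointer_plus,
   which is the lone "add" in the sequence above.  */
_Bool
independent (char *dst, char *src1, char *src2)
{
  return ((dst + 15 < src1) | (src1 + 15 < dst))
         && ((dst + 15 < src2) | (src2 + 15 < dst));
}
)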

>> > Also, when @2 == @0 + (@1+1) then the original condition is true but
>> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
>> >    (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
>> > -> -1 > @1 * 2
>> >
>> > which is false.  So I can't really see how you can apply this transform in
>> > general (it would be fine for generating alias checks but a bit more
>> > pessimistic).
>> >
>> > But maybe I am missing something?
>>
>> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.
>
> Hmm, so mathematically this is
>
>   (@0 - @2) % modreduce + @1 > @1 * 2

Probably not % if you're operating over Z, since if @0 - @2 = -@1 * 2
(and -@1 * 2 % modreduce == -@1 * 2), the condition above will be false
while we want it to be true.  But...

> then, but I don't want to think too much about this since Marc didn't
> object here ;)

...we start out converting the range check for A to:

   !(-@1 <= @0 - @2 <= @1)

i.e. @1 + 1 bytes at @0 and @2 are free from overlap.  Then we just
apply the usual trick:

   A <= X <= B -> (unsigned) (X - A) <= (unsigned) (B - A) when A <= B

Substituting @0 - @2 for X, -@1 for A and @1 for B gives:

   !((unsigned) (@0 - @2 + @1) <= (unsigned) (@1 * 2))
->   (unsigned) (@0 - @2 + @1) >  (unsigned) (@1 * 2)

Then the B case is just an off-by-one variant.
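
For what it's worth, both identities are easy to sanity-check by brute
force (a throwaway illustration only; 15 and the [-40, 40] window are
arbitrary):

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  const int64_t n = 15;                 /* plays the role of @1 */
  for (int64_t d = -40; d <= 40; d++)   /* d stands for @0 - @2 */
    {
      /* A: (@0 + @1 < @2) | (@2 + @1 < @0), i.e. !(-@1 <= d <= @1).  */
      int a_orig = d < -n || d > n;
      int a_fold = (uint64_t) (d + n) > (uint64_t) (2 * n);

      /* B: the off-by-one variant, i.e. A with @1 replaced by @1 - 1.  */
      int b_orig = d <= -n || d >= n;
      int b_fold = (uint64_t) (d + (n - 1)) > (uint64_t) (2 * (n - 1));

      assert (a_orig == a_fold && b_orig == b_fold);
    }
  return 0;
}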

Patch tested as before.

Thanks,
Richard


2018-08-01  Richard Sandiford  <richard.sandif...@arm.com>

gcc/
        * match.pd: Optimise pointer range checks.

gcc/testsuite/
        * gcc.dg/pointer-range-check-1.c: New test.
        * gcc.dg/pointer-range-check-2.c: Likewise.

Index: gcc/match.pd
===================================================================
--- gcc/match.pd        2018-07-24 19:04:14.985363724 +0100
+++ gcc/match.pd        2018-08-01 10:55:10.695766411 +0100
@@ -4937,3 +4937,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (inverse_conditions_p (@0, @2)
         && element_precision (type) == element_precision (op_type))
     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
+
+/* For pointers @0 and @2 and nonnegative constant offset @1, look for
+   expressions like:
+
+   A: (@0 + @1 < @2) | (@2 + @1 < @0)
+   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
+
+   If pointers are known not to wrap, B checks whether @1 bytes starting
+   at @0 and @2 do not overlap, while A tests the same thing for @1 + 1
+   bytes.  A is more efficiently tested as:
+
+   A: (sizetype) (@0 + @1 - @2) > @1 * 2
+
+   The equivalent expression for B is given by replacing @1 with @1 - 1:
+
+   B: (sizetype) (@0 + (@1 - 1) - @2) > (@1 - 1) * 2
+
+   @0 and @2 can be swapped in both expressions without changing the result.
+
+   The folds rely on sizetype's being unsigned (which is always true)
+   and on its being the same width as the pointer (which we have to check).
+
+   The fold replaces two pointer_plus expressions, two comparisons and
+   an IOR with a pointer_plus, a pointer_diff, and a comparison, so in
+   the best case it's a saving of two operations.  The A fold retains one
+   of the original pointer_pluses, so is a win even if both pointer_pluses
+   are used elsewhere.  The B fold is a wash if both pointer_pluses are
+   used elsewhere, since all we end up doing is replacing a comparison with
+   a pointer_plus.  We do still apply the fold under those circumstances
+   though, in case applying it to other conditions eventually makes one of the
+   pointer_pluses dead.  */
+(for ior (truth_orif truth_or bit_ior)
+ (for cmp (le lt)
+  (simplify
+   (ior (cmp:cs (pointer_plus@3 @0 INTEGER_CST@1) @2)
+       (cmp:cs (pointer_plus@4 @2 @1) @0))
+   (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+       && TYPE_OVERFLOW_WRAPS (sizetype)
+       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (sizetype))
+    /* Calculate the rhs constant.  */
+    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0);
+           offset_int rhs = off * 2; }
+     /* Always fails for negative values.  */
+     (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype))
+      /* Since the order of @0 and @2 doesn't matter, let tree_swap_operands_p
+        pick a canonical order.  This increases the chances of using the
+        same pointer_plus in multiple checks.  */
+      (with { bool swap_p = tree_swap_operands_p (@0, @2);
+             tree rhs_tree = wide_int_to_tree (sizetype, rhs); }
+       (if (cmp == LT_EXPR)
+       (gt (convert:sizetype
+            (pointer_diff:ssizetype { swap_p ? @4 : @3; }
+                                    { swap_p ? @0 : @2; }))
+           { rhs_tree; })
+       (gt (convert:sizetype
+            (pointer_diff:ssizetype
+             (pointer_plus { swap_p ? @2 : @0; }
+                           { wide_int_to_tree (sizetype, off); })
+             { swap_p ? @0 : @2; }))
+           { rhs_tree; })))))))))
Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
===================================================================
--- /dev/null   2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-1.c        2018-08-01 10:55:10.695766411 +0100
@@ -0,0 +1,37 @@
+/* { dg-do compile { target { ilp32 || lp64 } } } */
+/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
+
+/* All four functions should be folded to:
+
+   (sizetype) (a + 15 - b) > 30.  */
+
+_Bool
+f1 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+  return (a + 16 <= b) | (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+  return (a + 15 < b) | (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
===================================================================
--- /dev/null   2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-2.c        2018-08-01 10:55:10.695766411 +0100
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */
+
+_Bool
+f1 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+  return (a + 16 <= b) | (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+  return (a + 15 < b) | (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
