On Tue, Oct 25, 2016 at 1:22 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> As commented in patch, this one simplifies (cond (cmp x c1) (op x c2) c3) 
> into (op (minmax x c1) c2) if:
>
>      1) OP is PLUS or MINUS.
>      2) CMP is LT, LE, GT or GE.
>      3) C3 == (C1 op C2), and the expression isn't undefined behavior.
>
>    This pattern also handles special cases like:
>
>      1) Comparison's operand x is an unsigned to signed type conversion
>         and c1 is integer zero.  In this case,
>           (signed type)x  < 0  <=>  x  > MAX_VAL(signed type)
>           (signed type)x >= 0  <=>  x <= MAX_VAL(signed type)
>      2) Const c1 may not be equal to (C3 op' C2).  In this case we also
>         check equality for (c1+1) and (c1-1) by adjusting comparison
>         code.
>
> Also note: Though signed type is handled by this pattern, it cannot be 
> simplified at the moment because C standard requires additional type 
> promotion.  In order to match and simplify signed type cases, the IR needs
> to be cleaned up by other optimizers, e.g., VRP.
> For the given loop:
> int foo1 (unsigned short a[], unsigned int x)
> {
>   unsigned int i;
>   for (i = 0; i < 1000; i++)
>     {
>       x = a[i];
>       a[i] = (unsigned short)(x <= 32768 ? x + 32768 : 0);
>     }
>   return x;
> }
>
> Generated assembly can be improved from:
> .L4:
>         ldr     q5, [x3, x1]
>         add     w2, w2, 1
>         cmp     w0, w2
>         ushll   v1.4s, v5.4h, 0
>         ushll2  v0.4s, v5.8h, 0
>         add     v4.4s, v1.4s, v2.4s
>         add     v3.4s, v0.4s, v2.4s
>         cmhs    v1.4s, v2.4s, v1.4s
>         cmhs    v0.4s, v2.4s, v0.4s
>         and     v1.16b, v4.16b, v1.16b
>         and     v0.16b, v3.16b, v0.16b
>         xtn     v3.4h, v1.4s
>         xtn2    v3.8h, v0.4s
>         str     q3, [x3, x1]
>         add     x1, x1, 16
>         bhi     .L4
>
> To:
> .L4:
>         ldr     q1, [x3, x1]
>         add     w2, w2, 1
>         cmp     w0, w2
>         umin    v0.8h, v1.8h, v2.8h
>         add     v0.8h, v0.8h, v2.8h
>         str     q0, [x3, x1]
>         add     x1, x1, 16
>         bhi     .L4
>
> Bootstrapped and tested on x86_64 and AArch64 for the whole patch set.  Any comments?
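
For reference, the equivalence behind the loop example above can be checked exhaustively over all 16-bit inputs.  The following is only a sketch: the function names are made up for illustration, and it models the result as a wrapping 16-bit value, which is what the store to a[i] keeps.

```c
#include <stdint.h>

/* Original form from the loop body: select, then truncate to 16 bits.
   (Hypothetical helper name, not part of the patch.)  */
static uint16_t cond_form(uint16_t v)
{
  unsigned int x = v;
  return (uint16_t)(x <= 32768 ? x + 32768 : 0);
}

/* Simplified form: unsigned min followed by a wrapping 16-bit add.
   (Hypothetical helper name, not part of the patch.)  */
static uint16_t minmax_form(uint16_t v)
{
  uint16_t m = v < 32768 ? v : 32768;
  return (uint16_t)(m + 32768u);
}

/* Returns the number of 16-bit inputs on which the two forms differ.  */
static unsigned int count_mismatches(void)
{
  unsigned int bad = 0;
  for (unsigned int v = 0; v <= 65535; v++)
    if (cond_form((uint16_t)v) != minmax_form((uint16_t)v))
      bad++;
  return bad;
}
```

The two forms agree because C3 (0) equals C1 + C2 (32768 + 32768) once the sum wraps to 16 bits, which is condition 3) of the pattern.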

I reminded myself that fold-const.c:fold_cond_expr_with_comparison is
still GENERIC only (needs to be moved to match.pd).  In that light can we
stage the new transform in a way to get

  (cond (cmp x c1) (op x c2) c3)

into the form handled by it?  Thus

  (cond (cmp x c1') x c2') + c2

?  Something like c1' = c1 - c2; c2' = c3 - c2 (when valid to do this)?
(I didn't think long enough about this yet).
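
As a rough illustration of the staged form, here is a sketch for the constants of the loop example only, assuming wrapping 16-bit unsigned arithmetic.  The helper names are hypothetical, and this sketch keeps the comparison constant unchanged and rewrites only the else-arm as c2' = c3 - c2; it does not model any adjustment of c1.

```c
#include <stdint.h>

/* (cond (cmp x c1) (op x c2) c3) with cmp = LE, c1 = 32768,
   op = PLUS, c2 = 32768, c3 = 0.  Hypothetical helper name.  */
static uint16_t original_form(uint16_t x)
{
  return x <= 32768 ? (uint16_t)(x + 32768u) : 0;
}

/* (cond (cmp x c1) x c2') + c2, with c2' = c3 - c2 computed modulo 2^16.
   The comparison is left as is (an assumption of this sketch).  */
static uint16_t staged_form(uint16_t x)
{
  const uint16_t c2 = 32768;
  const uint16_t c3 = 0;
  const uint16_t c2p = (uint16_t)(c3 - c2);   /* wraps to 32768 */
  uint16_t sel = x <= 32768 ? x : c2p;        /* plain cond, i.e. a min */
  return (uint16_t)(sel + c2);
}

/* Returns the number of 16-bit inputs on which the two forms differ.  */
static unsigned int staged_mismatches(void)
{
  unsigned int bad = 0;
  for (unsigned int v = 0; v <= 65535; v++)
    if (original_form((uint16_t)v) != staged_form((uint16_t)v))
      bad++;
  return bad;
}
```

With these constants the inner cond is exactly min (x, 32768), so for this instance the staged form reduces to the minmax case mentioned below.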

That said, can you try moving (cond (cmp x c1) x c2) -> minmax (the bottom-most
case in fold_cond_expr_with_comparison) to match.pd and see whether you can
get that extended for your case?

A few comments on the pattern in your patch:

+   (cond (cmp@0 (convert?@00 @10) INTEGER_CST@01)
+        (op @10 INTEGER_CST@11)
+        INTEGER_CST@2)

the @00 vs @10 vs @01 vs @11 vs @0 is visually hard to match.  match.pd
allows for any alphanumeric identifier after @ (yes, we mostly use the digit
variants because historically those were the only ones available and there 00
and 01 didn't work ;)).  May I suggest @X, @C1, etc. and maybe @cmpexp
for @0?  For simple patterns it really doesn't matter, but if you have to scroll
in an 80x24 terminal to see the pattern, it doesn't help if you can't see the
matched patterns with the captures when seeing @01 ...

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-10-21  Bin Cheng  <bin.ch...@arm.com>
>
>         * match.pd ((cond (cmp x c1) (op x c2) c3) -> (op (minmax x c1) c2)):
>         New pattern.
>
> gcc/testsuite/ChangeLog
> 2016-10-21  Bin Cheng  <bin.ch...@arm.com>
>
>         * gcc.dg/fold-bopcond-1.c: New test.
>         * gcc.dg/fold-bopcond-2.c: New test.
