On Tue, Oct 25, 2016 at 1:22 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> As commented in patch, this one simplifies (cond (cmp x c1) (op x c2) c3)
> into (op (minmax x c1) c2) if:
>
>   1) OP is PLUS or MINUS.
>   2) CMP is LT, LE, GT or GE.
>   3) C3 == (C1 op C2), and the expression isn't undefined behavior.
>
> This pattern also handles special cases like:
>
>   1) Comparison's operand x is an unsigned to signed type conversion
>      and c1 is integer zero.  In this case,
>        (signed type)x  < 0  <=>  x  > MAX_VAL(signed type)
>        (signed type)x >= 0  <=>  x <= MAX_VAL(signed type)
>   2) Const c1 may not be equal to (C3 op' C2).  In this case we also
>      check equality for (c1+1) and (c1-1) by adjusting comparison
>      code.
>
> Also note: Though signed type is handled by this pattern, it cannot be
> simplified at the moment because C standard requires additional type
> promotion.  In order to match&simplify signed type cases, the IR needs to be
> cleaned up by other optimizers, e.g., VRP.
> For given loop:
> int foo1 (unsigned short a[], unsigned int x)
> {
>   unsigned int i;
>   for (i = 0; i < 1000; i++)
>     {
>       x = a[i];
>       a[i] = (unsigned short)(x <= 32768 ? x + 32768 : 0);
>     }
>   return x;
> }
>
> Generated assembly can be improved from:
> .L4:
>         ldr     q5, [x3, x1]
>         add     w2, w2, 1
>         cmp     w0, w2
>         ushll   v1.4s, v5.4h, 0
>         ushll2  v0.4s, v5.8h, 0
>         add     v4.4s, v1.4s, v2.4s
>         add     v3.4s, v0.4s, v2.4s
>         cmhs    v1.4s, v2.4s, v1.4s
>         cmhs    v0.4s, v2.4s, v0.4s
>         and     v1.16b, v4.16b, v1.16b
>         and     v0.16b, v3.16b, v0.16b
>         xtn     v3.4h, v1.4s
>         xtn2    v3.8h, v0.4s
>         str     q3, [x3, x1]
>         add     x1, x1, 16
>         bhi     .L4
>
> To:
> .L4:
>         ldr     q1, [x3, x1]
>         add     w2, w2, 1
>         cmp     w0, w2
>         umin    v0.8h, v1.8h, v2.8h
>         add     v0.8h, v0.8h, v2.8h
>         str     q0, [x3, x1]
>         add     x1, x1, 16
>         bhi     .L4
>
> Bootstrap and test on x86_64 and AArch64 for whole patch set.  Any comments?
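
To make the claimed equivalence concrete, here is a minimal C sketch (not part of the patch) that brute-forces the foo1 expression against the folded min-then-add form over every 16-bit input, and also checks special case 1 for a 16-bit type. The helper names (cond_form, minmax_form, count_mismatches, count_sign_mismatches, self_check) are hypothetical and chosen only for illustration. The sign check assumes the modular unsigned-to-signed conversion that GCC documents; the C standard before C23 leaves that conversion implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Original form from the loop body: conditional add, truncated to 16 bits. */
static uint16_t cond_form(uint32_t x)
{
    return (uint16_t)(x <= 32768u ? x + 32768u : 0u);
}

/* Folded form: unsigned minimum first, then the add, all in 16 bits. */
static uint16_t minmax_form(uint16_t x)
{
    uint16_t m = x < 32768u ? x : (uint16_t)32768u;  /* umin (x, c1) */
    return (uint16_t)(m + 32768u);  /* wraps to 0 when m == c1, i.e. c3 */
}

/* Count the 16-bit inputs on which the two forms disagree. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= UINT16_MAX; x++)
        if (cond_form(x) != minmax_form((uint16_t)x))
            bad++;
    return bad;
}

/* Special case 1: (signed type)x < 0  <=>  x > MAX_VAL(signed type).
   Assumption: the out-of-range conversion wraps modulo 2^16, as GCC does. */
static unsigned count_sign_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= UINT16_MAX; x++) {
        int16_t s = (int16_t)(uint16_t)x;
        if ((s < 0) != (x > (uint32_t)INT16_MAX))
            bad++;
    }
    return bad;
}

/* Convenience self-check; aborts if either equivalence fails. */
static void self_check(void)
{
    assert(count_mismatches() == 0);
    assert(count_sign_mismatches() == 0);
}
```

Calling self_check() from a test driver should return silently. The check relies on the 16-bit truncation: 32768 + 32768 only equals the constant 0 after wrapping, which is why condition 3 (C3 == C1 op C2) has to be evaluated in the type of the result.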

I reminded myself that fold-const.c:fold_cond_expr_with_comparison is
still GENERIC only (needs to be moved to match.pd).

In that light, can we stage the new transform in a way to get
(cond (cmp x c1) (op x c2) c3) into the form handled by it?  Thus

  (cond (cmp x c1') x c2') + c2

?  Something like c1' = c1 - c2; c2' = c3 - c2 (when valid to do this)?
(I haven't thought long enough about this yet.)

That said, can you try moving

  (cond (cmp x c1) x c2) -> minmax

(the bottom-most case in fold_cond_expr_with_comparison) to match.pd and
see whether you can get that extended for your case?

A few comments on the pattern in your patch:

+     (cond (cmp@0 (convert?@00 @10) INTEGER_CST@01)
+          (op @10 INTEGER_CST@11)
+          INTEGER_CST@2)

The @00 vs. @10 vs. @01 vs. @11 vs. @0 is visually hard to match.  match.pd
allows any alphanumeric identifier after @ (yes, we mostly use the
digit variants because historically those were the only ones available,
and there 00 and 01 didn't work ;)).  May I suggest @X, @C1, etc., and
maybe @cmpexp for @0?

For simple patterns it really doesn't matter, but if you have to scroll
in an 80x24 terminal to see the whole pattern, it doesn't help if you
can't see the matched patterns together with the captures when you come
across @01 ...

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-10-21  Bin Cheng  <bin.ch...@arm.com>
>
>         * match.pd ((cond (cmp x c1) (op x c2) c3) -> (op (minmax x c1) c2)):
>         New pattern.
>
> gcc/testsuite/ChangeLog
> 2016-10-21  Bin Cheng  <bin.ch...@arm.com>
>
>         * gcc.dg/fold-bopcond-1.c: New test.
>         * gcc.dg/fold-bopcond-2.c: New test.
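
As a rough illustration of the staging idea in the reply, the following C sketch pulls the addition out of the conditional so that the inner conditional has the (cond (cmp x c1) x c2') shape, with c2' = c3 - c2. It is only one possible reading of the suggestion: it keeps the comparison constant c1 unchanged (the reply floats c1' = c1 - c2, which is not used here), it assumes wrapping 16-bit unsigned arithmetic, and it only covers LE with PLUS. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (cond (le x c1) (plus x c2) c3) in wrapping 16-bit unsigned arithmetic. */
static uint16_t cond_of_plus(uint16_t x, uint16_t c1, uint16_t c2, uint16_t c3)
{
    return x <= c1 ? (uint16_t)(x + c2) : c3;
}

/* (plus (cond (le x c1) x c2') c2) with c2' = c3 - c2 (mod 2^16). */
static uint16_t plus_of_cond(uint16_t x, uint16_t c1, uint16_t c2, uint16_t c3)
{
    uint16_t c2p = (uint16_t)(c3 - c2);
    uint16_t inner = x <= c1 ? x : c2p;  /* this is min (x, c1) when c2p == c1 */
    return (uint16_t)(inner + c2);
}

/* Count 16-bit inputs where the staged form differs from the original. */
static unsigned count_staging_mismatches(uint16_t c1, uint16_t c2, uint16_t c3)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= UINT16_MAX; x++)
        if (cond_of_plus((uint16_t)x, c1, c2, c3)
            != plus_of_cond((uint16_t)x, c1, c2, c3))
            bad++;
    return bad;
}

/* Convenience self-check using the constants from foo1 and one other triple. */
static void self_check_staging(void)
{
    assert(count_staging_mismatches(32768u, 32768u, 0u) == 0);
    assert(count_staging_mismatches(100u, 5u, 105u) == 0);
}
```

With the foo1 constants, c2' = 0 - 32768 wraps to 32768, which equals c1, so the inner conditional is exactly the min form that the existing minmax case in fold_cond_expr_with_comparison recognizes. For signed types the same rewrite would need an overflow check, which is what "when valid to do this" in the reply hints at.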