On 17/09/15 10:46, Richard Biener wrote:
On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira
<andre.simoesdiasvie...@arm.com> wrote:
On 01/09/15 15:01, Richard Biener wrote:

On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
<andre.simoesdiasvie...@arm.com> wrote:

Hi Marc,

On 28/08/15 19:07, Marc Glisse wrote:


(not a review, I haven't even read the whole patch)

On Fri, 28 Aug 2015, Andre Vieira wrote:

2015-08-03  Andre Vieira  <andre.simoesdiasvie...@arm.com>

    * match.pd: Added new patterns:
      ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
      (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>}
C1)



+(for op0 (rshift rshift lshift lshift bit_and bit_and)
+ op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
+ op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)

You can nest for-loops, it seems clearer as:
(for op0 (rshift lshift bit_and)
     (for op1 (bit_ior bit_xor)
          op2 (bit_xor bit_ior)



Will do, thank you for pointing it out.



+(simplify
+ (op2:c
+  (op1:c
+   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)

I suspect you will want more :s (single_use) and less :c
(canonicalization
should put constants in second position).

I can't find the definition of :s (single_use).


Sorry for that - didn't get along updating it yet :/  It restricts
matching to
sub-expressions that have a single-use.  So

+  a &= 0xd123;
+  unsigned short tem = a ^ 0x6040;
+  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
... use of tem ...

we shouldn't do the simplifcation here because the expression
(a & 0x123) ^ 0x6040 is kept live by 'tem'.

GCC internals do point out
that canonicalization does put constants in the second position, didnt
see
that first. Thank you for pointing it out.

+       C1 = wi::bit_and_not (C1,C2);

Space after ','.

Will do.

Having wide_int_storage in many places is surprising, I can't find
similar
code anywhere else in gcc.



I tried looking for examples of something similar, I think I ended up
using
wide_int because I was able to convert easily to and from it and it has
the
"mask" and "wide_int_to_tree" functions. I welcome any suggestions on
what I
should be using here for integer constant transformations and
comparisons.


Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
constructors - those
are indeed odd.  Is it just for the initializers of wide-ints?

+    wide_int zero_mask = wi::zero (prec);
+    wide_int C0 = wide_int_storage (@1);
+    wide_int C1 = wide_int_storage (@2);
+    wide_int C2 = wide_int_storage (@3);
...
+       zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false,
prec));

tree_to_uhwi (@1) should do the trick as well

+       C1 = wi::bit_and_not (C1,C2);
+       cst_emit = wi::bit_or (C1, C2);

the ops should be replacable with @2 and @3, the store to C1 obviously not
(but you can use a tree temporary and use wide_int_to_tree here to avoid
the back-and-forth for the case where C1 is not assigned to).

Note that transforms only doing association are prone to endless recursion
in case some other pattern does the reverse op...

Richard.


BR,
Andre


Thank you for all the comments, see reworked version:

Two new algorithmic optimisations:
   1.((X op0 C0) op1 C1) op2 C2)
     with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
     zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
     and 0's otherwise.
     if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
     if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
     if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
   2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)


This patch does two algorithmic optimisations that target patterns like:
(((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) |
0x80000000.

The transformation uses the knowledge of which bits are zero after
operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
run-time operations.
The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF and
(X >> 1) ^ 0xa0000001 respectively.

The second transformation enables more applications of the first. Also some
targets may benefit from delaying shift operations. I am aware that such an
optimization, in combination with one or more optimizations that cause the
reverse transformation, may lead to an infinite loop. Though such behavior
has not been detected during regression testing and bootstrapping on
aarch64.

+/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (rshift @0 @2)
+  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
+
+/* (X bit_op C0) lshift C1 -> (X lshift C0) bit_op (C0 lshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (lshift @0 @2)
+  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))

Will do, good to see that my second transformation still picks up the shift @1 @2 as a constant. I'm assuming there is some constant folding going on between simplify transformations?


this may be one case where not using wide-ints to be able to combine the
patterns makes sense.  Thus,

(for shift (lshift rshift)
  (simplify
   (shift ...)
   (bit_op
    (shift @0 @2)
    (shift @1 @2))))

note that I'm worried we'd take on "invalid" ubsan here when the above
applies to

int foo (int i)
{
   return (i & 0x7fffffff) >> 3;
}
int main () { return foo (0x80000007); }

and i is negative.  That's because right-shift of negative values
invokes undefined
behavior.  IIRC in the middle-end we will not be taking advantage of
that but the
simplifications apply to GENERIC as well and thus may hit before ubsan
instrumentation triggers(?)  It would be nice if you could double-check that.

I was looking into this and I understand your worries, though, I found out that for the particular case of shifts and bit_and there already is a simplify transformation that does the same, regardless of the sign.

/* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
   (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
(for shift (lshift rshift)
 (simplify
  (shift (convert?:s (bit_and:s @0 INTEGER_CST@2)) INTEGER_CST@1)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
(with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
    (bit_and (shift (convert @0) @1) { mask; })))))

Now, I don't quite understand what you mean by ubsan instrumentation, will GCC introduce guards into code where it detects potential undefined behavior? Also, it might be worth to note that right shift of negative values is denoted as "implementation defined" by the C standard. GCC however doesn't include it in its list of implementation defined behavior so I guess that is why you refer to it as undefined?

Should we perhaps disable transformations where we can not guarantee that the right shift produced is not one of negative values? I.e. prohibit this transformation for:
1) signed types and op1 == bit_xor
2) signed types and op1 == bit_and and C1 has sign bit set.

Also would it be useful in cases where you have signed shift and bit_and, and C1 is not negative, to do the transformation but replace the signed shift by an unsigned shift. This to make sure we don't introduce undefined/implementation defined behavior were there was none.

This does however change the current behavior!


+ (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p (@1))

you only need fits_uhwi_p (@1) in the op0 != BIT_AND_EXPR case it
seems, so better
move it down to those cases.

So I used to, but I had the problem that I didn't know how to make it "fail" the matching if this was not the case. For instance if op0 is a lshift for which the constant doesn't fit uhwi, then it would fall through and never set the zero mask, potentially leading to a wrong transformation. Now I'm not sure this can happen, since that would mean that either constant @2 or @3 need to be all 1's and that might already be caught by some other transformation, but its wrong to rely on such behavior IMO. So for now I have changed it to:

(simplify
 (op2
  (op1:s
   (op0@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
 (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) &&
      (op0 == BIT_AND_EXPR || wi::fits_uhwi_p (@1)))


It would be cool to have a FAIL expression, usable in the with clauses, to make the pattern match fail a bit like the one in the machine description language.


+   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))

I think you can write

    (if (wi::bit_and (...) == 0)

or at least wi:eq_p (wi::bit_and (...), 0).


wi::bit_and (...) == 0 seems to be doing the trick.

I wonder if we shouldn't improve the pattern by handling (X op0 C0)
transparently
via using get_nonzero_bits (yeah, that's not exactly zero_mask but its
inverse AFAIK).
We'd wrap get_nonzero_bits in a helper that can handle GENERIC and your
&, >>, << cases (hmm, such function must already exist somewhere...).  So you'd
reduce the pattern to

+ (for op1 (bit_ior bit_xor)
+      op2 (bit_xor bit_ior)
+(simplify
+ (op2
+  (op1:s @0 INTEGER_CST@2) INTEGER_CST@3))
    (with
     {
       wide_int zero_mask_not = get_nonzero_bits (@0);
...
     }

This would make use of value-range information determined by VRP for example.

I'll go look for such a function.


note that with your pattern you'd want to capture (op0:s @0 INTEGER_CST@1)
like via (op0@4 @0 INTEGER_CST@1) so you can re-use it in the replacement
like so:

+   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
+    (op2 @4 { wide_int_to_tree (type, cst_emit); })
+    (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec)))
+     (op1 @4 { wide_int_to_tree (type, cst_emit); }))))))))

the expression doesn't need a :s then obviously.

Yeah makes sense.

Thanks and sorry for the delay in reviewing this.
Richard.


Thank you for all the comments!

gcc/ChangeLog:

2015-08-03  Andre Vieira  <andre.simoesdiasvie...@arm.com>

   * match.pd: Added new patterns:
     ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
     (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)

gcc/testsuite/ChangeLog:

2015-08-03  Andre Vieira  <andre.simoesdiasvie...@arm.com>
             Hale Wang  <hale.w...@arm.com>

   * gcc.dg/tree-ssa/forwprop-33.c: New test.


Reply via email to