On December 17, 2014 12:29:44 PM CET, Marek Polacek <pola...@redhat.com> wrote: >On Wed, Dec 17, 2014 at 10:12:06AM +0100, Richard Biener wrote: >> On Wed, 17 Dec 2014, Jakub Jelinek wrote: >> >> > On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote: >> > > This adds a transformation of (x & ~m) | (y & m), which (on >GIMPLE) >> > > has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE. In >fact, >> > > the latter is then transformed to (a ^ b) & m ^ a, which also has >3 >> > > ops. >> > >> > So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. >swap >> > @0 with the (bit_and ...) ? >> >> Yeah, I'd prefer that as well. Also you probably should make sure >> the (x & ~m) and (y & m) exprs are single-use (see other examples >> in match.pd using has_single_use). > >Ok. We have only one use of has_single_use so far in match.pd, so I >hope what I have is correct... I suppose the point is not to hinder >CSEing those two subexpressions if they have multiple uses. > >> > BTW, the advantage of (x & ~m) | (y & m) form is that there are >fewer >> > dependencies, at least if the target has andn instruction (e.g. >SPARC, >> > Alpha, PA, MMIX, IA64 have them), just the final or depends on the >result of >> > both and and andnot instructions. While in the 2 xor forms, and >> > depends on the first xor result and the second xor depends on the >and >> > result, so if there are multiple ALU units available, the and | >andn >> > form can use both the units, while the second one is unnecessarily >> > serialized. >> >> Right - I've pointed this out as well. Of course this simply >> asks for more clever expansion of ((x ^ y) & m) ^ x rather >> than disabling this transform. Should be doable with TER >> in the BIT_XOR_EXPR expansion code. Or figure out if teaching >> combine/simplify RTX is better. Of course this form requires >> one more register ... which means with high register pressure >> or on register starved machines the GIMPLE canonical form might >> be better (in some cases). Which means that LRA should know >> how to transform it back? (interesting kind of "reload" ;)) > >I'd rather leave the expansion/combine stuff to someone else ;). > >Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?
OK. Thanks, Richard. >2014-12-17 Marek Polacek <pola...@redhat.com> > > PR middle-end/63568 > * match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern. > > * gcc.dg/pr63568.c: New test. > >diff --git gcc/match.pd gcc/match.pd >index dbca99e..4d4bc9f 100644 >--- gcc/match.pd >+++ gcc/match.pd >@@ -382,6 +382,13 @@ along with GCC; see the file COPYING3. If not see > (bit_not (bit_not @0)) > @0) > >+/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */ >+(simplify >+ (bit_ior:c (bit_and:c@3 @0 (bit_not @2)) (bit_and:c@4 @1 @2)) >+ (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3)) >+ && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4))) >+ (bit_xor (bit_and (bit_xor @0 @1) @2) @0))) >+ > > /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)). */ > (simplify >diff --git gcc/testsuite/gcc.dg/pr63568.c >gcc/testsuite/gcc.dg/pr63568.c >index e69de29..fb42bea 100644 >--- gcc/testsuite/gcc.dg/pr63568.c >+++ gcc/testsuite/gcc.dg/pr63568.c >@@ -0,0 +1,54 @@ >+/* PR middle-end/63568 */ >+/* { dg-do compile } */ >+/* { dg-options "-fdump-tree-original" } */ >+ >+int >+fn1 (int a, int b, int m) >+{ >+ return (a & ~m) | (b & m); >+} >+ >+int >+fn2 (int a, int b, int m) >+{ >+ return (a & ~m) | (m & b); >+} >+ >+int >+fn3 (int a, int b, int m) >+{ >+ return (~m & a) | (m & b); >+} >+ >+int >+fn4 (int a, int b, int m) >+{ >+ return (~m & a) | (b & m); >+} >+ >+int >+fn5 (int a, int b, int m) >+{ >+ return (b & m) | (a & ~m); >+} >+ >+int >+fn6 (int a, int b, int m) >+{ >+ return (m & b) | (a & ~m); >+} >+ >+int >+fn7 (int a, int b, int m) >+{ >+ return (m & b) | (~m & a); >+} >+ >+int >+fn8 (int a, int b, int m) >+{ >+ return (b & m) | (~m & a); >+} >+ >+/* { dg-final { scan-tree-dump-not " \\| " "original" } } */ >+/* { dg-final { cleanup-tree-dump "original" } } */ > > Marek