My apologies in advance for a middle-end patch without a test case. The patch below
implements a simple/safe missing transformation in the RTL optimizers, that transforms (A&C)^(B&C) into the equivalent (A^B)&C, when C doesn't side-effect, such as a constant. I originally identified this opportunity in gfortran, where: integer function foo(i) result(res) integer(kind=16), intent(in) :: i res = poppar(i) end function currently on x86_64 with -O2 -march=corei7 gfortran produces: foo_: popcntq (%rdi), %rdx popcntq 8(%rdi), %rax andl $1, %edx andl $1, %eax xorl %edx, %eax ret But with this patch, now produces: foo_: popcntq (%rdi), %rdx popcntq 8(%rdi), %rax xorl %edx, %eax andl $1, %eax ret The equivalent C/C++ testcase is: unsigned int foo(unsigned int x, unsigned int y) { return __builtin_parityll(x) ^ __builtin_parityll(y); } where GCC currently generates: foo: movl %esi, %eax movl %edi, %edi popcntq %rdi, %rdi popcntq %rax, %rax andl $1, %edi andl $1, %eax xorl %edi, %eax ret and with this patch, it instead generates: foo: movl %esi, %eax movl %edi, %edi popcntq %rdi, %rdi popcntq %rax, %rax xorl %edi, %eax andl $1, %eax ret The trouble is I'm just about to submit a patch to improve constant folding of parity in the middle-end's match.pd, which will generate different RTL code sequences for the above two examples. Hopefully, folks agree it's better to have a RTL optimization that's difficult to test, than not perform this simplification at all. The semantics/correctness of this transformation are tested by the run-time tests in gfortran.dg/popcnt_poppar_2.F90 This patch has been tested with "make bootstrap" and "make -k check" on x86_64-pc-linux-gnu with no regressions. If approved, I'd very much appreciate it if someone could commit this change for me. 2020-06-11 Roger Sayle <ro...@nextmovesoftware.com> * simplify-rtx.c (simplify_binary_operation_1): Simplify (X & C) ^ (Y & C) to (X ^ Y) & C, when C is simple (i.e. a constant). Thanks very much in advance, Roger -- Roger Sayle NextMove Software Cambridge, UK
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 28c2dc6..ccf5f6d 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -3128,6 +3128,17 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, mode); } + /* Convert (xor (and A C) (and B C)) into (and (xor A B) C). */ + if (GET_CODE (op0) == AND + && GET_CODE (op1) == AND + && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1)) + && ! side_effects_p (XEXP (op0, 1))) + return simplify_gen_binary (AND, mode, + simplify_gen_binary (XOR, mode, + XEXP (op0, 0), + XEXP (op1, 0)), + XEXP (op0, 1)); + /* Convert (xor (and A B) B) to (and (not A) B). The latter may correspond to a machine insn or result in further simplifications if B is a constant. */