g:c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f uses nonzero_bits
information to convert sign_extends into zero_extends.
That change is semantically correct in itself, but for the
testcase in the PR, it leads to a series of unfortunate events,
as described below.

We try to combine:

Trying 24 -> 25:
   24: r116:DI=sign_extend(r115:SI)
      REG_DEAD r115:SI
   25: r117:SI=[r116:DI*0x4+r118:DI]
      REG_DEAD r116:DI
      REG_EQUAL [r116:DI*0x4+`constellation_64qam']

which previously succeeded, giving:

(set (reg:SI 117 [ constellation_64qam[_5] ])
    (mem/u:SI (plus:DI (mult:DI (sign_extend:DI (reg:SI 115))
                (const_int 4 [0x4]))
            (reg/f:DI 118)) [1 constellation_64qam[_5]+0 S4 A32]))

However, nonzero_bits can tell that only the low 6 bits of r115
can be nonzero.  The commit above therefore converts the sign_extend
to a zero_extend.  Using the same nonzero_bits information, we then
"expand" the zero_extend to:

  (and:DI (subreg:DI (reg:SI r115) 0)
          (const_int 63))

Substituting into the mult gives the unsimplified expression:

  (mult:DI (and:DI (subreg:DI (reg:SI r115) 0)
                   (const_int 63))
           (const_int 4))

The simplification rules for mult convert this to an ashift by 2.
Then, this rule in simplify_shift_const_1:

          /* If we have (shift (logical)), move the logical to the outside
             to allow it to possibly combine with another logical and the
             shift to combine with another shift.  This also canonicalizes to
             what a ZERO_EXTRACT looks like.  Also, some machines have
             (and (shift)) insns.  */

moves the shift inside the "and", so that the expression becomes:

  (and:DI (ashift:DI (subreg:DI (reg:SI r115) 0)
                     (const_int 2))
          (const_int 252))

We later recanonicalise to a mult (since this is an address):

  (and:DI (mult:DI (subreg:DI (reg:SI r115) 0)
                   (const_int 4))
          (const_int 252))

But we fail to transform this back to the natural substitution:

  (mult:DI (zero_extend:DI (reg:SI r115))
           (const_int 4))

There are several other cases in which make_compound_operation
needs to look more than one level down in order to complete a
compound operation.  For example:

(a) the ashiftrt handling uses extract_left_shift to look through
    things like logic ops in order to find a partnering ashift
    operation

(b) the "and" handling looks through subregs, xors and iors
    to find a partnerning lshiftrt

This patch takes the same approach for mult.

gcc/
        PR rtl-optimization/106594
        * combine.cc (make_compound_operation_and): Look through
        multiplications by a power of two.

gcc/testsuite/
        * gcc.target/aarch64/pr106594.c: New test.
---
 gcc/combine.cc                              | 17 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/pr106594.c | 20 ++++++++++++++++++++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 7d446d02cb4..36d04ad6703 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7996,6 +7996,23 @@ make_compound_operation_and (scalar_int_mode mode, rtx x,
        break;
       }
 
+    case MULT:
+      /* Recurse through a power of 2 multiplication (as can be found
+        in an address), using the relationship:
+
+        (and (mult X 2**N1) N2) == (mult (and X (lshifrt N2 N1)) 2**N1).  */
+      if (CONST_INT_P (XEXP (x, 1))
+         && pow2p_hwi (INTVAL (XEXP (x, 1))))
+       {
+         int shift = exact_log2 (INTVAL (XEXP (x, 1)));
+         rtx sub = make_compound_operation_and (mode, XEXP (x, 0),
+                                                mask >> shift, in_code,
+                                                next_code);
+         if (sub)
+           return gen_rtx_MULT (mode, sub, XEXP (x, 1));
+       }
+      break;
+
     default:
       break;
     }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106594.c 
b/gcc/testsuite/gcc.target/aarch64/pr106594.c
new file mode 100644
index 00000000000..beda8e050a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr106594.c
@@ -0,0 +1,20 @@
+/* { dg-options "-O2" } */
+
+extern const int constellation_64qam[64];
+
+void foo(int nbits,
+         const char *p_src,
+         int *p_dst) {
+
+  while (nbits > 0U) {
+    char first = *p_src++;
+
+    char index1 = ((first & 0x3) << 4) | (first >> 4);
+
+    *p_dst++ = constellation_64qam[index1];
+
+    nbits--;
+  }
+}
+
+/* { dg-final { scan-assembler {ldr\tw[0-9]+, \[x[0-9]+, w[0-9]+, [su]xtw 
#?2\]} } } */
-- 
2.25.1

Reply via email to