Hi!

As discussed in the PR, the following testcase is miscompiled on RISC-V
64-bit, because num_sign_bit_copies in one spot pretends the bits in
a paradoxical SUBREG beyond SUBREG_REG SImode are all sign bit copies:
5444              /* For paradoxical SUBREGs on machines where all register 
operations
5445                 affect the entire register, just look inside.  Note that 
we are
5446                 passing MODE to the recursive call, so the number of sign 
bit
5447                 copies will remain relative to that mode, not the inner 
mode.
5448    
5449                 This works only if loads sign extend.  Otherwise, if we 
get a
5450                 reload for the inner part, it may be loaded from the 
stack, and
5451                 then we lose all sign bit copies that existed before the 
store
5452                 to the stack.  */
5453              if (WORD_REGISTER_OPERATIONS
5454                  && load_extend_op (inner_mode) == SIGN_EXTEND
5455                  && paradoxical_subreg_p (x)
5456                  && MEM_P (SUBREG_REG (x)))
and then optimizes based on that in one place, but then the
r7-1077 optimization triggers in and treats all the upper bits in
paradoxical SUBREG as undefined and performs based on that another
optimization.  The r7-1077 optimization is done only if SUBREG_REG
is either a REG or MEM, from the discussions in the PR seems that if
it is a REG, the upper bits in paradoxical SUBREG on
WORD_REGISTER_OPERATIONS targets aren't really undefined, but we can't
tell what values they have because we don't see the operation which
computed that REG, and for MEM it depends on load_extend_op - if
it is SIGN_EXTEND, the upper bits are sign bit copies and so something
not really usable for the optimization, if ZERO_EXTEND, they are zeros
and it is usable for the optimization, for UNKNOWN I think it is better
to punt as well.

So, the following patch basically disables the r7-1077 optimization
on WORD_REGISTER_OPERATIONS unless we know it is still ok for sure,
which is either if sub_width is >= BITS_PER_WORD because then the
WORD_REGISTER_OPERATIONS rules don't apply, or load_extend_op on a MEM
is ZERO_EXTEND.

Bootstrapped/regtested on x86_64-linux and i686-linux (neither of which
is WORD_REGISTER_OPERATIONS target), tested on the testcase using
cross to riscv64-linux but don't have an easy access to a
WORD_REGISTER_OPERATIONS target to bootstrap/regtest it there.

Ok for trunk?

2023-12-22  Jakub Jelinek  <ja...@redhat.com>

        PR rtl-optimization/112758
        * combine.cc (make_compopund_operation_int): Optimize AND of a SUBREG
        based on nonzero_bits of SUBREG_REG and constant mask on
        WORD_REGISTER_OPERATIONS targets only if it is a zero extending
        MEM load.

        * gcc.c-torture/execute/pr112758.c: New test.

--- gcc/combine.cc.jj   2023-12-11 23:52:03.528513943 +0100
+++ gcc/combine.cc      2023-12-21 20:25:45.461737423 +0100
@@ -8227,12 +8227,20 @@ make_compound_operation_int (scalar_int_
          int sub_width;
          if ((REG_P (sub) || MEM_P (sub))
              && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
-             && sub_width < mode_width)
+             && sub_width < mode_width
+             && (!WORD_REGISTER_OPERATIONS
+                 || sub_width >= BITS_PER_WORD
+                 /* On WORD_REGISTER_OPERATIONS targets the bits
+                    beyond sub_mode aren't considered undefined,
+                    so optimize only if it is a MEM load when MEM loads
+                    zero extend, because then the upper bits are all zero.  */
+                 || (MEM_P (sub)
+                     && load_extend_op (sub_mode) == ZERO_EXTEND)))
            {
              unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
              unsigned HOST_WIDE_INT mask;
 
-             /* original AND constant with all the known zero bits set */
+             /* Original AND constant with all the known zero bits set.  */
              mask = UINTVAL (XEXP (x, 1)) | (~nonzero_bits (sub, sub_mode));
              if ((mask & mode_mask) == mode_mask)
                {
--- gcc/testsuite/gcc.c-torture/execute/pr112758.c.jj   2023-12-21 
21:01:43.780755959 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr112758.c      2023-12-21 
21:01:30.521940358 +0100
@@ -0,0 +1,15 @@
+/* PR rtl-optimization/112758 */
+
+int a = -__INT_MAX__ - 1;
+
+int
+main ()
+{
+  if (-__INT_MAX__ - 1U == 0x80000000ULL)
+    {
+      unsigned long long b = 0xffff00ffffffffffULL;
+      if ((b & a) != 0xffff00ff80000000ULL)
+       __builtin_abort ();
+    }
+  return 0;
+}

        Jakub

Reply via email to