On 2/22/24 2:08 PM, Jakub Jelinek wrote:
On Thu, Feb 22, 2024 at 12:59:18PM -0800, Greg McGary wrote:
The sign bit of a sign-extending load cannot be known until runtime,
so don't attempt to simplify it in the combiner.

2024-02-22  Greg McGary  <g...@rivosinc.com>

         PR rtl-optimization/113010
         * combine.cc (simplify_comparison): Don't simplify high part
        of paradoxical-SUBREG-of-MEM on machines that sign-extend loads

         * gcc.c-torture/execute/pr113010.c: New test.
---
  gcc/combine.cc                                 | 10 ++++++++--
  gcc/testsuite/gcc.c-torture/execute/pr113010.c |  9 +++++++++
  2 files changed, 17 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr113010.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 812553c091e..736206242e1 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12550,9 +12550,15 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
rtx *pop1)
            }
/* If the inner mode is narrower and we are extracting the low part,
-            we can treat the SUBREG as if it were a ZERO_EXTEND.  */
+            we can treat the SUBREG as if it were a ZERO_EXTEND ...  */
          if (paradoxical_subreg_p (op0))
-           ;
+           {
+             /* ... except we can't treat as ZERO_EXTEND when a machine
+                automatically sign-extends loads. */
+             if (MEM_P (SUBREG_REG (op0)) && WORD_REGISTER_OPERATIONS
+                 && load_extend_op (inner_mode) == SIGN_EXTEND)
+               break;
That doesn't feel sufficient.  Like in the PR112758 patch, I believe
for WORD_REGISTER_OPERATIONS you should treat it as a ZERO_EXTEND only
if MEM_P (SUBREG_REG (op0)) && load_extend_op (inner_mode) == ZERO_EXTEND
or if GET_MODE_PRECISION (inner_mode) is known to be >= BITS_PER_WORD.

        Jakub

The gist of your comment is that my patch was missing the test for width
of inner_mode vs. BITS_PER_WORD. Here's a revision. Looks good to you?

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 812553c091e..4626f2edae9 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12550,9 +12550,17 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
            }

          /* If the inner mode is narrower and we are extracting the low part,
-            we can treat the SUBREG as if it were a ZERO_EXTEND. */
+            we can treat the SUBREG as if it were a ZERO_EXTEND ...  */
          if (paradoxical_subreg_p (op0))
-           ;
+           {
+             /* ... except we can't do that for loads on machines
+                that don't automatically zero-extend loads. */
+             if (WORD_REGISTER_OPERATIONS
+                 && GET_MODE_PRECISION (inner_mode) < BITS_PER_WORD
+                 && MEM_P (SUBREG_REG (op0))
+                 && load_extend_op (inner_mode) != ZERO_EXTEND)
+               break;
+           }
          else if (subreg_lowpart_p (op0)
                   && GET_MODE_CLASS (mode) == MODE_INT
                   && is_int_mode (GET_MODE (SUBREG_REG (op0)), &inner_mode)

Reply via email to