This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) to
(truncate:HI (reg:SI)), and closely related variants.  In RTL, a
paradoxical SUBREG, where the outermode is wider than the innermode,
behaves like the extensions zero_extend or sign_extend, except that
we don't care about the contents of the extended bits.  Hence, in the
above case, it's convenient to eliminate the SUBREG by widening
the explicit TRUNCATE to include more bits from the original REG.
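
As a rough illustration of why this is safe (a sketch in plain C++, not
GCC code; truncate_qi and truncate_hi are hypothetical helpers that model
TRUNCATE of a 32-bit value): the low 8 bits of the HImode truncation
equal the QImode truncation, and the paradoxical SUBREG leaves the
remaining bits unspecified, so whatever the wider TRUNCATE puts there is
acceptable.

```cpp
#include <cstdint>

// Hypothetical models of (truncate:QI (reg:SI)) and (truncate:HI (reg:SI)).
constexpr std::uint8_t truncate_qi(std::uint32_t x)
{
  return static_cast<std::uint8_t>(x);   // keep the low 8 bits
}

constexpr std::uint16_t truncate_hi(std::uint32_t x)
{
  return static_cast<std::uint16_t>(x);  // keep the low 16 bits
}

// The bits that the paradoxical SUBREG defines (the low 8) are unchanged
// when the TRUNCATE is widened from QImode to HImode.
static_assert(static_cast<std::uint8_t>(truncate_hi(0x12345678u))
              == truncate_qi(0x12345678u),
              "low 8 bits agree");
```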

There are three possibilities, depending on how outermode compares
with origmode, the mode of the TRUNCATE's operand.  First, as above,
when outermode < origmode, we can eliminate the paradoxical SUBREG
using a wider TRUNCATE.  Secondly, when outermode == origmode, we can
eliminate both the SUBREG and the TRUNCATE, so
(subreg:SI (truncate:QI (reg:SI))) becomes just (reg:SI).  Finally,
when outermode > origmode, we can eliminate the TRUNCATE and generate
a paradoxical subreg from the original source, so
(subreg:DI (truncate:QI (reg:SI))) becomes the simpler
(subreg:DI (reg:SI)).
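
The three-way case split can be sketched as a toy model in plain C++.
This is not the GCC implementation: Action and classify are hypothetical
names, modes are reduced to their bit precisions (QI=8, HI=16, SI=32,
DI=64), and equal precision is assumed to mean the same mode.

```cpp
// Requires C++14 or later (constexpr function with several returns).
enum class Action
{
  None,                     // not a paradoxical subreg of a truncate
  WidenTruncate,            // (truncate:outer orig)
  ReturnOrig,               // orig itself
  ParadoxicalSubregOfOrig   // (subreg:outer orig)
};

// outerprec: precision of the SUBREG's mode
// innerprec: precision of the TRUNCATE's mode
// origprec:  precision of the TRUNCATE's operand
constexpr Action classify(unsigned outerprec, unsigned innerprec,
                          unsigned origprec)
{
  if (outerprec <= innerprec)
    return Action::None;
  if (outerprec < origprec)
    return Action::WidenTruncate;
  if (outerprec > origprec)
    return Action::ParadoxicalSubregOfOrig;
  return Action::ReturnOrig;
}

// The three examples from the text above.
static_assert(classify(16, 8, 32) == Action::WidenTruncate,
              "subreg:HI of truncate:QI of reg:SI");
static_assert(classify(32, 8, 32) == Action::ReturnOrig,
              "subreg:SI of truncate:QI of reg:SI");
static_assert(classify(64, 8, 32) == Action::ParadoxicalSubregOfOrig,
              "subreg:DI of truncate:QI of reg:SI");
```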

An example benefit of this simplification is that on nvptx-none,
int foo(char x, char y) { return (x=='x') && (y=='y'); }
shrinks from 17 instructions to 13 instructions.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures, and also on nvptx-none with
no new failures.  Ok for mainline?


2021-09-05  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        * simplify-rtx.c (simplify_subreg): Optimize paradoxical subreg
        extensions of TRUNCATE.

Roger
--

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ebad5cb..3040136 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -7403,13 +7403,34 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op,
          return immed_wide_int_const (val, int_outermode);
        }
 
-      if (GET_MODE_PRECISION (int_outermode)
-         < GET_MODE_PRECISION (int_innermode))
+      unsigned int outerprec = GET_MODE_PRECISION (int_outermode);
+      unsigned int innerprec = GET_MODE_PRECISION (int_innermode);
+      if (outerprec < innerprec)
        {
          rtx tem = simplify_truncation (int_outermode, op, int_innermode);
          if (tem)
            return tem;
        }
+      else if (outerprec > innerprec
+              && GET_CODE (op) == TRUNCATE)
+       {
+         /* Optimize paradoxical subreg extension of a truncate, where
+            we can eliminate the truncation, or widen the truncation
+            to the desired mode.  */
+         scalar_int_mode int_origmode;
+         rtx orig = XEXP (op, 0);
+         if (is_a <scalar_int_mode> (GET_MODE (orig), &int_origmode))
+           {
+             unsigned int origprec = GET_MODE_PRECISION (int_origmode);
+             if (outerprec < origprec)
+               return simplify_gen_unary (TRUNCATE, outermode, orig,
+                                          GET_MODE (orig));
+             else if (outerprec > origprec)
+               return lowpart_subreg (int_outermode, orig, int_origmode);
+             else if (outermode == GET_MODE (orig))
+               return orig;
+           }
+       }
     }
 
   /* If OP is a vector comparison and the subreg is not changing the
