[Bug target/96403] [nvptx] Less optimal code in v2si-cvt.c after setting TARGET_TRULY_NOOP_TRUNCATION to false

vries at gcc dot gnu.org Fri, 31 Jul 2020 09:58:14 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96403


Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |nvptx

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Looking at the first regression, we have without the patch:
...
//(insn 9 5 12 2
//    (set (reg:HI 27 [ arg ])
//         (subreg:HI (reg/v:V2SI 25 [ arg ]) 0))
//     "v2si-cvt.c":11:32 5 {*movhi_insn}
//     (nil))
                cvt.u16.u32     %r27, %r25.x; // 9 [c=12] *movhi_insn/0
...
and with the patch:
...
//(insn 8 5 9 2 
//    (set (reg:DI 26 [ arg ])
//         (subreg:DI (reg/v:V2SI 25 [ arg ]) 0))
//     "v2si-cvt.c":11:32 7 {*movdi_insn}
//     (nil))
                mov.b64 %r26, %r25;     // 8    [c=12]  *movdi_insn/0

//(insn 9 8 13 2
//    (set (reg:HI 27 [ arg ])
//         (truncate:HI (reg:DI 26 [ arg ])))
//     "v2si-cvt.c":11:32 32 {truncdihi2}
//     (expr_list:REG_DEAD (reg:DI 26 [ arg ])
//        (nil)))
               cvt.u16.u64     %r27, %r26;
...

I guess we would like to generate this instead:
...
//(insn 9 8 13 2
//    (set (reg:HI 27 [ arg ])
//         (truncate:HI (subreg:SI (reg/v:V2SI 25 [ arg ]) 0))
//     "v2si-cvt.c":11:32 32 {truncdihi2}
//     (expr_list:REG_DEAD (reg:DI 26 [ arg ])
//        (nil)))
               cvt.u16.u32     %r26, %r25.x;
...
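
As a sanity check on why the narrower cvt should be equivalent, here is a minimal C sketch (not GCC code). It assumes a little-endian layout in which lane .x occupies the low 32 bits of the 64-bit view, as the subreg at byte offset 0 implies here. The v2si struct and the function names are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a V2SI value: two 32-bit lanes, .x stored first. */
typedef struct { uint32_t x, y; } v2si;

/* Models the code generated with the patch: take the whole vector as a
   64-bit value (mov.b64), then truncate to 16 bits (cvt.u16.u64). */
static uint16_t trunc_via_di(v2si v)
{
    uint64_t di;
    memcpy(&di, &v, sizeof di);   /* bitwise reinterpretation, like subreg:DI */
    return (uint16_t)di;
}

/* Models the desired code: truncate lane .x directly (cvt.u16.u32 on %r25.x). */
static uint16_t trunc_via_lane(v2si v)
{
    return (uint16_t)v.x;
}

/* Assumption: little-endian host, so lane .x is the low half of the
   64-bit view.  On a big-endian host the two paths would differ. */
static void check_equivalent(v2si v)
{
    assert(trunc_via_di(v) == trunc_via_lane(v));
}
```

Calling check_equivalent on any v2si value should pass on a little-endian host, since the upper lane never contributes to the low 16 bits.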

Debugging combine, we hit TARGET_MODES_TIEABLE_P as a barrier. After enabling
that, we get a slightly different insn (the store has merged with the
truncate), on which combine also fails:
...
Trying 8 -> 13:
    8: r26:DI=r25:V2SI#0
   13: [%frame:DI]=trunc(r26:DI)
      REG_DEAD r26:DI
Failed to match this instruction:
(set (mem/v/c:HI (reg/f:DI 2 %frame) [2 s+0 S2 A128])
    (truncate:HI (subreg:DI (reg/v:V2SI 25 [ arg ]) 0)))
...
I've tried enabling subregs in truncsi<HI> but that didn't help either.

I managed to get the desired code using this (to match the pattern tried by
combine):
...
@@ -372,11 +386,26 @@

 (define_insn "truncdi<mode>2"
   [(set (match_operand:QHSIM 0 "nvptx_nonimmediate_operand" "=R,m")
-       (truncate:QHSIM (match_operand:DI 1 "nvptx_register_operand" "R,R")))]
+       (truncate:QHSIM (match_operand:DI 1 "register_operand" "R,Q")))]
   ""
-  "@
-   %.\\tcvt%t0.u64\\t%0, %1;
-   %.\\tst%A0.u%T0\\t%0, %1;"
+{
+    if (which_alternative == 0)
+      {
+        if (SUBREG_P (operands[1])
+           && GET_MODE (SUBREG_REG (operands[1])) == V2SImode)
+          return "%.\\tcvt%t0.u32\\t%0, %1.x;";
+        else
+          return "%.\\tcvt%t0.u64\\t%0, %1;";
+      }
+    else
+      {
+        if (SUBREG_P (operands[1])
+           && GET_MODE (SUBREG_REG (operands[1])) == V2SImode)
+          return "   %.\\tst%A0.u%T0\\t%0, %1.x;";
+        else
+          return "   %.\\tst%A0.u%T0\\t%0, %1;";
+      }
+}
   [(set_attr "subregs_ok" "true")])

 ;; Integer arithmetic
...
But I would hope there's a cleaner way.
