https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #8 from Tom de Vries <vries at gcc dot gnu.org> ---
I've tried the workaround (posting here only the patch for trunchiqi2, the
pattern that was actually triggered):
...
@@ -424,9 +436,21 @@
   [(set (match_operand:QI 0 "nvptx_nonimmediate_operand" "=R,m")
        (truncate:QI (match_operand:HI 1 "nvptx_register_operand" "R,R")))]
   ""
-  "@
-   %.\\tcvt%t0.u16\\t%0, %1;
-   %.\\tst%A0.u8\\t%0, %1;"
+  {
+    if (which_alternative == 1)
+      return "%.\\tst%A0.u8\\t%0, %1;";
+
+    const char *cvt = "%.\\tcvt%t0.u16\\t%0, %1;";
+    if (1)
+      {
+        /* Workaround https://developer.nvidia.com/nvidia_bug/3527713.  */
+        output_asm_insn ("%.\\tcvt.s32.s16\\t%0, %1;", operands);
+        output_asm_insn ("%.\\tand.b32\\t%0, %0,0x0000ffff;", operands);
+        return "";
+      }
+
+    return cvt;
+  }
   [(set_attr "subregs_ok" "true")])

 (define_insn "truncsi<mode>2"
...
but it didn't work for the test-case from comment 0.
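
For reference, a minimal C sketch of what the two-instruction sequence in the
workaround computes. It assumes the usual PTX semantics (cvt.s32.s16
sign-extends, and.b32 masks), and the helper names are hypothetical, not part
of the patch. It only illustrates that the sequence leaves the low 16 bits
unchanged:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:
     cvt.s32.s16 d, a;
     and.b32     d, d, 0x0000ffff;
   The uint16_t -> int16_t cast is implementation-defined before C23;
   two's-complement wraparound is assumed here.  */
static uint32_t emulate_cvt_and(uint16_t a)
{
  int32_t d = (int32_t)(int16_t)a;   /* cvt.s32.s16: sign-extend 16 -> 32 bits */
  return (uint32_t)d & 0x0000ffffu;  /* and.b32: clear the upper 16 bits */
}

/* Exhaustively check that the low 16 bits survive for every input.  */
static void check_all_inputs(void)
{
  for (uint32_t v = 0; v <= 0xffffu; v++)
    assert(emulate_cvt_and((uint16_t)v) == v);
}
```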

Something that does seem to work for both cases, and for the unreduced
builtin-arith-overflow-15.c:
...
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 6c399dea1908..c33903688a5d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -507,7 +507,13 @@
        (minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
                     (match_operand:HSDIM 2 "nvptx_register_operand" "R")))]
   ""
-  "%.\\tsub%t0\\t%0, %1, %2;")
+  {
+    if (GET_MODE (operands[0]) == HImode)
+      /* Workaround https://developer.nvidia.com/nvidia_bug/3527713.  */
+      return "%.\\tsub.s16\\t%0, %1, %2;";
+
+    return "%.\\tsub%t0\\t%0, %1, %2;";
+  })

 (define_insn "mul<mode>3"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
...
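
A small C sketch of why emitting sub.s16 for HImode should not change the
computed bits. It assumes that %t0 expands to .u16 for HImode and that neither
PTX form saturates; the helper names are hypothetical. Under two's-complement
wraparound, signed and unsigned 16-bit subtraction yield the same bit pattern:

```c
#include <stdint.h>

/* Hypothetical model of sub.u16: unsigned 16-bit wraparound subtraction.  */
static uint16_t sub_u16(uint16_t a, uint16_t b)
{
  return (uint16_t)(a - b);
}

/* Hypothetical model of sub.s16, returning the raw result bits.
   The uint16_t -> int16_t casts are implementation-defined before C23;
   two's-complement wraparound is assumed.  The subtraction itself is done
   in int, so it cannot overflow.  */
static uint16_t sub_s16_bits(uint16_t a, uint16_t b)
{
  int16_t sa = (int16_t)a;
  int16_t sb = (int16_t)b;
  return (uint16_t)(sa - sb);
}
```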
