https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #8 from Tom de Vries <vries at gcc dot gnu.org> ---
I've tried the workaround (posting here only the patch for trunchiqi2, the
pattern that was actually triggered):
...
@@ -424,9 +436,21 @@
   [(set (match_operand:QI 0 "nvptx_nonimmediate_operand" "=R,m")
        (truncate:QI (match_operand:HI 1 "nvptx_register_operand" "R,R")))]
   ""
-  "@
-   %.\\tcvt%t0.u16\\t%0, %1;
-   %.\\tst%A0.u8\\t%0, %1;"
+  {
+    if (which_alternative == 1)
+      return "%.\\tst%A0.u8\\t%0, %1;";
+
+    const char *cvt = "%.\\tcvt%t0.u16\\t%0, %1;";
+    if (1)
+      {
+       /* Workaround https://developer.nvidia.com/nvidia_bug/3527713. */
+       output_asm_insn ("%.\\tcvt.s32.s16\\t%0, %1;", operands);
+       output_asm_insn ("%.\\tand.b32\\t%0, %0,0x0000ffff;", operands);
+       return "";
+      }
+
+    return cvt;
+  }
   [(set_attr "subregs_ok" "true")])
 
 (define_insn "truncsi<mode>2"
...
but it didn't work for the test-case from comment 0.

Something that does seem to work for both cases, and the unreduced
builtin-arith-overflow-15.c:
...
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 6c399dea1908..c33903688a5d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -507,7 +507,13 @@
        (minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
                     (match_operand:HSDIM 2 "nvptx_register_operand" "R")))]
   ""
-  "%.\\tsub%t0\\t%0, %1, %2;")
+  {
+    if (GET_MODE (operands[0]) == HImode)
+      /* Workaround https://developer.nvidia.com/nvidia_bug/3527713. */
+      return "%.\\tsub.s16\\t%0, %1, %2;";
+
+    return "%.\\tsub%t0\\t%0, %1, %2;";
+  })
 
 (define_insn "mul<mode>3"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
...
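
For readers unfamiliar with the two-instruction sequence in the trunchiqi2
workaround (cvt.s32.s16 followed by and.b32 with 0x0000ffff), here is a rough
host-side C model of the arithmetic it is meant to perform. This is only a
sketch: the names model_cvt_and and check_model are hypothetical and appear
nowhere in GCC or PTX, and the model assumes the destination is treated as a
32-bit register. It says nothing about the driver bug itself or about why the
workaround did not help for the test-case from comment 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GCC or PTX code.
   Step 1 mimics cvt.s32.s16: sign-extend a 16-bit value to 32 bits.
   Step 2 mimics and.b32 with 0x0000ffff: clear the upper 16 bits.  */
uint32_t
model_cvt_and (int16_t src)
{
  int32_t widened = (int32_t) src;          /* cvt.s32.s16 */
  return (uint32_t) widened & 0x0000ffffu;  /* and.b32 ..., 0x0000ffff */
}

/* Sanity checks for the model: the result is the original 16 bits,
   zero-extended, so its low 8 bits match a plain HI -> QI truncation.  */
void
check_model (void)
{
  assert (model_cvt_and (0x1234) == 0x1234u);
  assert (model_cvt_and (-1) == 0xffffu);
  assert (model_cvt_and (INT16_MIN) == 0x8000u);
  assert ((uint8_t) model_cvt_and (-1) == 0xffu);
}
```

In this model the sign extension followed by the mask leaves the low 16 bits
unchanged, so the sequence should be value-preserving for the truncation.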