https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96403
Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |nvptx

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Looking at the first regression, we have without the patch:
...
//(insn 9 5 12 2
//     (set (reg:HI 27 [ arg ])
//          (subreg:HI (reg/v:V2SI 25 [ arg ]) 0))
//     "v2si-cvt.c":11:32 5 {*movhi_insn}
//     (nil))
        cvt.u16.u32 %r27, %r25.x; // 9 [c=12] *movhi_insn/0
...
and with the patch:
...
//(insn 8 5 9 2
//     (set (reg:DI 26 [ arg ])
//          (subreg:DI (reg/v:V2SI 25 [ arg ]) 0))
//     "v2si-cvt.c":11:32 7 {*movdi_insn}
//     (nil))
        mov.b64 %r26, %r25; // 8 [c=12] *movdi_insn/0
//(insn 9 8 13 2
//     (set (reg:HI 27 [ arg ])
//          (truncate:HI (reg:DI 26 [ arg ])))
//     "v2si-cvt.c":11:32 32 {truncdihi2}
//     (expr_list:REG_DEAD (reg:DI 26 [ arg ])
//         (nil)))
        cvt.u16.u64 %r27, %r26;
...

I guess we would like to generate this instead:
...
//(insn 9 8 13 2
//     (set (reg:HI 27 [ arg ])
//          (truncate:HI (subreg:SI (reg/v:V2SI 25 [ arg ]) 0))
//     "v2si-cvt.c":11:32 32 {truncdihi2}
//     (expr_list:REG_DEAD (reg:DI 26 [ arg ])
//         (nil)))
        cvt.u16.u32 %r26, %r25.x;
...

Debugging combine, we hit TARGET_MODES_TIEABLE_P as a barrier, but after
enabling that we have a slightly different insn (the store has merged with the
truncate), where combine also fails:
...
Trying 8 -> 13:
    8: r26:DI=r25:V2SI#0
   13: [%frame:DI]=trunc(r26:DI)
      REG_DEAD r26:DI
Failed to match this instruction:
(set (mem/v/c:HI (reg/f:DI 2 %frame) [2 s+0 S2 A128])
    (truncate:HI (subreg:DI (reg/v:V2SI 25 [ arg ]) 0)))
...

I've tried enabling subregs in truncsi<HI> but that didn't help either.

I managed to get the desired code using this (to match the pattern tried by
combine):
...
@@ -372,11 +386,26 @@
 
 (define_insn "truncdi<mode>2"
   [(set (match_operand:QHSIM 0 "nvptx_nonimmediate_operand" "=R,m")
-       (truncate:QHSIM (match_operand:DI 1 "nvptx_register_operand" "R,R")))]
+       (truncate:QHSIM (match_operand:DI 1 "register_operand" "R,Q")))]
   ""
-  "@
-   %.\\tcvt%t0.u64\\t%0, %1;
-   %.\\tst%A0.u%T0\\t%0, %1;"
+{
+  if (which_alternative == 0)
+    {
+      if (SUBREG_P (operands[1])
+          && GET_MODE (SUBREG_REG (operands[1])) == V2SImode)
+        return "%.\\tcvt%t0.u32\\t%0, %1.x;";
+      else
+        return "%.\\tcvt%t0.u64\\t%0, %1;";
+    }
+  else
+    {
+      if (SUBREG_P (operands[1])
+          && GET_MODE (SUBREG_REG (operands[1])) == V2SImode)
+        return " %.\\tst%A0.u%T0\\t%0, %1.x;";
+      else
+        return " %.\\tst%A0.u%T0\\t%0, %1;";
+    }
+}
   [(set_attr "subregs_ok" "true")])
 
 ;; Integer arithmetic
...

But I would hope there's a cleaner way.
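
For readers who want to follow the selection logic of the patch outside of
GCC, here is a minimal standalone C sketch. It is only a model, not GCC code
and not a proposed fix: the types model_mode and model_operand and the
functions v2si_subreg_p, select_template and model_self_check are hypothetical
names introduced for illustration, and the returned strings are shortened
versions of the templates in the patch (without the leading predicate and
tab). The one thing it shows is that the "subreg of a V2SI register" test,
which the patch repeats in both alternatives, can be stated once.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for GCC machine modes; not the real enum.  */
enum model_mode { MODEL_DI, MODEL_V2SI };

/* Hypothetical, simplified model of the source operand: either a plain
   register, or a subreg wrapping a register of mode inner_mode.  */
struct model_operand
{
  bool is_subreg;
  enum model_mode inner_mode;
};

/* Models the condition that appears twice in the patch:
   SUBREG_P (op) && GET_MODE (SUBREG_REG (op)) == V2SImode.  */
static bool
v2si_subreg_p (const struct model_operand *op)
{
  return op->is_subreg && op->inner_mode == MODEL_V2SI;
}

/* Picks the (shortened) output template.  Alternative 0 models the
   register destination, any other value the memory destination.  */
static const char *
select_template (int which_alternative, const struct model_operand *src)
{
  bool low_half = v2si_subreg_p (src);

  if (which_alternative == 0)
    return low_half ? "cvt%t0.u32\t%0, %1.x;" : "cvt%t0.u64\t%0, %1;";
  return low_half ? "st%A0.u%T0\t%0, %1.x;" : "st%A0.u%T0\t%0, %1;";
}

/* Small self-check covering one register and one memory case.  */
static void
model_self_check (void)
{
  struct model_operand sub = { true, MODEL_V2SI };
  struct model_operand reg = { false, MODEL_DI };

  assert (strcmp (select_template (0, &sub), "cvt%t0.u32\t%0, %1.x;") == 0);
  assert (strcmp (select_template (1, &reg), "st%A0.u%T0\t%0, %1;") == 0);
}
```

Whether the real pattern is better served by a helper like this, by a separate
define_insn, or by a change elsewhere (for example in how combine forms the
subreg) is left open here.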