Hi! I'm working towards reviewing some (of Roger's) GCC/nvptx patches, and therefore learning some more GCC/nvptx, and generally RTL etc., and the conventions around it. Please bear with me asking "obvious" questions.
For the PTX bit reverse instruction, GCC/nvptx currently ("forever") defines: (define_insn "bitrev<mode>2" [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")] UNSPEC_BITREV))] "" "%.\\tbrev.b%T0\\t%0, %1;") ..., with: (define_predicate "nvptx_register_operand" (match_code "reg") { return register_operand (op, mode); }) (define_constraint "R" "A pseudo register." (match_code "reg")) That is, only a register input operand is permitted, not an immediate. However, I don't see such a restriction in the manual, <https://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-brev>. If I change that 'define_insn': - (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")] + (unspec:SDIM [(match_operand:SDIM 1 "nvptx_nonmemory_operand" "Ri")] ..., with (existing): (define_predicate "nvptx_nonmemory_operand" (match_code "reg,const_int,const_double") { return (REG_P (op) ? register_operand (op, mode) : immediate_operand (op, mode)); }) ..., then a simple code: return __builtin_nvptx_brev(0xe6a2c480) != 0x01234567; ... for '-O1' subsequently changes: [...] .reg .u32 %r22; -.reg .u32 %r25; .reg .pred %r29; -mov.u32 %r25,-425540480; -brev.b32 %r22,%r25; +brev.b32 %r22,-425540480; setp.ne.u32 %r29,%r22,19088743; [...] (I understand that, in the end, that's probably equivalent, assuming that the later PTX -> SASS compiler does the same optimization, but I find it easier to read: one less '.reg' to keep track of textually/mentally.) Does that make sense to you, too? I'd then extend my attached "[WIP] nvptx: Also allow immediate input operand to 'bitrev<mode>2'" with a test case. (Similarly then, a number of other GCC/nvptx 'define_insn's to be reviewed/revised, later on.) Relatedly, I see that a lot of GCC/nvptx' two input operands instructions ('add<mode>3', etc.) similarly do allow for their second input operand to be an immediate in addition to a register. I suppose that only allowing for the second input operand to be an immediate is sufficient/desirable: reduces load on the matching; we shouldn't ever end up with 'IMM + IMM', for example: should've optimized that before. As I learn from <https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html>, "for commutative [...] operators, a constant is always made the second operand". (Confirmed to ICE if swapping that around for 'add<mode>3'; so, that's all as expected.) Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
>From 4a5138eb61a026ad6bc5470a648ebc596af1b1ed Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tho...@codesourcery.com> Date: Mon, 4 Sep 2023 16:48:53 +0200 Subject: [PATCH] [WIP] nvptx: Also allow immediate input operand to 'bitrev<mode>2' --- gcc/config/nvptx/nvptx.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 1bb93045403..e1c822f2ea8 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -636,7 +636,7 @@ (define_insn "bitrev<mode>2" [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") - (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")] + (unspec:SDIM [(match_operand:SDIM 1 "nvptx_nonmemory_operand" "Ri")] UNSPEC_BITREV))] "" "%.\\tbrev.b%T0\\t%0, %1;") -- 2.34.1