Hi Richard,

On 6/2/25 01:27, Richard Sandiford wrote:
> Vineet Gupta <vine...@rivosinc.com> writes:
>> +CC gcc-patches
>>
>> On 5/30/25 14:04, Vineet Gupta wrote:
>>> Hi Jeff, Richard
>>>
>>> As part of RISC-V FRM mode switching improvements, I'm running into a behavior
>>> in late_combine2 where it is eliminating FRM save/restores when it is desired to
>>> keep them.
>>>
>>> I'm pasting snippet of RTL dumps, could you please see if anything is jumping
>>> out from this limited info.
>>> In RISC-V backend, FRM is specified as a global_reg and inline asm is the only
>>> way for users to achieve the global fesetround() like semantics.
>>>
>>> src
>>>
>>> /* -march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize */
>>> #pragma riscv intrinsic "vector"
>>> typedef long unsigned int size_t;
>>>
>>> static void
>>> set_frm (int frm)
>>> {
>>>   __asm__ volatile ( "fsrm %0" : :"r"(frm) : "frm");
>>> }
>>>
>>> vfloat32m1_t __attribute__ ((noinline))
>>> test_float_point_frm_run_1 (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl)
>>> {
>>>   vfloat32m1_t result;
>>>   /* global mode set */
>>>   set_frm (0);
>>>   /* intrinsic for set mode 1 should be local and restored back to global 0
>>>      upon return */
>>>   result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
>>>   return result;
>>> }
>>>
>>>
>>> RTL dump
>>>
>>> mode-sw
>>>
>>> (insn 9 8 18 2 (parallel [
>>>             (asm_operands/v ("fsrm %0") ("") 0 [
>>>                     (reg:SI 139)
>>>                 ]
>>>                  [
>>>                     (asm_input:SI ("r") frm-run-1.c:33)
>>>                 ]
>>>                  [] frm-run-1.c:33)
>>>             (clobber (reg:V4096QI 69 frm))
>>>         ]) "frm-run-1.c":33:3 -1
>>>      (expr_list:REG_DEAD (reg:SI 139)
>>>         (nil)))
>>>
>>> (insn 27 10 28 2 (set (reg:SI 144)
>>>         (reg:SI 69 frm)) "frm-run-1.c":43:1 -1
>>>      (nil))
>>> (insn 28 27 14 2 (set (reg:SI 69 frm)
>>>         (const_int 1 [0x1])) "frm-run-1.c":43:1 -1
>>>      (nil))
> It looks like insn 14 has been snipped.  What was it?  Did it use FRM?

It corresponds to a vector intrinsic, which does use the static FRM value set to 1
in insn 28.

(insn 14 19 15 2 (set (reg/i:RVVM1SF 104 v8)
        (if_then_else:RVVM1SF (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 10 a0 [orig:143 vl ] [143])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                    (reg:SI 69 frm)
                ] UNSPEC_VPREDICATE)
            (plus:RVVM1SF (reg:RVVM1SF 104 v8 [orig:142 op1 ] [142])
                (reg/v:RVVM1SF 105 v9 [orig:134 result ] [134]))
            (unspec:RVVM1SF [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "frm-run-1.c":43:1 14638 {pred_addrvvm1sf}
     (nil))

> If not, then...
>
>>> (insn 29 14 24 2 (set (reg:SI 69 frm)
>>>         (reg:SI 144)) -1
>>>      (nil))
> ...insns 27, 28, and 29 as given above collectively have no effect,
> assuming that reg 144 dies in insn 29.  The sequence can be removed
> without changing the RTL semantics.

insn 28 is intended to have a global side effect: it changes FRM, which is also
modeled as a global_reg. The other two, insn 27 and insn 29, are the
save/restore that preserve the global state.

Without the inline asm in the mix, the same 3-insn sequence is not optimized
down to just insn 28.

> The dump you give here:
>
>>> late-combine2
>>>
>>> trying to combine definition of r15 in:
>>>    27: a5:SI=frm:SI
>>> into:
>>>    29: frm:SI=a5:SI
>>> instruction becomes a no-op:
>>> (set (reg:SI 69 frm)
>>>     (reg:SI 69 frm))
>>> original cost = 4 + 4 (weighted: 8.000000), replacement cost = nop; keeping replacement
>>> rescanning insn with uid = 29.
>>> updating insn 29 in-place
>>> verify found no changes in insn with uid = 29.
>>> deleting insn 27
>>> deleting insn with uid = 27
>>>
>>>
>>> If I comment out the inline asm - it can no longer combine, elimination doesn't
>>> happen with expected outcome.
>>>
>>> trying to combine definition of r15 in:
>>>    25: a5:SI=frm:SI
>>> into:
>>>    27: frm:SI=a5:SI
>>> -- cannot satisfy all definitions and uses in insn 27
> ...is from late-combine2, so after RA has completed, whereas the earlier
> dump is from mode switching, so it's hard to tell what late-combine2 is
> operating on.  Could you give the RTL as late-combine2 sees it?
> (That would normally be the result of pass_postreload_cse.)

Right, it is attached here.

Thx,
-Vineet
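
For readers following the thread, the difference between the intended sequence and the one late-combine2 leaves behind can be sketched in portable C. This is only a toy model under stated assumptions: `fp_state`, `vector_op_reads_frm`, `run_with_save_restore` and `run_after_combine` are hypothetical names invented for illustration, the FRM CSR is modeled as a plain `int`, and nothing here is GCC or RISC-V backend code. The insn numbers in the comments refer to the RTL quoted above.

```c
#include <assert.h>

/* Toy model of the FRM CSR (hypothetical; a plain int, not the real CSR). */
typedef struct {
    int frm;
} fp_state;

/* Stand-in for the vector add in insn 14: it only reads FRM and
   reports which rounding mode it observed. */
static int vector_op_reads_frm(const fp_state *s)
{
    return s->frm;
}

/* Intended sequence: save, set local mode, use it, restore. */
int run_with_save_restore(fp_state *s, int local_rm)
{
    int saved = s->frm;                /* insn 27: a5 = frm   */
    s->frm = local_rm;                 /* insn 28: frm = 1    */
    int used = vector_op_reads_frm(s); /* insn 14: uses frm   */
    s->frm = saved;                    /* insn 29: frm = a5   */
    return used;
}

/* Sequence as it appears in the late-combine2 dump: insn 27 is deleted
   and insn 29 has become "frm = frm", i.e. it no longer restores anything. */
int run_after_combine(fp_state *s, int local_rm)
{
    s->frm = local_rm;                 /* insn 28: frm = 1    */
    int used = vector_op_reads_frm(s); /* insn 14: uses frm   */
    /* insn 29 is now a no-op, so the caller's mode is not restored. */
    return used;
}

/* Self-check: both variants use mode 1 locally, but only the first
   leaves the global mode at 0 afterwards. */
void frm_model_selfcheck(void)
{
    fp_state a = { 0 };
    assert(run_with_save_restore(&a, 1) == 1);
    assert(a.frm == 0);

    fp_state b = { 0 };
    assert(run_after_combine(&b, 1) == 1);
    assert(b.frm == 1);
}
```

In this model the vector operation sees rounding mode 1 in both variants, but only the save/restore variant leaves the global mode at 0 afterwards, which is the behavior the test case expects.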
;; Function test_float_point_frm_run_1 (test_float_point_frm_run_1, funcdef_no=3, decl_uid=129081, cgraph_uid=4, symbol_order=3) starting the processing of deferred insns ending the processing of deferred insns test_float_point_frm_run_1 Dataflow summary: ;; fully invalidated by EH 0 [zero] 1 [ra] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 [t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 [ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 [vxsat] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 [N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 104 [v8] 105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31] ;; hardware regs used 2 [sp] 68 [vxrm] 69 [frm] ;; regular block artificial uses 2 [sp] ;; eh block artificial uses 2 [sp] 64 [arg] ;; entry block defs 1 [ra] 2 [sp] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 68 [vxrm] 69 [frm] ;; exit block uses 1 [ra] 2 [sp] 68 [vxrm] 69 [frm] 104 [v8] ;; regs ever live 0 [zero] 10 [a0] 15 [a5] 66 [vl] 67 [vtype] 69 [frm] 104 [v8] 105 [v9] ;; ref usage r0={2u} r1={1d,1u} r2={1d,2u} r10={1d,2u} r11={1d} r12={1d} r13={1d} r14={1d} r15={3d,2u} r16={1d} r17={1d} r42={1d} r43={1d} r44={1d} r45={1d} r46={1d} r47={1d} r48={1d} r49={1d} r66={2u} r67={2u} r68={1d,1u} r69={4d,3u} r104={1d,3u} r105={1d,1u} ;; total ref usage 48{27d,21u,0e} in 8{8 regular + 0 call} insns. 
(note 1 0 23 NOTE_INSN_DELETED) (note 23 1 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 5 23 10 2 NOTE_INSN_FUNCTION_BEG) (note 10 5 24 2 NOTE_INSN_DELETED) (note 24 10 8 2 NOTE_INSN_DELETED) (insn 8 24 9 2 (set (reg:SI 15 a5 [139]) (const_int 0 [0])) "frm-run-1.c":33:3 278 {*movsi_internal} (expr_list:REG_EQUIV (const_int 0 [0]) (nil))) (insn 9 8 27 2 (parallel [ (asm_operands/v ("fsrm %0") ("") 0 [ (reg:SI 15 a5 [139]) ] [ (asm_input:SI ("r") frm-run-1.c:33) ] [] frm-run-1.c:33) (clobber (reg:V4096QI 69 frm)) ]) "frm-run-1.c":33:3 -1 (nil)) (insn 27 9 28 2 (set (reg:SI 15 a5 [144]) (reg:SI 69 frm)) "frm-run-1.c":43:1 2829 {frrmsi} (nil)) (insn 28 27 19 2 (set (reg:SI 69 frm) (const_int 1 [0x1])) "frm-run-1.c":43:1 2828 {fsrmsi_restore} (nil)) (insn 19 28 14 2 (set (reg/v:RVVM1SF 105 v9 [orig:134 result ] [134]) (if_then_else:RVVM1SF (unspec:RVVMF32BI [ (const_vector:RVVMF32BI repeat [ (const_int 1 [0x1]) ]) (reg:DI 10 a0 [orig:143 vl ] [143]) (const_int 2 [0x2]) repeated x2 (const_int 0 [0]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE) (const_vector:RVVM1SF repeat [ (const_double:SF 0.0 [0x0.0p+0]) ]) (unspec:RVVM1SF [ (reg:DI 0 zero) ] UNSPEC_VUNDEF))) "frm-run-1.c":41:12 4494 {*pred_broadcastrvvm1sf_imm} (expr_list:REG_EQUIV (const_vector:RVVM1SF repeat [ (const_double:SF 0.0 [0x0.0p+0]) ]) (nil))) (insn 14 19 29 2 (set (reg/i:RVVM1SF 104 v8) (if_then_else:RVVM1SF (unspec:RVVMF32BI [ (const_vector:RVVMF32BI repeat [ (const_int 1 [0x1]) ]) (reg:DI 10 a0 [orig:143 vl ] [143]) (const_int 2 [0x2]) repeated x2 (const_int 0 [0]) (const_int 1 [0x1]) (reg:SI 66 vl) (reg:SI 67 vtype) (reg:SI 69 frm) ] UNSPEC_VPREDICATE) (plus:RVVM1SF (reg:RVVM1SF 104 v8 [orig:142 op1 ] [142]) (reg/v:RVVM1SF 105 v9 [orig:134 result ] [134])) (unspec:RVVM1SF [ (reg:DI 0 zero) ] UNSPEC_VUNDEF))) "frm-run-1.c":43:1 14638 {pred_addrvvm1sf} (nil)) (insn 29 14 15 2 (set (reg:SI 69 frm) (reg:SI 15 a5 [144])) 2828 {fsrmsi_restore} (nil)) (insn 15 29 30 2 (use (reg/i:RVVM1SF 104 v8)) 
"frm-run-1.c":43:1 -1 (nil)) (note 30 15 31 NOTE_INSN_DELETED) (note 31 30 0 NOTE_INSN_DELETED) ;; Function main (main, funcdef_no=4, decl_uid=129084, cgraph_uid=5, symbol_order=4) (executed once) rescanning insn with uid = 33. rescanning insn with uid = 9. deleting insn with uid = 33. starting the processing of deferred insns ending the processing of deferred insns main Dataflow summary: ;; fully invalidated by EH 0 [zero] 1 [ra] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 [t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 [ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 [vxsat] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 [N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 104 [v8] 105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31] ;; hardware regs used 2 [sp] 68 [vxrm] 69 [frm] ;; regular block artificial uses 2 [sp] ;; eh block artificial uses 2 [sp] 64 [arg] ;; entry block defs 1 [ra] 2 [sp] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 68 [vxrm] 69 [frm] ;; exit block uses 1 [ra] 2 [sp] 10 [a0] 68 [vxrm] 69 [frm] ;; regs ever live 0 [zero] 1 [ra] 2 [sp] 10 [a0] 14 [a4] 15 [a5] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 104 [v8] 105 [v9] ;; ref usage r0={2d,1u} r1={3d,1u} r2={1d,6u} r3={2d} r4={2d} r5={2d} r6={2d} r7={2d} r10={5d,3u} r11={3d} r12={3d} r13={3d} r14={4d,1u} r15={7d,4u} r16={3d} r17={3d} 
r28={2d} r29={2d} r30={2d} r31={2d} r32={2d} r33={2d} r34={2d} r35={2d} r36={2d} r37={2d} r38={2d} r39={2d} r42={3d} r43={3d} r44={3d} r45={3d} r46={3d} r47={3d} r48={3d} r49={3d} r60={2d} r61={2d} r62={2d} r63={2d} r66={2d,1u} r67={2d,1u} r68={3d,3u} r69={4d,3u} r70={2d} r71={2d} r72={2d} r73={2d} r74={2d} r75={2d} r76={2d} r77={2d} r78={2d} r79={2d} r80={2d} r81={2d} r82={2d} r83={2d} r84={2d} r85={2d} r86={2d} r87={2d} r88={2d} r89={2d} r90={2d} r91={2d} r92={2d} r93={2d} r94={2d} r95={2d} r96={2d} r97={1d} r98={1d} r99={1d} r100={1d} r101={1d} r102={1d} r103={1d} r104={3d,1u} r105={3d,2u} r106={2d} r107={2d} r108={2d} r109={2d} r110={2d} r111={2d} r112={2d} r113={2d} r114={2d} r115={2d} r116={2d} r117={2d} r118={2d} r119={2d} r120={1d} r121={1d} r122={1d} r123={1d} r124={1d} r125={1d} r126={1d} r127={1d} ;; total ref usage 244{217d,27u,0e} in 14{12 regular + 2 call} insns. (note 1 0 35 NOTE_INSN_DELETED) (note 35 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 2 35 31 2 NOTE_INSN_FUNCTION_BEG) (note 31 2 34 2 NOTE_INSN_DELETED) (note 34 31 5 2 NOTE_INSN_DELETED) (insn 5 34 6 2 (set (reg:SI 15 a5 [138]) (const_int 4 [0x4])) "frm-run-1.c":33:3 278 {*movsi_internal} (expr_list:REG_EQUIV (const_int 4 [0x4]) (nil))) (insn 6 5 30 2 (parallel [ (asm_operands/v ("fsrm %0") ("") 0 [ (reg:SI 15 a5 [138]) ] [ (asm_input:SI ("r") frm-run-1.c:33) ] [] frm-run-1.c:33) (clobber (reg:V4096QI 69 frm)) ]) "frm-run-1.c":33:3 -1 (nil)) (insn 30 6 8 2 (set (reg:DI 15 a5 [142]) (unspec:DI [ (const_int 32 [0x20]) ] UNSPEC_VLMAX)) "frm-run-1.c":54:3 2825 {vlmax_avldi} (expr_list:REG_EQUIV (unspec:DI [ (const_int 32 [0x20]) ] UNSPEC_VLMAX) (nil))) (insn 8 30 9 2 (set (reg:RVVM1SF 105 v9) (if_then_else:RVVM1SF (unspec:RVVMF32BI [ (const_vector:RVVMF32BI repeat [ (const_int 1 [0x1]) ]) (reg:DI 15 a5 [142]) (const_int 2 [0x2]) repeated x2 (const_int 1 [0x1]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE) (const_vector:RVVM1SF repeat [ (const_double:SF 0.0 [0x0.0p+0]) ]) (unspec:RVVM1SF [ 
(reg:DI 0 zero) ] UNSPEC_VUNDEF))) "frm-run-1.c":54:3 4494 {*pred_broadcastrvvm1sf_imm} (nil)) (insn 9 8 7 2 (set (reg:RVVM1SF 104 v8) (reg:RVVM1SF 105 v9)) "frm-run-1.c":54:3 2853 {*movrvvm1sf_whole} (nil)) (insn 7 9 10 2 (set (reg:DI 10 a0) (const_int 8 [0x8])) "frm-run-1.c":54:3 277 {*movdi_64bit} (nil)) (call_insn 10 7 12 2 (parallel [ (set (reg:RVVM1SF 104 v8) (call (mem:SI (symbol_ref:DI ("test_float_point_frm_run_1") [flags 0x3] <function_decl 0x7a756671cb00 test_float_point_frm_run_1>) [0 test_float_point_frm_run_1 S4 A32]) (const_int 0 [0]))) (use (unspec:SI [ (const_int 1 [0x1]) ] UNSPEC_CALLEE_CC)) (clobber (reg:SI 1 ra)) ]) "frm-run-1.c":54:3 468 {call_value_internal} (expr_list:REG_CALL_DECL (symbol_ref:DI ("test_float_point_frm_run_1") [flags 0x3] <function_decl 0x7a756671cb00 test_float_point_frm_run_1>) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil))) (expr_list:RVVM1SF (use (reg:RVVM1SF 104 v8)) (expr_list:RVVM1SF (use (reg:RVVM1SF 105 v9)) (expr_list:DI (use (reg:DI 10 a0)) (nil))))) (insn 12 10 13 2 (set (reg:SI 15 a5 [orig:139 frm ] [139]) (asm_operands/v:SI ("frrm %0") ("=r") 0 [] [] [] frm-run-1.c:20)) "frm-run-1.c":20:3 -1 (nil)) (insn 13 12 11 2 (set (reg:DI 14 a4 [140]) (const_int 4 [0x4])) "frm-run-1.c":8:6 277 {*movdi_64bit} (expr_list:REG_EQUIV (const_int 4 [0x4]) (nil))) (insn 11 13 14 2 (set (reg/v:DI 15 a5 [orig:136 frm ] [136]) (sign_extend:DI (reg:SI 15 a5 [orig:139 frm ] [139]))) "frm-run-1.c":20:3 127 {*extendsidi2_internal} (nil)) (jump_insn 14 11 15 2 (set (pc) (if_then_else (eq (reg/v:DI 15 a5 [orig:136 frm ] [136]) (reg:DI 14 a4 [140])) (label_ref:DI 39) (pc))) "frm-run-1.c":8:6 371 {*branchdi} (int_list:REG_BR_PROB 1073741831 (nil)) -> 39) (note 15 14 16 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (call_insn 16 15 17 3 (parallel [ (call (mem:SI (symbol_ref:DI ("abort") [flags 0x41] <function_decl 0x7a756bd64100 __builtin_abort>) [0 __builtin_abort S4 A32]) (const_int 0 [0])) (use (unspec:SI [ (const_int 0 [0]) ] UNSPEC_CALLEE_CC)) 
(clobber (reg:SI 1 ra)) ]) "frm-run-1.c":11:7 467 {call_internal} (expr_list:REG_CALL_DECL (symbol_ref:DI ("abort") [flags 0x41] <function_decl 0x7a756bd64100 __builtin_abort>) (expr_list:REG_NORETURN (const_int 0 [0]) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil)))) (nil)) (barrier 17 16 39) (code_label 39 17 38 4 5 (nil) [1 uses]) (note 38 39 24 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn 24 38 25 4 (set (reg/i:DI 10 a0) (const_int 0 [0])) "frm-run-1.c":58:1 277 {*movdi_64bit} (nil)) (insn 25 24 40 4 (use (reg/i:DI 10 a0)) "frm-run-1.c":58:1 -1 (nil)) (note 40 25 41 NOTE_INSN_DELETED) (note 41 40 0 NOTE_INSN_DELETED)