Hi Richard,

On 6/2/25 01:27, Richard Sandiford wrote:
> Vineet Gupta <vine...@rivosinc.com> writes:
>> +CC gcc-patches
>>
>> On 5/30/25 14:04, Vineet Gupta wrote:
>>> Hi Jeff, Richard
>>>
>>> As part of RISC-V FRM mode switching improvements, I'm running into a 
>>> behavior
>>> in late_combine2 where it is eliminating FRM save/restores when it is 
>>> desired to
>>> keep them.
>>>
>>> I'm pasting snippet of RTL dumps, could you please see if anything is 
>>> jumping
>>> out from this limited info.
>>> In RISC-V backend, FRM is specified as a global_reg and inline asm is  the 
>>> only
>>> way for users to achieve the global fesetround() like semantics.
>>>
>>> src
>>>
>>>     /*  -march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize */
>>>     #pragma riscv intrinsic "vector"
>>>     typedef long unsigned int size_t;
>>>
>>>     static void
>>>     set_frm (int frm)
>>>     {
>>>       __asm__ volatile ( "fsrm %0" : :"r"(frm) : "frm");
>>>     }
>>>
>>>     vfloat32m1_t __attribute__ ((noinline))
>>>     test_float_point_frm_run_1 (vfloat32m1_t op1, vfloat32m1_t op2, size_t 
>>> vl)
>>>     {
>>>       vfloat32m1_t result;
>>>       /* global mode set */
>>>       set_frm (0);
>>>       /* intrinsic for set mode 1 should be local and restored back to 
>>> global 0
>>>     upon return */
>>>       result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
>>>       return result;
>>>     }
>>>
>>>
>>> RTL dump
>>>
>>>    mode-sw
>>>
>>>     (insn 9 8 18 2 (parallel [
>>>                 (asm_operands/v ("fsrm %0") ("") 0 [
>>>                         (reg:SI 139)
>>>                     ]
>>>                      [
>>>                         (asm_input:SI ("r") frm-run-1.c:33)
>>>                     ]
>>>                      [] frm-run-1.c:33)
>>>                 (clobber (reg:V4096QI 69 frm))
>>>             ]) "frm-run-1.c":33:3 -1
>>>          (expr_list:REG_DEAD (reg:SI 139)
>>>             (nil)))
>>>
>>>     (insn 27 10 28 2 (set (reg:SI 144)
>>>             (reg:SI 69 frm)) "frm-run-1.c":43:1 -1
>>>          (nil))
>>>     (insn 28 27 14 2 (set (reg:SI 69 frm)
>>>             (const_int 1 [0x1])) "frm-run-1.c":43:1 -1
>>>          (nil))
> It looks like insn 14 has been snipped.  What was it?  Did it use FRM?

Its corresponds to a Vector intrinsic which does use the static FRM set to 1 in
insn 28.

(insn 14 19 15 2 (set (reg/i:RVVM1SF 104 v8)
        (if_then_else:RVVM1SF (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 10 a0 [orig:143 vl ] [143])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                    (reg:SI 69 frm)
                ] UNSPEC_VPREDICATE)
            (plus:RVVM1SF (reg:RVVM1SF 104 v8 [orig:142 op1 ] [142])
                (reg/v:RVVM1SF 105 v9 [orig:134 result ] [134]))
            (unspec:RVVM1SF [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "frm-run-1.c":43:1 14638 {pred_addrvvm1sf}
     (nil))



> If not, then...
>
>>>     (insn 29 14 24 2 (set (reg:SI 69 frm)
>>>             (reg:SI 144)) -1
>>>          (nil))
> ...insns 27, 28, and 29 as given above collectively have no effect,
> assuming that reg 144 dies in insn 29.  The sequence can be removed
> without changing the RTL semantics.

insn 28 intends to have global side-effect, it changes FRM which is also modeled
as a global_reg []
The other two insn 27 and insn 29 are save/restore to preserve the global state.
W/o the inline asm in the mix, the same 3 seq insn seq is not optimized to just
insn 28.

> The dump you give here:
>
>>>    late-combine2
>>>
>>>     trying to combine definition of r15 in:
>>>        27: a5:SI=frm:SI
>>>     into:
>>>        29: frm:SI=a5:SI
>>>     instruction becomes a no-op:
>>>     (set (reg:SI 69 frm)
>>>         (reg:SI 69 frm))
>>>     original cost = 4 + 4 (weighted: 8.000000), replacement cost = nop; 
>>> keeping
>>>     replacement
>>>     rescanning insn with uid = 29.
>>>     updating insn 29 in-place
>>>     verify found no changes in insn with uid = 29.
>>>     deleting insn 27
>>>     deleting insn with uid = 27
>>>
>>>
>>> If I comment out the inline asm - it can no longer combine, elimination 
>>> doesn't
>>> happen with expected outcome.
>>>
>>>     trying to combine definition of r15 in:
>>>        25: a5:SI=frm:SI
>>>     into:
>>>        27: frm:SI=a5:SI
>>>     -- cannot satisfy all definitions and uses in insn 27
> ...is from late-combine2, so after RA has completed, whereas the earlier
> dump is from mode switching, so it's hard to tell what late-combine2 is
> operating on.  Could you give the RTL as late-combine2 sees it?
> (That would normally be the result of pass_postreload_cse.)

Right, it is attached here.

Thx,
-Vineet
;; Function test_float_point_frm_run_1 (test_float_point_frm_run_1, 
funcdef_no=3, decl_uid=129081, cgraph_uid=4, symbol_order=3)

starting the processing of deferred insns
ending the processing of deferred insns


test_float_point_frm_run_1

Dataflow summary:
;;  fully invalidated by EH      0 [zero] 1 [ra] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 
[t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 
[t4] 30 [t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 
[ft6] 39 [ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 
49 [fa7] 60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 
[frm] 70 [vxsat] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 
78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 
[N/A] 87 [N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 
95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 
104 [v8] 105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 
112 [v16] 113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 
120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31]
;;  hardware regs used   2 [sp] 68 [vxrm] 69 [frm]
;;  regular block artificial uses        2 [sp]
;;  eh block artificial uses     2 [sp] 64 [arg]
;;  entry block defs     1 [ra] 2 [sp] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 
15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 
48 [fa6] 49 [fa7] 68 [vxrm] 69 [frm]
;;  exit block uses      1 [ra] 2 [sp] 68 [vxrm] 69 [frm] 104 [v8]
;;  regs ever live       0 [zero] 10 [a0] 15 [a5] 66 [vl] 67 [vtype] 69 [frm] 
104 [v8] 105 [v9]
;;  ref usage   r0={2u} r1={1d,1u} r2={1d,2u} r10={1d,2u} r11={1d} r12={1d} 
r13={1d} r14={1d} r15={3d,2u} r16={1d} r17={1d} r42={1d} r43={1d} r44={1d} 
r45={1d} r46={1d} r47={1d} r48={1d} r49={1d} r66={2u} r67={2u} r68={1d,1u} 
r69={4d,3u} r104={1d,3u} r105={1d,1u} 
;;    total ref usage 48{27d,21u,0e} in 8{8 regular + 0 call} insns.
(note 1 0 23 NOTE_INSN_DELETED)
(note 23 1 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 5 23 10 2 NOTE_INSN_FUNCTION_BEG)
(note 10 5 24 2 NOTE_INSN_DELETED)
(note 24 10 8 2 NOTE_INSN_DELETED)
(insn 8 24 9 2 (set (reg:SI 15 a5 [139])
        (const_int 0 [0])) "frm-run-1.c":33:3 278 {*movsi_internal}
     (expr_list:REG_EQUIV (const_int 0 [0])
        (nil)))
(insn 9 8 27 2 (parallel [
            (asm_operands/v ("fsrm %0") ("") 0 [
                    (reg:SI 15 a5 [139])
                ]
                 [
                    (asm_input:SI ("r") frm-run-1.c:33)
                ]
                 [] frm-run-1.c:33)
            (clobber (reg:V4096QI 69 frm))
        ]) "frm-run-1.c":33:3 -1
     (nil))
(insn 27 9 28 2 (set (reg:SI 15 a5 [144])
        (reg:SI 69 frm)) "frm-run-1.c":43:1 2829 {frrmsi}
     (nil))
(insn 28 27 19 2 (set (reg:SI 69 frm)
        (const_int 1 [0x1])) "frm-run-1.c":43:1 2828 {fsrmsi_restore}
     (nil))
(insn 19 28 14 2 (set (reg/v:RVVM1SF 105 v9 [orig:134 result ] [134])
        (if_then_else:RVVM1SF (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 10 a0 [orig:143 vl ] [143])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM1SF repeat [
                    (const_double:SF 0.0 [0x0.0p+0])
                ])
            (unspec:RVVM1SF [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "frm-run-1.c":41:12 4494 
{*pred_broadcastrvvm1sf_imm}
     (expr_list:REG_EQUIV (const_vector:RVVM1SF repeat [
                (const_double:SF 0.0 [0x0.0p+0])
            ])
        (nil)))
(insn 14 19 29 2 (set (reg/i:RVVM1SF 104 v8)
        (if_then_else:RVVM1SF (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 10 a0 [orig:143 vl ] [143])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                    (reg:SI 69 frm)
                ] UNSPEC_VPREDICATE)
            (plus:RVVM1SF (reg:RVVM1SF 104 v8 [orig:142 op1 ] [142])
                (reg/v:RVVM1SF 105 v9 [orig:134 result ] [134]))
            (unspec:RVVM1SF [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "frm-run-1.c":43:1 14638 {pred_addrvvm1sf}
     (nil))
(insn 29 14 15 2 (set (reg:SI 69 frm)
        (reg:SI 15 a5 [144])) 2828 {fsrmsi_restore}
     (nil))
(insn 15 29 30 2 (use (reg/i:RVVM1SF 104 v8)) "frm-run-1.c":43:1 -1
     (nil))
(note 30 15 31 NOTE_INSN_DELETED)
(note 31 30 0 NOTE_INSN_DELETED)

;; Function main (main, funcdef_no=4, decl_uid=129084, cgraph_uid=5, 
symbol_order=4) (executed once)

rescanning insn with uid = 33.
rescanning insn with uid = 9.
deleting insn with uid = 33.
starting the processing of deferred insns
ending the processing of deferred insns


main

Dataflow summary:
;;  fully invalidated by EH      0 [zero] 1 [ra] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 
[t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 
[t4] 30 [t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 
[ft6] 39 [ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 
49 [fa7] 60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 
[frm] 70 [vxsat] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 
78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 
[N/A] 87 [N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 
95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 
104 [v8] 105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 
112 [v16] 113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 
120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31]
;;  hardware regs used   2 [sp] 68 [vxrm] 69 [frm]
;;  regular block artificial uses        2 [sp]
;;  eh block artificial uses     2 [sp] 64 [arg]
;;  entry block defs     1 [ra] 2 [sp] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 
15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 
48 [fa6] 49 [fa7] 68 [vxrm] 69 [frm]
;;  exit block uses      1 [ra] 2 [sp] 10 [a0] 68 [vxrm] 69 [frm]
;;  regs ever live       0 [zero] 1 [ra] 2 [sp] 10 [a0] 14 [a4] 15 [a5] 66 [vl] 
67 [vtype] 68 [vxrm] 69 [frm] 104 [v8] 105 [v9]
;;  ref usage   r0={2d,1u} r1={3d,1u} r2={1d,6u} r3={2d} r4={2d} r5={2d} 
r6={2d} r7={2d} r10={5d,3u} r11={3d} r12={3d} r13={3d} r14={4d,1u} r15={7d,4u} 
r16={3d} r17={3d} r28={2d} r29={2d} r30={2d} r31={2d} r32={2d} r33={2d} 
r34={2d} r35={2d} r36={2d} r37={2d} r38={2d} r39={2d} r42={3d} r43={3d} 
r44={3d} r45={3d} r46={3d} r47={3d} r48={3d} r49={3d} r60={2d} r61={2d} 
r62={2d} r63={2d} r66={2d,1u} r67={2d,1u} r68={3d,3u} r69={4d,3u} r70={2d} 
r71={2d} r72={2d} r73={2d} r74={2d} r75={2d} r76={2d} r77={2d} r78={2d} 
r79={2d} r80={2d} r81={2d} r82={2d} r83={2d} r84={2d} r85={2d} r86={2d} 
r87={2d} r88={2d} r89={2d} r90={2d} r91={2d} r92={2d} r93={2d} r94={2d} 
r95={2d} r96={2d} r97={1d} r98={1d} r99={1d} r100={1d} r101={1d} r102={1d} 
r103={1d} r104={3d,1u} r105={3d,2u} r106={2d} r107={2d} r108={2d} r109={2d} 
r110={2d} r111={2d} r112={2d} r113={2d} r114={2d} r115={2d} r116={2d} r117={2d} 
r118={2d} r119={2d} r120={1d} r121={1d} r122={1d} r123={1d} r124={1d} r125={1d} 
r126={1d} r127={1d} 
;;    total ref usage 244{217d,27u,0e} in 14{12 regular + 2 call} insns.
(note 1 0 35 NOTE_INSN_DELETED)
(note 35 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 35 31 2 NOTE_INSN_FUNCTION_BEG)
(note 31 2 34 2 NOTE_INSN_DELETED)
(note 34 31 5 2 NOTE_INSN_DELETED)
(insn 5 34 6 2 (set (reg:SI 15 a5 [138])
        (const_int 4 [0x4])) "frm-run-1.c":33:3 278 {*movsi_internal}
     (expr_list:REG_EQUIV (const_int 4 [0x4])
        (nil)))
(insn 6 5 30 2 (parallel [
            (asm_operands/v ("fsrm %0") ("") 0 [
                    (reg:SI 15 a5 [138])
                ]
                 [
                    (asm_input:SI ("r") frm-run-1.c:33)
                ]
                 [] frm-run-1.c:33)
            (clobber (reg:V4096QI 69 frm))
        ]) "frm-run-1.c":33:3 -1
     (nil))
(insn 30 6 8 2 (set (reg:DI 15 a5 [142])
        (unspec:DI [
                (const_int 32 [0x20])
            ] UNSPEC_VLMAX)) "frm-run-1.c":54:3 2825 {vlmax_avldi}
     (expr_list:REG_EQUIV (unspec:DI [
                (const_int 32 [0x20])
            ] UNSPEC_VLMAX)
        (nil)))
(insn 8 30 9 2 (set (reg:RVVM1SF 105 v9)
        (if_then_else:RVVM1SF (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 15 a5 [142])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM1SF repeat [
                    (const_double:SF 0.0 [0x0.0p+0])
                ])
            (unspec:RVVM1SF [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "frm-run-1.c":54:3 4494 
{*pred_broadcastrvvm1sf_imm}
     (nil))
(insn 9 8 7 2 (set (reg:RVVM1SF 104 v8)
        (reg:RVVM1SF 105 v9)) "frm-run-1.c":54:3 2853 {*movrvvm1sf_whole}
     (nil))
(insn 7 9 10 2 (set (reg:DI 10 a0)
        (const_int 8 [0x8])) "frm-run-1.c":54:3 277 {*movdi_64bit}
     (nil))
(call_insn 10 7 12 2 (parallel [
            (set (reg:RVVM1SF 104 v8)
                (call (mem:SI (symbol_ref:DI ("test_float_point_frm_run_1") 
[flags 0x3]  <function_decl 0x7a756671cb00 test_float_point_frm_run_1>) [0 
test_float_point_frm_run_1 S4 A32])
                    (const_int 0 [0])))
            (use (unspec:SI [
                        (const_int 1 [0x1])
                    ] UNSPEC_CALLEE_CC))
            (clobber (reg:SI 1 ra))
        ]) "frm-run-1.c":54:3 468 {call_value_internal}
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("test_float_point_frm_run_1") 
[flags 0x3]  <function_decl 0x7a756671cb00 test_float_point_frm_run_1>)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (expr_list:RVVM1SF (use (reg:RVVM1SF 104 v8))
        (expr_list:RVVM1SF (use (reg:RVVM1SF 105 v9))
            (expr_list:DI (use (reg:DI 10 a0))
                (nil)))))
(insn 12 10 13 2 (set (reg:SI 15 a5 [orig:139 frm ] [139])
        (asm_operands/v:SI ("frrm %0") ("=r") 0 []
             []
             [] frm-run-1.c:20)) "frm-run-1.c":20:3 -1
     (nil))
(insn 13 12 11 2 (set (reg:DI 14 a4 [140])
        (const_int 4 [0x4])) "frm-run-1.c":8:6 277 {*movdi_64bit}
     (expr_list:REG_EQUIV (const_int 4 [0x4])
        (nil)))
(insn 11 13 14 2 (set (reg/v:DI 15 a5 [orig:136 frm ] [136])
        (sign_extend:DI (reg:SI 15 a5 [orig:139 frm ] [139]))) 
"frm-run-1.c":20:3 127 {*extendsidi2_internal}
     (nil))
(jump_insn 14 11 15 2 (set (pc)
        (if_then_else (eq (reg/v:DI 15 a5 [orig:136 frm ] [136])
                (reg:DI 14 a4 [140]))
            (label_ref:DI 39)
            (pc))) "frm-run-1.c":8:6 371 {*branchdi}
     (int_list:REG_BR_PROB 1073741831 (nil))
 -> 39)
(note 15 14 16 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(call_insn 16 15 17 3 (parallel [
            (call (mem:SI (symbol_ref:DI ("abort") [flags 0x41]  <function_decl 
0x7a756bd64100 __builtin_abort>) [0 __builtin_abort S4 A32])
                (const_int 0 [0]))
            (use (unspec:SI [
                        (const_int 0 [0])
                    ] UNSPEC_CALLEE_CC))
            (clobber (reg:SI 1 ra))
        ]) "frm-run-1.c":11:7 467 {call_internal}
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("abort") [flags 0x41]  
<function_decl 0x7a756bd64100 __builtin_abort>)
        (expr_list:REG_NORETURN (const_int 0 [0])
            (expr_list:REG_EH_REGION (const_int 0 [0])
                (nil))))
    (nil))
(barrier 17 16 39)
(code_label 39 17 38 4 5 (nil) [1 uses])
(note 38 39 24 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 24 38 25 4 (set (reg/i:DI 10 a0)
        (const_int 0 [0])) "frm-run-1.c":58:1 277 {*movdi_64bit}
     (nil))
(insn 25 24 40 4 (use (reg/i:DI 10 a0)) "frm-run-1.c":58:1 -1
     (nil))
(note 40 25 41 NOTE_INSN_DELETED)
(note 41 40 0 NOTE_INSN_DELETED)

Reply via email to