"Li, Pan2" <pan2...@intel.com> writes: > Thanks Jeff and Richard S. > > Not sure if I followed up the discussion correct, but this patch only try to > fix the vxrm insn > deleted during late-combine (same scenario as frm) by adding it to > global_regs. > > If global_regs is not the right place according to the sematic of vxrm, we > may need other fix up to a point. > AFAIK, the most difference between vxrm and frm may look like below, take rvv > intrinsic as example: > > 13 │ void vxrm () > 14 │ { > 15 │ size_t vl = __riscv_vsetvl_e16m1 (N); > 16 │ vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl); > 17 │ vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl); > 18 │ vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, > __RISCV_VXRM_RDN, vl); > 19 │ > 20 │ __riscv_vse16_v_u16m1 (c, vc, vl); > 21 │ > 22 │ call_external (); > 23 │ } > 24 │ > 25 │ void frm () > 26 │ { > 27 │ size_t vl = __riscv_vsetvl_e16m1 (N); > 28 │ > 29 │ vfloat16m1_t va = __riscv_vle16_v_f16m1(af, vl); > 30 │ va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl); > 31 │ __riscv_vse16_v_f16m1(bf, va, vl); > 32 │ > 33 │ call_external (); > 34 │ } > > With option "-march=rv64gcv_zvfh -O3" > > 10 │ vxrm: > 11 │ csrwi vxrm,2 // Just set rm directly > ... > 17 │ vle16.v v2,0(a4) > 18 │ vle16.v v1,0(a3) > ... > 21 │ vaaddu.vv v1,v1,v2 > 22 │ vse16.v v1,0(a4) > 23 │ tail call_external > 28 │ frm: > 29 │ frrm a2 // backup > 30 │ fsrmi 2 // set rm > ... > 35 │ vle16.v v1,0(a3) > 36 │ addi a5,a5,%lo(bf) > 37 │ vfnmadd.vv v1,v1,v1 > 38 │ vse16.v v1,0(a5) > 39 │ fsrm a2 // restore > 40 │ tail call_external > > However, I would like to wait Jeff, or other RISC-V ports for a while before > any potential action to take.
The difference in the patch seems to be:

  @@ -49,6 +49,7 @@
          .type   main, @function
  main:
  .LFB2:
  +       csrwi   vxrm,2
          addi    sp,sp,-16
  .LCFI0:
          sd      ra,8(sp)

giving:

  main:
  .LFB2:
          csrwi   vxrm,2
          addi    sp,sp,-16
  .LCFI0:
          sd      ra,8(sp)
  .LCFI1:
          call    initialize
          lui     a3,%hi(a)
          lui     a4,%hi(b)
          vsetivli        zero,4,e16,m1,ta,ma
          addi    a4,a4,%lo(b)
          addi    a3,a3,%lo(a)
          vle16.v v2,0(a4)
          vle16.v v1,0(a3)
          lui     a4,%hi(c)
          addi    a4,a4,%lo(c)
          li      a0,0
          vaaddu.vv       v1,v1,v2
          vse16.v v1,0(a4)
          ld      ra,8(sp)
  .LCFI2:
          addi    sp,sp,16
  .LCFI3:
          jr      ra

But if VXRM is call-clobbered, shouldn't the csrwi be after the call to
initialize, rather than before it?

The problem seems to be that mode-switching overloads VXRM_MODE_NONE to mean
both "no requirement" and "unknown state".  So we have:

  static int
  singleton_vxrm_need (void)
  {
    /* Only needed for vector code.  */
    if (!TARGET_VECTOR)
      return VXRM_MODE_NONE;

and:

    if (vxrm_unknown_p (insn))
      return VXRM_MODE_NONE;

This means that VXRM is assumed to be transparent in any instruction that
matches vxrm_unknown_p.  The pass then thinks that it can move the
initialisation of VXRM up through the call to initialize, to the head of the
block, even though the call clobbers VXRM and the uses are after the call.

For mode-switching to work properly when the mode is not always known, there
need to be distinct "neutral" and "unknown" states.  E.g. if the current
state is X:

  after (X, neutral) == X

but:

  after (X, unknown) == unknown

So it looks like the global_regs change is masking an incorrect placement of
the VXRM instructions.  If the call had been to some external function that
clobbers VXRM then (AIUI) the code after the patch would still be wrong.
I think there needs to be something like a VXRM_MODE_UNKNOWN.

Thanks,
Richard
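[A minimal sketch of the neutral/unknown distinction described above.  The
names vxrm_state, mode_after and VXRM_STATE_* are invented for illustration
only and do not correspond to GCC's actual mode-switching hooks or the
RISC-V backend's real enum values.]

  /* Standalone sketch, not GCC code: a transfer function describing the
     state of VXRM after an insn, given the state before it (PREV) and the
     insn's own effect (EFFECT).  */
  enum vxrm_state
  {
    VXRM_RNU, VXRM_RNE, VXRM_RDN, VXRM_ROD,  /* concrete rounding modes */
    VXRM_STATE_NEUTRAL,  /* insn neither needs nor changes VXRM */
    VXRM_STATE_UNKNOWN   /* insn leaves VXRM in an unknown state,
                            e.g. a call that clobbers it */
  };

  static enum vxrm_state
  mode_after (enum vxrm_state prev, enum vxrm_state effect)
  {
    if (effect == VXRM_STATE_NEUTRAL)
      return prev;    /* transparent: the previous mode survives */
    return effect;    /* either a specific new mode, or UNKNOWN */
  }

With a single VXRM_MODE_NONE doing double duty for both NEUTRAL and UNKNOWN,
the second case collapses into the first: a call that clobbers VXRM looks
transparent, which is why the pass believes it may hoist the csrwi above the
call to initialize.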