On 2/17/25 3:12 AM, Richard Sandiford wrote:
"Li, Pan2" <pan2...@intel.com> writes:
Thanks Jeff and Richard S.
I am not sure I followed the discussion correctly, but this patch only tries
to fix the vxrm insn being deleted during late-combine (the same scenario as
frm) by adding it to global_regs.
If global_regs is not the right place given the semantics of vxrm, we may
need some other fix.
AFAIK, the main difference between vxrm and frm looks like the following,
taking the RVV intrinsics as an example:
void vxrm ()
{
  size_t vl = __riscv_vsetvl_e16m1 (N);
  vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
  vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
  vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);

  __riscv_vse16_v_u16m1 (c, vc, vl);

  call_external ();
}

void frm ()
{
  size_t vl = __riscv_vsetvl_e16m1 (N);

  vfloat16m1_t va = __riscv_vle16_v_f16m1(af, vl);
  va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
  __riscv_vse16_v_f16m1(bf, va, vl);

  call_external ();
}
With option "-march=rv64gcv_zvfh -O3"
vxrm:
  csrwi vxrm,2          // Just set rm directly
  ...
  vle16.v v2,0(a4)
  vle16.v v1,0(a3)
  ...
  vaaddu.vv v1,v1,v2
  vse16.v v1,0(a4)
  tail call_external

frm:
  frrm a2               // backup
  fsrmi 2               // set rm
  ...
  vle16.v v1,0(a3)
  addi a5,a5,%lo(bf)
  vfnmadd.vv v1,v1,v1
  vse16.v v1,0(a5)
  fsrm a2               // restore
  tail call_external
However, I would like to wait a while for Jeff or the other RISC-V port
maintainers before taking any action.
main:
.LFB2:
        csrwi vxrm,2
        addi sp,sp,-16
.LCFI0:
        sd ra,8(sp)
.LCFI1:
        call initialize
        lui a3,%hi(a)
        lui a4,%hi(b)
        vsetivli zero,4,e16,m1,ta,ma
        addi a4,a4,%lo(b)
        addi a3,a3,%lo(a)
        vle16.v v2,0(a4)
        vle16.v v1,0(a3)
        lui a4,%hi(c)
        addi a4,a4,%lo(c)
        li a0,0
        vaaddu.vv v1,v1,v2
        vse16.v v1,0(a4)
        ld ra,8(sp)
.LCFI2:
        addi sp,sp,16
.LCFI3:
        jr ra
But if VXRM is call-clobbered, shouldn't the csrwi be after the call
to initialize, rather than before it?
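
To make the ordering concern concrete, here is a rough scalar model in plain C. It is only a sketch, not GCC or RVV code: model_vxrm, model_vaaddu, model_initialize, set_before_call and set_after_call are hypothetical names, and the assumption that the callee leaves vxrm as RNU is made up for illustration (a call-clobbered register could come back holding anything). The rounding increments follow the fixed-point rounding rules for a one-bit right shift.

```c
#include <assert.h>
#include <stdint.h>

/* vxrm encodings: round-to-nearest-up, nearest-even, round-down, round-to-odd. */
enum { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Hypothetical scalar stand-in for the vxrm CSR. */
static unsigned model_vxrm = VXRM_RNU;

/* Scalar model of one vaaddu element: (a + b) >> 1 plus a rounding
   increment selected by the current rounding mode. */
static uint16_t model_vaaddu(uint16_t a, uint16_t b)
{
    uint32_t v = (uint32_t)a + b;
    uint32_t lsb = v & 1u;          /* bit that is shifted out */
    uint32_t next = (v >> 1) & 1u;  /* lowest bit that is kept */
    uint32_t r;

    assert(model_vxrm <= VXRM_ROD);
    switch (model_vxrm) {
    case VXRM_RNU: r = lsb; break;
    case VXRM_RNE: r = lsb & next; break;
    case VXRM_RDN: r = 0; break;
    default:       r = lsb & !next; break;  /* VXRM_ROD */
    }
    return (uint16_t)((v >> 1) + r);
}

/* Assumption: the callee happens to leave vxrm set to RNU. */
static void model_initialize(void)
{
    model_vxrm = VXRM_RNU;
}

/* Mirrors the generated code above: mode is set, then the call clobbers it. */
uint16_t set_before_call(uint16_t a, uint16_t b)
{
    model_vxrm = VXRM_RDN;   /* csrwi vxrm,2 */
    model_initialize();      /* call initialize */
    return model_vaaddu(a, b);
}

/* Ordering expected if vxrm is call-clobbered: set the mode after the call. */
uint16_t set_after_call(uint16_t a, uint16_t b)
{
    model_initialize();      /* call initialize */
    model_vxrm = VXRM_RDN;   /* csrwi vxrm,2 */
    return model_vaaddu(a, b);
}
```

Under this model, set_before_call (3, 4) rounds 3.5 up to 4 because the callee's mode is the one in effect, while set_after_call (3, 4) gives the requested round-down result of 3.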
Yes, absolutely true.
Let me take a looksie. I know I looked at this scenario not terribly
long ago in x264. The important case has no calls and there was a
secondary case with calls.
jeff