On 2/17/25 3:12 AM, Richard Sandiford wrote:
"Li, Pan2" <pan2...@intel.com> writes:
Thanks Jeff and Richard S.

Not sure if I followed up the discussion correct, but this patch only try to 
fix the vxrm insn
deleted during late-combine (same scenario as frm) by adding it to global_regs.

If global_regs is not the right place according to the sematic of vxrm, we may 
need other fix up to a point.
AFAIK, the most difference between vxrm and frm may look like below, take rvv 
intrinsic as example:

   13   │ void vxrm ()
   14   │ {
   15   │   size_t vl = __riscv_vsetvl_e16m1 (N);
   16   │   vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
   17   │   vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
   18   │   vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, 
vl);
   19   │
   20   │   __riscv_vse16_v_u16m1 (c, vc, vl);
   21   │
   22   │   call_external ();
   23   │ }
   24   │
   25   │ void frm ()
   26   │ {
   27   │   size_t vl = __riscv_vsetvl_e16m1 (N);
   28   │
   29   │   vfloat16m1_t va = __riscv_vle16_v_f16m1(af, vl);
   30   │   va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
   31   │   __riscv_vse16_v_f16m1(bf, va, vl);
   32   │
   33   │   call_external ();
   34   │ }

With option "-march=rv64gcv_zvfh -O3"

   10   │ vxrm:
   11   │     csrwi   vxrm,2  // Just set rm directly
...
   17   │     vle16.v v2,0(a4)
   18   │     vle16.v v1,0(a3)
...
   21   │     vaaddu.vv   v1,v1,v2
   22   │     vse16.v v1,0(a4)
   23   │     tail    call_external
   28   │ frm:
   29   │     frrm    a2        // backup
   30   │     fsrmi   2          // set rm
...
   35   │     vle16.v v1,0(a3)
   36   │     addi    a5,a5,%lo(bf)
   37   │     vfnmadd.vv  v1,v1,v1
   38   │     vse16.v v1,0(a5)
   39   │     fsrm    a2       // restore
   40   │     tail    call_external

However, I would like to wait Jeff, or other RISC-V ports for a while before 
any potential action to take.



main:
.LFB2:
         csrwi   vxrm,2
         addi    sp,sp,-16
.LCFI0:
         sd      ra,8(sp)
.LCFI1:
         call    initialize
         lui     a3,%hi(a)
         lui     a4,%hi(b)
         vsetivli        zero,4,e16,m1,ta,ma
         addi    a4,a4,%lo(b)
         addi    a3,a3,%lo(a)
         vle16.v v2,0(a4)
         vle16.v v1,0(a3)
         lui     a4,%hi(c)
         addi    a4,a4,%lo(c)
         li      a0,0
         vaaddu.vv       v1,v1,v2
         vse16.v v1,0(a4)
         ld      ra,8(sp)
.LCFI2:
         addi    sp,sp,16
.LCFI3:
         jr      ra

But if VXRM is call-clobbered, shouldn't the csrwi be after the call
to initialize, rather than before it?
Yes, absolutely true.

Let me take a looksie. I know I looked at this scenario not terribly long ago in x264. The important case has no calls and there was a secondary case with calls.

jeff

Reply via email to