"Li, Pan2" <pan2...@intel.com> writes:
> Thanks Jeff and Richard S.
>
> Not sure if I followed up the discussion correct, but this patch only try to 
> fix the vxrm insn
> deleted during late-combine (same scenario as frm) by adding it to 
> global_regs.
>
> If global_regs is not the right place according to the sematic of vxrm, we 
> may need other fix up to a point.
> AFAIK, the most difference between vxrm and frm may look like below, take rvv 
> intrinsic as example:
>
>   13   │ void vxrm ()
>   14   │ {
>   15   │   size_t vl = __riscv_vsetvl_e16m1 (N);
>   16   │   vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
>   17   │   vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
>   18   │   vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, 
> __RISCV_VXRM_RDN, vl);
>   19   │
>   20   │   __riscv_vse16_v_u16m1 (c, vc, vl);
>   21   │
>   22   │   call_external ();
>   23   │ }
>   24   │
>   25   │ void frm ()
>   26   │ {
>   27   │   size_t vl = __riscv_vsetvl_e16m1 (N);
>   28   │
>   29   │   vfloat16m1_t va = __riscv_vle16_v_f16m1(af, vl);
>   30   │   va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
>   31   │   __riscv_vse16_v_f16m1(bf, va, vl);
>   32   │
>   33   │   call_external ();
>   34   │ }
>
> With option "-march=rv64gcv_zvfh -O3"
>
>   10   │ vxrm:
>   11   │     csrwi   vxrm,2  // Just set rm directly
> ...
>   17   │     vle16.v v2,0(a4)
>   18   │     vle16.v v1,0(a3)
> ...
>   21   │     vaaddu.vv   v1,v1,v2
>   22   │     vse16.v v1,0(a4)
>   23   │     tail    call_external
>   28   │ frm:
>   29   │     frrm    a2        // backup
>   30   │     fsrmi   2          // set rm
> ...
>   35   │     vle16.v v1,0(a3)
>   36   │     addi    a5,a5,%lo(bf)
>   37   │     vfnmadd.vv  v1,v1,v1
>   38   │     vse16.v v1,0(a5)
>   39   │     fsrm    a2       // restore
>   40   │     tail    call_external
>
> However, I would like to wait Jeff, or other RISC-V ports for a while before 
> any potential action to take.

The difference in the patch seems to be:

@@ -49,6 +49,7 @@
        .type   main, @function
 main:
 .LFB2:
+       csrwi   vxrm,2
        addi    sp,sp,-16
 .LCFI0:
        sd      ra,8(sp)

giving:

main:
.LFB2:
        csrwi   vxrm,2
        addi    sp,sp,-16
.LCFI0:
        sd      ra,8(sp)
.LCFI1:
        call    initialize
        lui     a3,%hi(a)
        lui     a4,%hi(b)
        vsetivli        zero,4,e16,m1,ta,ma
        addi    a4,a4,%lo(b)
        addi    a3,a3,%lo(a)
        vle16.v v2,0(a4)
        vle16.v v1,0(a3)
        lui     a4,%hi(c)
        addi    a4,a4,%lo(c)
        li      a0,0
        vaaddu.vv       v1,v1,v2
        vse16.v v1,0(a4)
        ld      ra,8(sp)
.LCFI2:
        addi    sp,sp,16
.LCFI3:
        jr      ra

But if VXRM is call-clobbered, shouldn't the csrwi be after the call
to initialize, rather than before it?

The problem seems to be that mode-switching overloads VXRM_MODE_NONE
to mean both "no requirement" and "unknown state".  So we have:

static int
singleton_vxrm_need (void)
{
  /* Only needed for vector code.  */
  if (!TARGET_VECTOR)
    return VXRM_MODE_NONE;

and:

  if (vxrm_unknown_p (insn))
    return VXRM_MODE_NONE;

This means that VXRM is assumed to be transparent in an instruction
that matches vxrm_unknown_p.  The pass then thinks that it can move
the initialisation of VXRM up through the call to initialize to the
head of the block, even though the call clobbers VXRM and the uses
are after the call.

For mode-switching to work properly when the mode is not always known,
there need to be different "neutral" and "unknown" states.  E.g. if the
current state is X:

   after (X, neutral) == X

but:

   after (X, unknown) == unknown

So it looks like the global_regs change is masking an incorrect placement
of the VXRM instructions.  If the call had been to some external function
that clobbers VXRM then (AIUI) the code after the patch would still be wrong.

I think there needs to be something like an VXRM_MODE_UNKNOWN.

Thanks,
Richard

Reply via email to