> If we encounter a uarch where the other sequence is better, then I think
> we can do something like query costs or the like and select between the
> approaches -- but no need to do that now.
> So OK for the trunk.

Thanks, patch will be committed soon.

------------------ Original ------------------
From: "Jeff Law" <gcc-patches@gcc.gnu.org>
Date: Sat, Aug 12, 2023 07:02 AM
To: "Lehua Ding" <lehua.d...@rivai.ai>; "gcc-patches" <gcc-patches@gcc.gnu.org>
Cc: "juzhe.zhong" <juzhe.zh...@rivai.ai>; "kito.cheng" <kito.ch...@gmail.com>; "rdapp.gcc" <rdapp....@gmail.com>; "palmer" <pal...@rivosinc.com>
Subject: Re: [PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i

On 8/11/23 03:01, Lehua Ding wrote:
> Hi,
>
> This patch reverts the conversion from vmv.s.x to vmv.v.i and adds a new
> pattern to optimize the special case when the scalar operand is zero.
>
> Currently, a broadcast pattern whose scalar operand is an immediate is
> converted from vmv.s.x to vmv.v.i, and the mask operand is converted
> from 00..01 to 11..11. After discussing the advantages and disadvantages
> of this conversion with Juzhe offline, we chose not to do the transform.
>
> Before:
>
> Advantage: The vsetvli info required by vmv.s.x has better compatibility,
> since vmv.s.x only requires SEW, and that VL be zero or one. That means
> there are more opportunities to combine with other vsetvli infos in the
> vsetvl pass.
>
> Disadvantage: For a non-zero scalar immediate, one more `li rd, imm`
> instruction is needed.
>
> After:
>
> Advantage: No `li rd, imm` instruction is needed, since vmv.v.i supports
> an immediate operand.
>
> Disadvantage: The mirror image of the advantage above: worse
> compatibility leads to more vsetvl instructions being needed.
>
> Consider the C code below and the asm after autovectorization: there is
> an extra insn (vsetivli zero, 1, e32, m1, ta, ma) after vmv.s.x is
> converted to vmv.v.i.
>
> ```
> int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
>     int sum = 0;
>     for (int i = 0; i < n; i++)
>         sum += a[i] * b[i];
>
>     return sum;
> }
> ```
>
> asm (Before):
>
> ```
> foo1:
>         ble     a3,zero,.L7
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.v.i v1,0
> .L6:
>         vsetvli a5,a3,e32,m1,tu,ma
>         slli    a4,a5,2
>         sub     a3,a3,a5
>         vle32.v v2,0(a0)
>         vle32.v v3,0(a1)
>         add     a0,a0,a4
>         add     a1,a1,a4
>         vmacc.vv        v1,v3,v2
>         bne     a3,zero,.L6
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.s.x v2,zero
>         vredsum.vs      v1,v1,v2
>         vmv.x.s a0,v1
>         ret
> .L7:
>         li      a0,0
>         ret
> ```
>
> asm (After):
>
> ```
> foo1:
>         ble     a3,zero,.L4
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.v.i v1,0
> .L3:
>         vsetvli a5,a3,e32,m1,tu,ma
>         slli    a4,a5,2
>         sub     a3,a3,a5
>         vle32.v v2,0(a0)
>         vle32.v v3,0(a1)
>         add     a0,a0,a4
>         add     a1,a1,a4
>         vmacc.vv        v1,v3,v2
>         bne     a3,zero,.L3
>         vsetivli        zero,1,e32,m1,ta,ma
>         vmv.v.i v2,0
>         vsetvli a2,zero,e32,m1,ta,ma
>         vredsum.vs      v1,v1,v2
>         vmv.x.s a0,v1
>         ret
> .L4:
>         li      a0,0
>         ret
> ```
>
> Best,
> Lehua
>
> Co-Authored-By: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>
>
> gcc/ChangeLog:
>
>         * config/riscv/predicates.md (vector_const_0_operand): New.
>         * config/riscv/vector.md (*pred_broadcast<mode>_zero): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/base/scalar_move-5.c: Update.
>         * gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.

If we encounter a uarch where the other sequence is better, then I think
we can do something like query costs or the like and select between the
approaches -- but no need to do that now.

So OK for the trunk.

jeff