Hi, Richi.

>> As I said in the PR, with the proposed scheme you get a loop-around copy of
>> the IV, since both the pre- and post-decrement values are live at the same
>> time.
>> If the CPU has an underflow bit set by the subtraction and a branch on that
>> test, using that could avoid the need for the copy.

The RISC-V port doesn't have such instructions, so the copy is needed there.
But as I said, the copy is very cheap.

So I wonder whether you would consider taking and reviewing this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620086.html
Or do you have another plan?

Thanks.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-05-31 00:40
To: 钟居哲
CC: richard.sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV

On 30.05.2023 at 14:38, 钟居哲 <juzhe.zh...@rivai.ai> wrote:

>> That's odd, you only need to adjust the IV which is used in the exit test,
>> not all the others.

Sorry for my incorrect information. I checked the codegen of both
single-rgroup and multi-rgroup. Their codegen behaves the same: after this
patch, there will be 1 more neg instruction in the preheader and 1 more mv
instruction inside the loop.

As I said in the PR, with the proposed scheme you get a loop-around copy of
the IV, since both the pre- and post-decrement values are live at the same
time. If the CPU has an underflow bit set by the subtraction and a branch on
that test, using that could avoid the need for the copy.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV

On Tue, 30 May 2023, juzhe.zhong wrote:

> This patch will generate as many "mov" instructions inside the loop as
> there are rgroups. This is unacceptable. For example, if the number of
> rgroups is 3, there will be 3 more instructions in the loop. If this patch
> is necessary, I think I should find a way to fix it.

That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.

> ---- Replied Message ----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Date: 05/30/2023 19:41
> To: juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai>
> Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
>     rguenther <rguent...@suse.de>,
>     linkw <li...@linux.ibm.com>
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
>
> "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
> > Before this patch:
> >
> > foo:
> >         ble     a2,zero,.L5
> >         csrr    a3,vlenb
> >         srli    a4,a3,2
> > .L3:
> >         minu    a5,a2,a4
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vle32.v v2,0(a1)
> >         vle32.v v1,0(a0)
> >         vsetvli t1,zero,e32,m1,ta,ma
> >         vadd.vv v1,v1,v2
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vse32.v v1,0(a0)
> >         add     a1,a1,a3
> >         add     a0,a0,a3
> >         sub     a2,a2,a5
> >         bne     a2,zero,.L3
> > .L5:
> >         ret
> >
> > After this patch:
> >
> > foo:
> >         ble     a2,zero,.L5
> >         csrr    a3,vlenb
> >         srli    a4,a3,2
> >         neg     a7,a4           -->>>additional instruction
> > .L3:
> >         minu    a5,a2,a4
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vle32.v v2,0(a1)
> >         vle32.v v1,0(a0)
> >         vsetvli t1,zero,e32,m1,ta,ma
> >         mv      a6,a2           -->>>additional instruction
> >         vadd.vv v1,v1,v2
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vse32.v v1,0(a0)
> >         add     a1,a1,a3
> >         add     a0,a0,a3
> >         add     a2,a2,a7
> >         bgtu    a6,a4,.L3
> > .L5:
> >         ret
> >
> > There is 1 more instruction in preheader and 1 more instruction in loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL so
> > this issue will gone.
>
> But what about cases where you won't be using SELECT_VL, such as SLP?
>
> Richard
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)