Re: Re: [PATCH] VECT: Change flow of decrement IV

钟居哲 Tue, 30 May 2023 15:51:32 -0700

Hi, Richi.

>> As I said in the PR, with the proposed scheme you get a copy of the IV 
>> carried around the loop, since both the pre- and post-decrement values are 
>> live at the same time.  
>> If the CPU sets an underflow bit from the subtraction and can branch on it, 
>> using that could avoid the need for the copy.


The RISC-V port doesn't have such instructions, so the copy is needed there.
But as I said, such a copy is very cheap.

So I wonder whether you would consider taking and reviewing this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620086.html 

Or do you have another plan?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-31 00:40
To: 钟居哲
CC: richard.sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV


On 30.05.2023 at 14:38, 钟居哲 <juzhe.zh...@rivai.ai> wrote:

 

>> That's odd, you only need to adjust the IV which is used in the exit test,
>> not all the others.
Sorry for my incorrect information. I checked the codegen for both single-rgroup 
and multi-rgroup.
Both behave the same way: after this patch, there is 1 more neg instruction in 
the preheader and 1 more mv instruction inside the loop.

As I said in the PR, with the proposed scheme you get a copy of the IV carried 
around the loop, since both the pre- and post-decrement values are live at the 
same time.  
If the CPU sets an underflow bit from the subtraction and can branch on it, 
using that could avoid the need for the copy.
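
To make the live-range point concrete, here is a rough C model of the two loop
shapes visible in the assembly quoted further down in this thread. It is only a
sketch, not vectorizer code: the names run_post_test, run_pre_test and
struct loop_stats are made up for illustration, and vf stands in for the
per-iteration element count held in a4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping type, for illustration only. */
struct loop_stats {
    size_t iters;   /* number of loop iterations executed */
    size_t elems;   /* total elements covered by the per-iteration lengths */
};

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* "Before" shape: decrement by the actual length, exit test on the
   post-decrement value (sub a2,a2,a5; bne a2,zero,.L3). */
struct loop_stats run_post_test(size_t n, size_t vf)
{
    struct loop_stats s = {0, 0};
    size_t remain = n;
    assert(vf > 0);
    if (n == 0)
        return s;                 /* models the guard before the loop */
    do {
        size_t len = min_size(remain, vf);
        s.elems += len;           /* stands in for the vector work */
        remain -= len;
        s.iters++;
    } while (remain != 0);
    return s;
}

/* "After" shape: decrement by vf, exit test on the pre-decrement value
   (mv a6,a2; add a2,a2,a7; bgtu a6,a4,.L3). The old value must be kept
   in a copy because the new value is already computed when it is tested. */
struct loop_stats run_pre_test(size_t n, size_t vf)
{
    struct loop_stats s = {0, 0};
    size_t remain = n;
    assert(vf > 0);
    if (n == 0)
        return s;
    for (;;) {
        size_t len = min_size(remain, vf);
        size_t prev = remain;     /* the extra copy inside the loop */
        s.elems += len;
        remain -= vf;             /* may wrap on the last iteration; unsigned, so defined */
        s.iters++;
        if (!(prev > vf))
            break;                /* last iteration: prev <= vf */
    }
    return s;
}
```

In this model both functions run the same number of iterations and cover the
same elements; the only difference is that run_pre_test needs prev and remain
live at the same time, which is where the mv comes from.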



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Tue, 30 May 2023, juzhe.zhong wrote:
 
> This patch will generate as many "mov" instructions inside the loop as there
> are rgroups. This is unacceptable. For example, if the number of rgroups is 3,
> there will be 3 more instructions in the loop. If this patch is necessary, I
> think I should find a way to fix that.
 
That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.
 
> ---- Replied Message ----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Date: 05/30/2023 19:41
> To: juzhe.zh...@rivai.ai
> Cc: gcc-patches <gcc-patches@gcc.gnu.org>, rguenther <rguent...@suse.de>,
>     linkw <li...@linux.ibm.com>
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
>
> "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > sub a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4   # additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2   # additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in the preheader and 1 more instruction in the
> > loop. But I think that's OK for RVV, since we will definitely be using
> > SELECT_VL, so this issue will be gone.
> 
> But what about cases where you won't be using SELECT_VL, such as SLP?
> 
> Richard
> 
> 
 
-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
