On Tue, Apr 11, 2023 at 11:19 AM juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai> wrote: > > No, we can only pass "available" to LCM. > Passing "compatible" to LCM can not work for us. > > LCM can only help for eliminate vsetvls can not help for fuse vsetvls. > > For example: > > bb 0: > vsetvl e8,mf8 > vadd (Demand SEW = 8, LMUL = MF8) > bb 1: > vsetvl e32 mf2 > vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or > e16,mf4 or e32,mf2 or e64,m1]) > > I use LCM to handle the case above, I tell LCM that the vsetvl of vadd is > "available" for the following "vle" instruction. > Then LCM will let us to remove "vsetvl e32mf2" > This is what I said "available" case that I use LCM to handle. > > However, LCM can not handle "compatible" case. Here is the example: > > Loop: > { > bb 0: > vsetvl e8,mf8,TA > vadd (Demand SEW = 8, LMUL = MF8, can either TU or TA) > bb 1: > vsetvl e32 mf2,TU > vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or > e16,mf4 or e32,mf2 or e64,m1], and demand TU) > }
So for this case the vle32 instruction is the one that also works with other VL, which means the vsetvl is too strict. I agree if the above is the input to LCM then it's difficult to make it work. But then maybe initial placement should be only done for strict cases. Anyway - I'm just throwing in wild guesses here. > It's obvious that neither "vsetvl e8,mf8,TA" nor "vsetvl e32 mf2,TU" are > available for both instructions "vadd" and "vle". > That's why we need Phase 3 in VSETVL PASS. > I do the demand fusion generate a new vsetvl instructions "vsetvl e8,mf8,TU" > which is available for both RVV instructions "vadd" and "vle", > and update the first vsetvl "vsetvl e8,mf8,TA" to "vsetvl e8,mf8,TU" > > Then, I tell LCM "vsetvl e8,mf8,TU" is available for both "vadd" and "vle32", > so LCM will hoist "vsetvl e8,mf8,TU" outside the LOOP > and remove all vsetvls inside the loop. > > ________________________________ > juzhe.zh...@rivai.ai > > > From: Richard Biener > Date: 2023-04-11 16:55 > To: juzhe.zhong > CC: Jeff Law; gcc-patches; kito.cheng; palmer > Subject: Re: Re: [PATCH] RISC-V: Fix PR108279 > On Wed, Apr 5, 2023 at 3:53 PM <juzhe.zh...@rivai.ai> wrote: > > > > >> So fusion in this context is really about identifying cases where two > > >> configuration settings are equivalent and you "fuse" them together. > > >> Presumably this is only going to be possible when the vector insns are > > >> just doing data movement rather than actual computations? > > > > >> If my understanding is correct, I can kind of see why you're doing > > >> fusion during phase 3. My sense is there's a better way, but I'm having > > >> a bit of trouble working out the details of what that should be to > > >> myself. In any event, revamping parts of the vsetvl insertion code > > >> isn't the kind of thing we should be doing now. > > > > The vsetvl demand fusion happens is not necessary "equivalent", instead, we > > call it we will do demand fusion when they are "compatible". > > And the fusion can happen between any vector insns including data movement > > and actual computations. > > > > What is "compatible" ?? This definition is according to RVV ISA. > > For example , For a vadd.vv need a vsetvl zero, 4, e32,m1,ta,ma. > > and a vle.v need a vsetvl zero,4,e8,mf4,ta,ma. > > > > According to RVV ISA: > > vadd.vv demand SEW = 32, LMUL = M1, AVL = 4 > > vle.v demand RATIO = SEW/LMUL = 32, AVL = 4. > > So after demand fusion, the demand becomes SEW = 32, LMUL = M1, AVL = 4. > > Such vsetvl instruction is configured as this demand fusion, we call it > > "compatible" > > since we can find a common vsetvl VL/VTYPE status for both vadd.vv and vle.v > > > > However, what case is not "incompatible", same example, if the vadd.vv > > demand SEW = 32. LMUL = MF2, > > the vadd.vv is incompatible with vle.v. since we can't find a common > > VL/VTYPE vsetvl instruction available > > for both of them. > > > > We have local demand fusion which is Phase 1. Local demand fusion is doing > > the fusion within a block > > And also we have global demand fusion which is Phase 3. Global demand > > fusion is doing across blocks. > > > > After Phase 1, each block has a single demand fusion. Phase 3 is doing > > global demand fusion trying to > > find the common VL/VTYPE status available for a bunch of blocks, and fuse > > them into a single vsetvl. > > So that we eliminate redundant vsetvli. > > > > Here is a example: > > > > bb 0: (vle.v demand RATIO = 32) > > / \ > > bb 1 bb 2 > > / \ / \ > > bb 3 bb 4 .... bb 5 > > vadd vmul vdiv > > (demand (demand (demand > > sew = 8, sew = 8, sew = 8, > > lmul = mf4) lmul = mf4, lmul = mf4, > > tail policy = tu) mask policy = mu) > > > > So in this case, we should do the global demand fusion for bb 0, bb3, bb 4, > > bb5. > > since they are compatible according to RVV ISA. > > The final demand info of vsetvl should be vsetvl e8,mf4,tu,mu and put it > > in the bb0. Then we can avoid vsetvl in bb 3, 4, 5. > > Just to throw in a comment here - I think you should present LCM with > something it > can identify as the same for compatible vsetvl and then it should just > work? OTOH > if "compatible" is not transitive that's not possible (but then I > can't quickly make up > an example where it wouldn't be). > > > >> We have more fusion rules according to RVV ISA. Phase 3 (Global demand > > >> fusion) is > > >> really important. > > > > >> That would seem to indicate the function is poorly named. Unless you're > > >> using "empty" here to mean the state is valid or dirty. Either way it > > >> seems like the function name ought to be improved. > > > > >> The comments talk about bb1 being inside a loop. Nowhere do you check > > >> that as far as I can tell. > > > > >> When trying to understand what the patch is going I ran across this > > >> comment: > > > > >> /* The local_dem vector insn_info of the block. */ > > >> vector_insn_info local_dem; > > > > > > >> That comment really doesn't improve anything. "local_dem" is clearly > > >> short-hand for something (local demand?), whatever it is, make it > > >> clearer in the comment. > > > > Sorry for bad comments in the codes. Currently, I am working on the first > > patch > > of auto-vectorization. After I sent the first patch of auto-vectorization > > for you to > > review. I would like to re-check all the comments and code style of VSETVL > > PASS. > > And refine them. > > > > > > > > > > juzhe.zh...@rivai.ai > > > > From: Jeff Law > > Date: 2023-04-05 21:05 > > To: juzhe.zhong; gcc-patches > > CC: kito.cheng; palmer > > Subject: Re: [PATCH] RISC-V: Fix PR108279 > > > > > > On 4/2/23 16:40, juzhe.zh...@rivai.ai wrote: > > > This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand > > > info backward fusion and propogation) which > > > is I introduced into VSETVL PASS to enhance LCM && improve vsetvl > > > instruction performance. > > So fusion in this context is really about identifying cases where two > > configuration settings are equivalent and you "fuse" them together. > > Presumably this is only going to be possible when the vector insns are > > just doing data movement rather than actual computations? > > > > If my understanding is correct, I can kind of see why you're doing > > fusion during phase 3. My sense is there's a better way, but I'm having > > a bit of trouble working out the details of what that should be to > > myself. In any event, revamping parts of the vsetvl insertion code > > isn't the kind of thing we should be doing now. > > > > > > WRT the actual patch. Please put a function comment on the > > all_empty_predecessor_p method. Something like this perhaps? > > > > /* Return TRUE if all the predecessors of CFG_BB have vsetvl > > state that is valid or dirty, FALSE otherwise. */ > > > > > > That would seem to indicate the function is poorly named. Unless you're > > using "empty" here to mean the state is valid or dirty. Either way it > > seems like the function name ought to be improved. > > > > The comments talk about bb1 being inside a loop. Nowhere do you check > > that as far as I can tell. > > > > When trying to understand what the patch is going I ran across this comment: > > > > /* The local_dem vector insn_info of the block. */ > > vector_insn_info local_dem; > > > > > > That comment really doesn't improve anything. "local_dem" is clearly > > short-hand for something (local demand?), whatever it is, make it > > clearer in the comment. > > > > Jeff > > >