On Tue, Apr 11, 2023 at 11:19 AM juzhe.zh...@rivai.ai
<juzhe.zh...@rivai.ai> wrote:
>
> No, we can only pass "available" to LCM.
> Passing "compatible" to LCM can not work for us.
>
> LCM can only help for eliminate vsetvls can not help for fuse vsetvls.
>
> For example:
>
> bb 0:
> vsetvl e8,mf8
> vadd (Demand SEW = 8, LMUL = MF8)
> bb 1:
> vsetvl e32 mf2
> vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
> e16,mf4 or e32,mf2 or e64,m1])
>
> I use LCM to handle the case above, I tell LCM that the vsetvl of vadd is 
> "available" for the following "vle" instruction.
> Then LCM will let us to remove "vsetvl e32mf2"
> This is what I said "available" case that I use LCM to handle.
>
> However, LCM can not handle "compatible" case. Here is the example:
>
> Loop:
> {
> bb 0:
> vsetvl e8,mf8,TA
> vadd (Demand SEW = 8, LMUL = MF8, can either TU or TA)
> bb 1:
> vsetvl e32 mf2,TU
> vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
> e16,mf4 or e32,mf2 or e64,m1], and demand TU)
> }

So for this case the vle32 instruction is the one that also works with
other VL, which means the vsetvl is too strict.  I agree
if the above is the input to LCM then it's difficult to make it work.
But then maybe initial placement should be only
done for strict cases.  Anyway - I'm just throwing in wild guesses here.

> It's obvious that neither "vsetvl e8,mf8,TA" nor "vsetvl e32 mf2,TU" are 
> available for both instructions "vadd" and "vle".
> That's why we need Phase 3 in VSETVL PASS.
> I do the demand fusion generate a new vsetvl instructions "vsetvl e8,mf8,TU" 
> which is available for both RVV instructions "vadd" and "vle",
> and update the first vsetvl "vsetvl e8,mf8,TA" to  "vsetvl e8,mf8,TU"
>
> Then, I tell LCM "vsetvl e8,mf8,TU" is available for both "vadd" and "vle32", 
> so LCM will hoist "vsetvl e8,mf8,TU" outside the LOOP
> and remove all vsetvls inside the loop.
>
> ________________________________
> juzhe.zh...@rivai.ai
>
>
> From: Richard Biener
> Date: 2023-04-11 16:55
> To: juzhe.zhong
> CC: Jeff Law; gcc-patches; kito.cheng; palmer
> Subject: Re: Re: [PATCH] RISC-V: Fix PR108279
> On Wed, Apr 5, 2023 at 3:53 PM <juzhe.zh...@rivai.ai> wrote:
> >
> > >> So fusion in this context is really about identifying cases where two
> > >> configuration settings are equivalent and you "fuse" them together.
> > >> Presumably this is only going to be possible when the vector insns are
> > >> just doing data movement rather than actual computations?
> >
> > >> If my understanding is correct, I can kind of see why you're doing
> > >> fusion during phase 3.  My sense is there's a better way, but I'm having
> > >> a bit of trouble working out the details of what that should be to
> > >> myself.  In any event, revamping parts of the vsetvl insertion code
> > >> isn't the kind of thing we should be doing now.
> >
> > The vsetvl demand fusion happens is not necessary "equivalent", instead, we
> > call it we will do demand fusion when they are "compatible".
> > And the fusion can happen between any vector insns including data movement
> > and actual computations.
> >
> > What is "compatible" ??  This definition is according to RVV ISA.
> > For example , For a vadd.vv need a vsetvl zero, 4, e32,m1,ta,ma.
> > and a vle.v need a vsetvl zero,4,e8,mf4,ta,ma.
> >
> > According to RVV ISA:
> > vadd.vv demand SEW = 32, LMUL = M1, AVL = 4
> > vle.v demand RATIO = SEW/LMUL = 32, AVL = 4.
> > So after demand fusion, the demand becomes SEW = 32, LMUL = M1, AVL = 4.
> > Such vsetvl instruction is configured as this demand fusion, we call it 
> > "compatible"
> > since we can find a common vsetvl VL/VTYPE status for both vadd.vv and vle.v
> >
> > However, what case is not "incompatible", same example, if the vadd.vv 
> > demand SEW = 32. LMUL = MF2,
> > the vadd.vv is incompatible with vle.v. since we can't find a common 
> > VL/VTYPE vsetvl instruction available
> > for both of them.
> >
> > We have local demand fusion which is Phase 1. Local demand fusion is doing 
> > the fusion within a block
> > And also we have global demand fusion which is Phase 3. Global demand 
> > fusion is doing across blocks.
> >
> > After Phase 1, each block has a single demand fusion. Phase 3 is doing 
> > global demand fusion trying to
> > find the common VL/VTYPE status available for a bunch of blocks, and fuse 
> > them into a single vsetvl.
> > So that we eliminate redundant vsetvli.
> >
> > Here is a example:
> >
> >                                     bb 0:  (vle.v demand RATIO = 32)
> >                                   /       \
> >                             bb 1      bb 2
> >                           /      \     /       \
> >                  bb 3       bb 4  ....     bb 5
> >                vadd       vmul          vdiv
> >             (demand  (demand      (demand
> >              sew = 8,    sew = 8,      sew = 8,
> >         lmul = mf4)  lmul = mf4,   lmul = mf4,
> >                           tail policy = tu) mask policy = mu)
> >
> > So in this case, we should do the global demand fusion for bb 0, bb3, bb 4, 
> > bb5.
> > since they are compatible according to RVV ISA.
> > The final demand info of vsetvl should be vsetvl e8,mf4,tu,mu and put it
> > in the bb0. Then we can avoid vsetvl in bb 3, 4, 5.
>
> Just to throw in a comment here - I think you should present LCM with
> something it
> can identify as the same for compatible vsetvl and then it should just
> work?  OTOH
> if "compatible" is not transitive that's not possible (but then I
> can't quickly make up
> an example where it wouldn't be).
>
> > >> We have more fusion rules according to RVV ISA. Phase 3 (Global demand 
> > >> fusion) is
> > >> really important.
> >
> > >> That would seem to indicate the function is poorly named.  Unless you're
> > >> using "empty" here to mean the state is valid or dirty.  Either way it
> > >> seems like the function name ought to be improved.
> >
> > >> The comments talk about bb1 being inside a loop.  Nowhere do you check
> > >> that as far as I can tell.
> >
> > >> When trying to understand what the patch is going I ran across this 
> > >> comment:
> >
> > >>   /* The local_dem vector insn_info of the block.  */
> >  >>   vector_insn_info local_dem;
> >
> >
> > >> That comment really doesn't improve anything.  "local_dem" is clearly
> > >> short-hand for something (local demand?), whatever it is, make it
> > >> clearer in the comment.
> >
> > Sorry for bad comments in the codes. Currently, I am working on the first 
> > patch
> > of auto-vectorization. After I sent the first patch of auto-vectorization 
> > for you to
> > review. I would like to re-check all the comments and code style of VSETVL 
> > PASS.
> > And refine them.
> >
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Jeff Law
> > Date: 2023-04-05 21:05
> > To: juzhe.zhong; gcc-patches
> > CC: kito.cheng; palmer
> > Subject: Re: [PATCH] RISC-V: Fix PR108279
> >
> >
> > On 4/2/23 16:40, juzhe.zh...@rivai.ai wrote:
> > > This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand
> > > info backward fusion and propogation) which
> > > is I introduced into VSETVL PASS to enhance LCM && improve vsetvl
> > > instruction performance.
> > So fusion in this context is really about identifying cases where two
> > configuration settings are equivalent and you "fuse" them together.
> > Presumably this is only going to be possible when the vector insns are
> > just doing data movement rather than actual computations?
> >
> > If my understanding is correct, I can kind of see why you're doing
> > fusion during phase 3.  My sense is there's a better way, but I'm having
> > a bit of trouble working out the details of what that should be to
> > myself.  In any event, revamping parts of the vsetvl insertion code
> > isn't the kind of thing we should be doing now.
> >
> >
> > WRT the actual patch.  Please put a function comment on the
> > all_empty_predecessor_p method. Something like this perhaps?
> >
> > /* Return TRUE if all the predecessors of CFG_BB have vsetvl
> >     state that is valid or dirty, FALSE otherwise.  */
> >
> >
> > That would seem to indicate the function is poorly named.  Unless you're
> > using "empty" here to mean the state is valid or dirty.  Either way it
> > seems like the function name ought to be improved.
> >
> > The comments talk about bb1 being inside a loop.  Nowhere do you check
> > that as far as I can tell.
> >
> > When trying to understand what the patch is going I ran across this comment:
> >
> >   /* The local_dem vector insn_info of the block.  */
> >    vector_insn_info local_dem;
> >
> >
> > That comment really doesn't improve anything.  "local_dem" is clearly
> > short-hand for something (local demand?), whatever it is, make it
> > clearer in the comment.
> >
> > Jeff
> >
>

Reply via email to