On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote:

> For vint8m1_t: 
>    VECTOR_MODES_WITH_PREFIX (VNx, INT, 16, 0)
> ADJUST_NUNITS (VNx16QI, riscv_vector_chunks * 8);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);
> For vint8mf2_t: 
>    VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 0)
> ADJUST_NUNITS (VNx8QI, riscv_vector_chunks * 4);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);

^^^

ADJUST_BYTES (VNx8QI, riscv_vector_chunks * 8);

probably.  As said, I'm not sure this is a good idea just for the sake
of spill slots.  Maybe there's a mode_for_spill one could use and
spill a paradoxical subreg of that mode ... (don't search for
mode_for_spill, I just made that up).

Maybe you can somehow forbit spilling of vint8mf2_t and instead
define spill_class for those so they spill first to vint8m1_t
and then those get spilled to the stack?

> riscv_vector_chunks is a compile-time unknown poly value, I use ADJUST_BYTES 
> to specify these 2 machine_modes 
> with same bytesize but different element number.
> This way can tell GCC that vint8mf2_t has half element nunber of vint8m1_t 
> but occupy the same size during the register spilling.
> 
> May be we can add a new function that call ADJUST_PRECISION that I can 
> adjust these two precision?
> I am not sure. I am look forward another solution to deal with this issuse. 
> Thank you so much.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2022-07-27 16:12
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote:
>  
> > Let's take look at these 2 cases: https://godbolt.org/z/zP16frPnb. In 
> > RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1, 
> > zero + vle8.v). You can see it in foo function.  In this case we don't 
> > need to confuse compiler the size of vint8mf2. However, The second case 
> > is I write assembly to occupy the vector register to generate register 
> > spilling cases. You can see LLVM implementation:  First use vsetvli + 
> > vle8.v load (vint8mf2_t) data from the base pointer, then spill the 
> > vector to memory using vs1r (this is the whole register store which 
> > store the whole vector to the memory and then use vl1r load the whole 
> > register and finally return it.  In LLVM implementation, it insert 
> > vsetvli instructions for RVV instruction using LLVM PASS before RA 
> > (register allocation).  So after RA, compiler generate the spilling 
> > loads/stores. We can either choose to use vsetvli + vle/vse (load/store 
> > fractional vector) or vl1r/vs1r (load/store whole vector which enlarge 
> > the spill size).
> > 
> > Because implementing insert vsetvli PASS after RA (register allocation) 
> > is so complicated, LLVM choose to use vl1r/vs1r. Frankly, I implement 
> > RVV GCC reference to LLVM. So that why I want to define the machine_mode 
> > for `vint8mf2` with smaller element-size but same byte-size from 
> > `vint8m1'.
>  
> The LLVM strathegy might not be the best one for GCC.  I still don't
> quite understand what issue you face with the code you try to patch.
> How does the machmode.def portion of the port look like for
> vint8m1 and vint8mf2?  What are the corresponding modes?
>  
> > Thank you for your reply.  
> > 
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-07-27 15:35
> > To: juzhe.zh...@rivai.ai
> > CC: gcc-patches
> > Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> > On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote:
> >  
> > > Thank you so much for the fast reply. Ok, it is true that I didn't think 
> > > about it carefully. Can you help me with the following the issue?
> > > 
> > > For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' 
> > > (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).
> >  
> > Can you explain in terms of GCCs generic vectors what vint8m1_t and
> > vint8mf2_t are?
> >  
> > > Because in the ISA, we don't have whole register load/store for 
> > > fractional vector. I reference the LLVM implementation and I adjust 
> > > BITSIZE of 
> > > fractional vector same as full vector (It will confuse GCC the bytesize 
> > > of fractional vector and consider the spill size of a fractional vector 
> > > is same as LMUL = 1) 
> > > so that I can use whole register load/store directly during the register 
> > > spilling. (Even though it will enlarge the spill size). According to the 
> > > machine_mode definition,
> > > The machine_mode PRECISION is calculate by component size which is 
> > > different from BITSIZE
> > > 
> > > Now, here is the question. For array type: vint8mf2x4_t,  if I want to 
> > > access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. 
> > > Because bitops is calculated by
> > > bitsize and compare to precision in the codes that the patch mentioned. 
> > > It will make a out-of-bounds access to small array.
> > > 
> > >  Can you help me with this? This is important for the following RVV 
> > > upstream support. Thanks.
> >  
> > So you have that vint8mf2_t type and registers + instructions to operate 
> > on them but no way to load/store?  How do you implement
> >  
> > vint8mf2_t foo (vint8mf2_t *p)
> > {
> >   return *p;
> > }
> >  
> > ?
> >  
> > > 
> > > 
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2022-07-27 14:46
> > > To: zhongjuzhe
> > > CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; 
> > > davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> > > Subject: Re: [PATCH 1/1] Fix bit-position comparison
> > > On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote:
> > >  
> > > > From: zhongjuzhe <juzhe.zh...@rivai.ai>
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > >         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to 
> > > > GET_MODE_BITSIZE
> > > > 
> > > > ---
> > > >  gcc/expr.cc | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > > index 80bb1b8a4c5..ac2b3c07df6 100644
> > > > --- a/gcc/expr.cc
> > > > +++ b/gcc/expr.cc
> > > > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool 
> > > > nontemporal)
> > > >  code contains an out-of-bounds access to a small array.  */
> > > >        if (!MEM_P (to_rtx)
> > > >    && GET_MODE (to_rtx) != BLKmode
> > > > -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > > > +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> > >  
> > > I think this has the chance to go wrong with regard to endianess.
> > > Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> > > is relative to the start of to_rtx as if it were in memory and bitsize
> > > determines the contiguous region affected.  But since we are actually
> > > storing into a non-memory endianess comes into play.
> > >  
> > > So no, I don't think the patch is correct, it would be way more
> > > complicated to get the desired improvement.
> > >  
> > > Richard.
> > >  
> > > >  {
> > > >    expand_normal (from);
> > > >    result = NULL;
> > > > 
> > >  
> > > 
> >  
> > 
>  
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Reply via email to