On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote:

> For vint8m1_t:
> VECTOR_MODES_WITH_PREFIX (VNx, INT, 16, 0)
> ADJUST_NUNITS (VNx16QI, riscv_vector_chunks * 8);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);
> For vint8mf2_t:
> VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 0)
> ADJUST_NUNITS (VNx8QI, riscv_vector_chunks * 4);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);
^^^ ADJUST_BYTES (VNx8QI, riscv_vector_chunks * 8); probably.

As said, I'm not sure this is a good idea just for the sake of spill
slots. Maybe there's a mode_for_spill one could use and spill a
paradoxical subreg of that mode ... (don't search for mode_for_spill,
I just made that up). Maybe you can somehow forbid spilling of
vint8mf2_t and instead define spill_class for those so they spill
first to vint8m1_t and then those get spilled to the stack?

> riscv_vector_chunks is a compile-time unknown poly value. I use ADJUST_BYTES
> to specify these 2 machine_modes
> with the same bytesize but a different element number.
> This way I can tell GCC that vint8mf2_t has half the element number of
> vint8m1_t but occupies the same size during register spilling.
>
> Maybe we can add a new function that calls ADJUST_PRECISION so that I can
> adjust these two precisions?
> I am not sure. I look forward to another solution to deal with this issue.
> Thank you so much.
>
> juzhe.zh...@rivai.ai
>
> From: Richard Biener
> Date: 2022-07-27 16:12
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote:
>
> > Let's take a look at these 2 cases: https://godbolt.org/z/zP16frPnb. In
> > RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1,
> > zero + vle8.v). You can see it in the foo function. In this case we don't
> > need to confuse the compiler about the size of vint8mf2. However, in the
> > second case I write assembly to occupy the vector registers in order to
> > generate register spilling. You can see the LLVM implementation: first
> > use vsetvli + vle8.v to load (vint8mf2_t) data from the base pointer,
> > then spill the vector to memory using vs1r (this is the whole register
> > store, which stores the whole vector to memory), then use vl1r to load
> > the whole register and finally return it.
> > The LLVM implementation inserts
> > vsetvli instructions for RVV instructions using an LLVM pass before RA
> > (register allocation). So after RA, the compiler generates the spilling
> > loads/stores. We can either choose to use vsetvli + vle/vse (load/store
> > fractional vector) or vl1r/vs1r (load/store whole vector, which enlarges
> > the spill size).
> >
> > Because implementing a vsetvli insertion pass after RA (register
> > allocation) is so complicated, LLVM chooses to use vl1r/vs1r. Frankly, I
> > implemented RVV GCC with reference to LLVM. That is why I want to define
> > the machine_mode for `vint8mf2` with a smaller element-size but the same
> > byte-size as `vint8m1`.
>
> The LLVM strategy might not be the best one for GCC. I still don't
> quite understand what issue you face with the code you try to patch.
> What does the machmode.def portion of the port look like for
> vint8m1 and vint8mf2? What are the corresponding modes?
>
> > Thank you for your reply.
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Richard Biener
> > Date: 2022-07-27 15:35
> > To: juzhe.zh...@rivai.ai
> > CC: gcc-patches
> > Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> > On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote:
> >
> > > Thank you so much for the fast reply. Ok, it is true that I didn't think
> > > about it carefully. Can you help me with the following issue?
> > >
> > > For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t'
> > > (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).
> >
> > Can you explain in terms of GCC's generic vectors what vint8m1_t and
> > vint8mf2_t are?
> >
> > > Because in the ISA, we don't have whole register load/store for
> > > fractional vector.
> > > I referenced the LLVM implementation and adjusted the BITSIZE of the
> > > fractional vector to be the same as the full vector (this confuses GCC
> > > about the bytesize of the fractional vector, so it considers the spill
> > > size of a fractional vector to be the same as for LMUL = 1)
> > > so that I can use whole register load/store directly during register
> > > spilling (even though it will enlarge the spill size). According to the
> > > machine_mode definition,
> > > the machine_mode PRECISION is calculated from the component size, which
> > > is different from BITSIZE.
> > >
> > > Now, here is the question. For the array type vint8mf2x4_t, if I want to
> > > access vint8mf2x4_t[2], the PRECISION and BITSIZE are different.
> > > Because bitpos is calculated from
> > > bitsize and compared to precision in the code that the patch touches,
> > > it will be treated as an out-of-bounds access to a small array.
> > >
> > > Can you help me with this? This is important for the following RVV
> > > upstream support. Thanks.
> >
> > So you have that vint8mf2_t type and registers + instructions to operate
> > on them but no way to load/store? How do you implement
> >
> > vint8mf2_t foo (vint8mf2_t *p)
> > {
> >   return *p;
> > }
> >
> > ?
> > >
> > > juzhe.zh...@rivai.ai
> > >
> > > From: Richard Biener
> > > Date: 2022-07-27 14:46
> > > To: zhongjuzhe
> > > CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason;
> > > davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> > > Subject: Re: [PATCH 1/1] Fix bit-position comparison
> > > On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote:
> > >
> > > > From: zhongjuzhe <juzhe.zh...@rivai.ai>
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * expr.cc (expand_assignment): Change GET_MODE_PRECISION to
> > > > GET_MODE_BITSIZE
> > > >
> > > > ---
> > > >  gcc/expr.cc | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > > index 80bb1b8a4c5..ac2b3c07df6 100644
> > > > --- a/gcc/expr.cc
> > > > +++ b/gcc/expr.cc
> > > > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool
> > > > nontemporal)
> > > >       code contains an out-of-bounds access to a small array.  */
> > > >    if (!MEM_P (to_rtx)
> > > >        && GET_MODE (to_rtx) != BLKmode
> > > > -      && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > > > +      && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> > >
> > > I think this has the chance to go wrong with regard to endianness.
> > > Consider to_rtx with 32bit mode size but 12bit mode precision. bitpos
> > > is relative to the start of to_rtx as if it were in memory and bitsize
> > > determines the contiguous region affected. But since we are actually
> > > storing into a non-memory, endianness comes into play.
> > >
> > > So no, I don't think the patch is correct, it would be way more
> > > complicated to get the desired improvement.
> > >
> > > Richard.
> > >
> > > >      {
> > > >        expand_normal (from);
> > > >        result = NULL;
> > > >
> > >
> >
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
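
For readers following the thread, the precision/bitsize mismatch under
discussion can be sketched with a toy model. This is only an illustration
under stated assumptions: the struct and the toy_* names are hypothetical and
are not GCC's real data structures, riscv_vector_chunks is fixed to 1 (in the
port it is a poly value unknown at compile time), and precision is taken to be
element count times element width, as the thread describes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a machine mode.  These are NOT GCC's real data
   structures; every toy_* name here is hypothetical.  */
struct toy_mode
{
  unsigned nunits;    /* number of elements, as ADJUST_NUNITS would set */
  unsigned unit_bits; /* bits per element */
  unsigned bytes;     /* storage size, as ADJUST_BYTES would set */
};

/* Value bits: derived from the components, as described in the thread.  */
static unsigned
toy_precision (struct toy_mode m)
{
  return m.nunits * m.unit_bits;
}

/* Storage bits: derived from the byte size.  */
static unsigned
toy_bitsize (struct toy_mode m)
{
  /* Storage must be at least as wide as the value bits.  */
  assert (m.bytes * 8 >= toy_precision (m));
  return m.bytes * 8;
}

/* Mirrors the shape of the check in expand_assignment: a store whose
   bit position is at or beyond LIMIT is treated as out of bounds.  */
static bool
toy_store_is_out_of_bounds (unsigned bitpos, unsigned limit)
{
  return bitpos >= limit;
}

/* Assume riscv_vector_chunks == 1 purely for illustration.  */
static const struct toy_mode toy_vint8m1 = { 8, 8, 8 };  /* VNx16QI */
static const struct toy_mode toy_vint8mf2 = { 4, 8, 8 }; /* VNx8QI */
```

With these assumed values, toy_vint8mf2 has 32 value bits but 64 storage
bits, so a bit position of 40 is out of bounds when compared against the
precision and in bounds when compared against the bitsize. The sketch does
not model the endianness concern raised in the review.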