On Fri, Sep 5, 2025 at 9:21 AM Kito Cheng <kito.ch...@gmail.com> wrote:
>
> On Thu, Sep 4, 2025 at 11:50 PM Robin Dapp <rdapp....@gmail.com> wrote:
> >
> > > The layout will be different between VLEN=128 and VLEN=256 (and also
> > > any larger VLEN)
> > >
> > > Give a practical example:
> > > vec1 allocated into v8, and v9, the reg layout will be:
> > >
> > > VLEN = 128
> > > v8 = [0, 1, 2, 3]
> > > v9 = [4, 5, 6, 7]
> > >
> > > VLEN=256
> > > v8 = [0, 1, 2, 3, 4, 5, 6, 7]
> > > v9 = [?, ?, ?, ?, ?, ?, ?, ?]
> > >
> > > Then you could imaging
> > > vsetivli zero,8,e32,m2,ta,ma
> > > vadd.vv v8, v8, v10
> > > is work on any machine with VLEN >= 128
> >
> > Ok, so whenever we didn't split a vector into LMUL1-sized (128 here) chunks
> > in the first place we cannot go back to LMUL1 any more.
> >
> > Doesn't that also mean that _if_ we split into 128-bit chunks (first case
> > above) running on VLEN=256 would look like
> >
> > v8 = [0, 1, 2, 3, ?, ?, ?, ?]
> > v9 = [4, 5, 6, 7, ?, ?, ?, ?]

I guess this part involves a few terms that might be confusing, so let me
clarify them here: VLEN, MIN_VLEN (TARGET_MIN_VLEN) and ABI_VLEN.

VLEN refers to the hardware VLEN, which is a run-time constant.

MIN_VLEN refers to the compile-time assumption, e.g. zvl128b assumes the
vector register length (VLEN) is at least 128 bits.

And the last one is ABI_VLEN:

```
int32x8_t __attribute__((riscv_vls_cc(128))) test_256bit_vector(int32x8_t vec1, int32x8_t vec2) {
```

The 128 in riscv_vls_cc is ABI_VLEN, which means the function assumes the
vector register is at least 128 bits wide, so it can be safely executed on
any machine with VLEN >= 128.

So why ABI_VLEN? It provides a way to stabilize the ABI rather than always
letting ABI_VLEN = MIN_VLEN.

What's the problem if we don't separate ABI_VLEN from MIN_VLEN/VLEN? It
would make zvl128b and zvl256b incompatible at the ABI level, e.g.
int32x8_t is passed in m2 for ABI_VLEN=128 but in m1 for ABI_VLEN=256.

What we expect on the library side is that libraries will mostly use
ABI_VLEN=128, so that they can run on all RVA23-capable CPUs.

But why not just fix ABI_VLEN at 128? Because zvl32b and zvl64b exist.

Lastly, you may wonder why we need the VLS CC rather than just using a
scalable vector type like vint32m1_t. The reason is that those types are
less portable across different targets; also, it's impossible to declare a
scalable vector type as a global variable or a struct/class member.

> >
> > and
> > vsetivli zero,8,e32,m2,ta,ma
> > vadd.vv v8, v8, v10
> >
> > wouldn't get the right result (from an LMUL2 perspective)? So the layouts
> > are only compatible if VLEN and LMUL match?

So back here: VLEN is a run-time constant, so VLEN=256 will always lay out
the values as

v8 = [0, 1, 2, 3, 4, 5, 6, 7]
v9 = [?, ?, ?, ?, ?, ?, ?, ?]

even if the program is compiled with MIN_VLEN=128.

> >
> > I'm probably misunderstanding and/or am confused :)
> >
> > --
> > Regards
> > Robin
> >
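
To make the difference between ABI_VLEN and the hardware VLEN concrete, here is a minimal C sketch of the arithmetic described above. The helper names (vls_lmul, elem_reg, elem_lane) are hypothetical and not part of GCC or the psABI. The model is simplified: it assumes a vector is passed in a whole number of registers and ignores corner cases such as vectors smaller than one register or register groups larger than LMUL 8.

```c
#include <assert.h>

/* Hypothetical helper: number of registers (LMUL) used to pass a
   fixed-length vector of vec_bits bits for a given ABI_VLEN.
   Simplified model: round up to a whole number of registers. */
unsigned vls_lmul(unsigned vec_bits, unsigned abi_vlen)
{
    assert(abi_vlen > 0);
    return (vec_bits + abi_vlen - 1) / abi_vlen;
}

/* Hypothetical helper: register number that holds element idx of a
   register group starting at base_reg, on hardware whose run-time
   VLEN is hw_vlen bits and whose element width (SEW) is sew_bits. */
unsigned elem_reg(unsigned base_reg, unsigned idx,
                  unsigned sew_bits, unsigned hw_vlen)
{
    assert(sew_bits > 0 && hw_vlen >= sew_bits);
    return base_reg + idx / (hw_vlen / sew_bits);
}

/* Hypothetical helper: position of element idx inside its register. */
unsigned elem_lane(unsigned idx, unsigned sew_bits, unsigned hw_vlen)
{
    assert(sew_bits > 0 && hw_vlen >= sew_bits);
    return idx % (hw_vlen / sew_bits);
}
```

For int32x8_t (256 bits), vls_lmul(256, 128) gives 2 and vls_lmul(256, 256) gives 1, matching the m2 versus m1 case above. With the group starting at v8, element 5 lands in v9 lane 1 on VLEN=128 hardware but in v8 lane 5 on VLEN=256 hardware. In this model the placement depends only on the hardware VLEN, not on MIN_VLEN or ABI_VLEN.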