On Fri, Sep 5, 2025 at 9:21 AM Kito Cheng <kito.ch...@gmail.com> wrote:
>
> On Thu, Sep 4, 2025 at 11:50 PM Robin Dapp <rdapp....@gmail.com> wrote:
> >
> > > The layout will be different between VLEN=128 and VLEN=256 (and also
> > > any larger VLEN)
> > >
> > > Give a practical example:
> > > vec1 allocated into v8, and v9, the reg layout will be:
> > >
> > > VLEN = 128
> > > v8 = [0, 1, 2, 3]
> > > v9 = [4, 5, 6, 7]
> > >
> > > VLEN=256
> > > v8 = [0, 1, 2, 3, 4, 5, 6, 7]
> > > v9 = [?, ?, ?, ?, ?, ?, ?, ?]
> > >
> > > Then you could imagine that
> > > vsetivli        zero,8,e32,m2,ta,ma
> > > vadd.vv v8, v8, v10
> > > works on any machine with VLEN >= 128
> >
> > Ok, so whenever we didn't split a vector into LMUL1-sized (128 here) chunks in
> > the first place we cannot go back to LMUL1 any more.
> >
> > Doesn't that also mean that _if_ we split into 128-bit chunks (first case
> > above) running on VLEN=256 would look like
> >
> > v8 = [0, 1, 2, 3, ?, ?, ?, ?]
> > v9 = [4, 5, 6, 7, ?, ?, ?, ?]

I guess this part involves a few terms that might be confusing, so let me
clarify them here: VLEN, MIN_VLEN (TARGET_MIN_VLEN) and ABI_VLEN.

VLEN refers to the hardware VLEN, which is a run-time constant.

MIN_VLEN refers to the compile-time assumption, e.g. zvl128b assumes the
vector register length (VLEN) is at least 128 bits.

And the last one is ABI_VLEN:

```
int32x8_t __attribute__((riscv_vls_cc(128)))
test_256bit_vector(int32x8_t vec1, int32x8_t vec2) {
```
The 128 in the riscv_vls_cc is ABI_VLEN, which means it will assume
the vector register is at least 128 bits,
so the function can be safely executed on any machine with VLEN >= 128.

So why ABI_VLEN? It provides a way to stabilize the ABI rather than
always letting ABI_VLEN = MIN_VLEN.

What's the problem if we don't separate ABI_VLEN from MIN_VLEN/VLEN?
It would make zvl128b and zvl256b incompatible at the ABI
level.

e.g. int32x8_t is passed in m2 for ABI_VLEN=128, but in
m1 for ABI_VLEN=256.
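
To make the arithmetic concrete, here is a minimal sketch of how the number of
argument registers follows from ABI_VLEN. The helper name vls_arg_nregs is
hypothetical (it is not a GCC function), and it only models the simple
"vector bits divided by ABI_VLEN, rounded up" case from the example above; the
real calling convention has additional rules that this ignores.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only (not part of GCC):
   number of vector registers needed to pass a fixed-length vector of
   VEC_BITS bits when the calling convention assumes each vector
   register holds ABI_VLEN bits.  */
unsigned
vls_arg_nregs (unsigned vec_bits, unsigned abi_vlen)
{
  assert (abi_vlen != 0);
  /* Round up: a vector smaller than one register still occupies one.  */
  return (vec_bits + abi_vlen - 1) / abi_vlen;
}
```

For int32x8_t (8 x 32 = 256 bits), vls_arg_nregs (256, 128) gives 2 (i.e. m2)
and vls_arg_nregs (256, 256) gives 1 (i.e. m1), which is exactly the mismatch
described above.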

What we expect on the library side is that most libraries will use
ABI_VLEN=128, so that they can run on all RVA23-capable CPUs.

But why not just fix ABI_VLEN at 128? Because there are also zvl32b and zvl64b.

Lastly, you may wonder why we need the VLS CC rather than just
using scalable vector types like vint32m1_t.
The reason is that those types are less portable across different
targets, and it's also impossible to declare a global variable or a
struct/class member with a scalable vector type.
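
As a rough illustration of the last point, the sketch below uses the generic
GNU C vector_size attribute to define a fixed-length int32x8_t. I'm assuming
that is how the type is spelled here; the names global_vec, struct packet and
add_vec are made up for the example. Because the size is known at compile
time, the type can be used for a global and for a struct member, which a
sizeless type like vint32m1_t cannot.

```c
#include <stdint.h>

/* Fixed-length vector of eight int32_t (256 bits), declared with the
   generic GNU C vector extension (assumed spelling for this example).  */
typedef int32_t int32x8_t __attribute__ ((vector_size (32)));

/* Fine as a global: the size is a compile-time constant.  */
int32x8_t global_vec = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* Fine as a struct member for the same reason.  */
struct packet
{
  int32_t id;
  int32x8_t payload;
};

/* Element-wise addition written in target-independent GNU C.  */
int32x8_t
add_vec (int32x8_t a, int32x8_t b)
{
  return a + b;
}
```

Nothing in this snippet is RISC-V specific, so the same source builds for
other targets too; that is the portability argument in a nutshell.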

> >
> > and
> > vsetivli        zero,8,e32,m2,ta,ma
> > vadd.vv v8, v8, v10
> >
> > wouldn't get the right result (from an LMUL2 perspective)? So the layouts are
> > only compatible if VLEN and LMUL match?

So, back to this: VLEN is a run-time constant, so VLEN=256 hardware will always lay the values out as

v8 = [0, 1, 2, 3, 4, 5, 6, 7]
v9 = [?, ?, ?, ?, ?, ?, ?, ?]

Even if the program is compiled with MIN_VLEN=128.
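
If it helps, the layout rule can be written down as a small model. This is only
a sketch: elt_position and struct elt_pos are hypothetical names, and it
assumes the usual mapping for LMUL > 1 register groups, where element i lives
in register base + i / (VLEN/SEW) at lane i % (VLEN/SEW).

```c
#include <assert.h>

/* Register number and lane of one element inside a register group.  */
struct elt_pos
{
  unsigned reg;
  unsigned lane;
};

/* Hypothetical helper, for illustration only: locate element IDX of a
   register group starting at BASE_REG, for hardware VLEN bits per
   register and SEW bits per element.  */
struct elt_pos
elt_position (unsigned base_reg, unsigned idx, unsigned vlen, unsigned sew)
{
  assert (sew != 0 && vlen >= sew);
  unsigned per_reg = vlen / sew;  /* elements that fit in one register */
  struct elt_pos p = { base_reg + idx / per_reg, idx % per_reg };
  return p;
}
```

With e32 and the group starting at v8, element 4 lands in v9 lane 0 when
VLEN=128, but in v8 lane 4 when VLEN=256. Only the hardware VLEN enters the
computation; MIN_VLEN does not.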

> >
> > I'm probably misunderstanding and/or am confused :)
> >
> > --
> > Regards
> >  Robin
> >
