> > And that code gen will kind of make VLS CC become useless due to the
> > code gen quality.
>
> Ideally we'd just have
>
>   vadd.vv v8,v8,v10
>   vadd.vv v9,v9,v11
>
> (plus vsetvl) in the function if LMUL1 is preferred I guess?
>
> Right now we unnecessarily create
>
>   _6 = BIT_FIELD_REF <vec1_1(D), 128, 0>;
>   _7 = BIT_FIELD_REF <vec2_2(D), 128, 0>;
>
> to access the upper/lower halves.  That happens in vec lowering already I
> suppose.  Those are vec_extracts but should just be nops as we can just
> access the individual registers with LMUL1.  I haven't looked into what
> would need changing, though.
>
> Is this what you intended as well or did I misunderstand?

The layout will be different between VLEN=128 and VLEN=256 (and any larger
VLEN).

To give a practical example: suppose vec1 is allocated into v8 and v9.
The register layout will be:

VLEN = 128
v8 = [0, 1, 2, 3]
v9 = [4, 5, 6, 7]

VLEN=256
v8 = [0, 1, 2, 3, 4, 5, 6, 7]
v9 = [?, ?, ?, ?, ?, ?, ?, ?]
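
(For concreteness, the kind of testcase I have in mind is a 256-bit GNU
vector type; this is only roughly reconstructed from the GIMPLE quoted
above, so details may differ:

typedef int v8si __attribute__ ((vector_size (32)));  /* 8 x int32 = 256 bits */

v8si
add (v8si vec1, v8si vec2)
{
  return vec1 + vec2;
}

so under the VLS calling convention each argument is passed in a vector
register group, here v8/v9 and v10/v11.)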

Then you could imagine that
vsetivli        zero,8,e32,m2,ta,ma
vadd.vv v8, v8, v10
works on any machine with VLEN >= 128.
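
(Quick sanity check using VLMAX = VLEN * LMUL / SEW: at VLEN=128 with
e32/m2, VLMAX = 128 * 2 / 32 = 8; at VLEN=256 it is 16.  So vl=8 is legal
on both, and the whole v8/v9 group is processed as one value no matter
where the element boundary between v8 and v9 falls.)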

But if we try to split that into LMUL=1 operations without a spill and reload:
vsetivli        zero,4,e32,m1,ta,ma
vadd.vv v8, v8, v10
vadd.vv v9, v9, v11

then it only works as expected on machines with VLEN = 128.
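
Concretely, on a VLEN=256 machine elements 4..7 of vec1 live in the upper
half of v8 (see the layout above), and v9 holds whatever happened to be
there, so the second vadd.vv computes garbage for those elements; the same
goes for vec2 and v11.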

And then a spill and reload is needed to fix that:
vsetivli        zero,8,e32,m2,ta,ma
vse32.v v8, (sp)        # spill the whole v8/v9 group: 8 * 4 = 32 bytes
...
vsetivli        zero,4,e32,m1,ta,ma
# After the reload the reg layout will be the same on VLEN=128 or any larger VLEN
vle32.v v8, (sp)
addi    t0, sp, 16      # second half starts 4 elements * 4 bytes = 16 bytes in
vle32.v v9, (t0)
...                     # same spill and reload for vec2 into v10/v11
vadd.vv v8, v8, v10
vadd.vv v9, v9, v11
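
For reference, here is the same split written with the RVV C intrinsics,
just as a sketch of the data movement (the function name and buffers are
mine, not something the compiler emits):

#include <stdint.h>
#include <riscv_vector.h>

/* Lower one vl=8/e32/m2 add into two m1 adds via stack buffers.  */
void
add_as_two_m1 (int32_t *out, vint32m2_t vec1, vint32m2_t vec2)
{
  int32_t buf1[8], buf2[8];

  /* Spill both m2 groups using the hardware's native layout.  */
  __riscv_vse32_v_i32m2 (buf1, vec1, 8);
  __riscv_vse32_v_i32m2 (buf2, vec2, 8);

  /* Reload as two m1 halves; this layout no longer depends on VLEN.  */
  vint32m1_t a0 = __riscv_vle32_v_i32m1 (buf1, 4);
  vint32m1_t a1 = __riscv_vle32_v_i32m1 (buf1 + 4, 4);
  vint32m1_t b0 = __riscv_vle32_v_i32m1 (buf2, 4);
  vint32m1_t b1 = __riscv_vle32_v_i32m1 (buf2 + 4, 4);

  __riscv_vse32_v_i32m1 (out, __riscv_vadd_vv_i32m1 (a0, b0, 4), 4);
  __riscv_vse32_v_i32m1 (out + 4, __riscv_vadd_vv_i32m1 (a1, b1, 4), 4);
}

Either way we pay for the extra stores and loads, which is exactly the code
quality problem mentioned above.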
