I am not sure that passing arguments in m2 while limiting operations to
m1 is worth spending time on, because I think it's hard to prevent the
value from being spilled to the stack and reloaded into a register.

And code gen of that quality would more or less make the VLS CC
useless.

Ideally we'd just have
 vadd.vv v8,v8,v10
 vadd.vv v9,v9,v11

(plus vsetvl) in the function if LMUL1 is preferred I guess?

Right now we unnecessarily create

 _6 = BIT_FIELD_REF <vec1_1(D), 128, 0>;
 _7 = BIT_FIELD_REF <vec2_2(D), 128, 0>;

to access the upper/lower halves.  That happens in vec lowering already,
I suppose.  Those are vec_extracts but should be nops, as we can access
the individual registers directly with LMUL1.  I haven't looked into
what would need changing, though.
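
To make the halves point concrete, here is a minimal C sketch using GNU
vector extensions.  It is not the test case from this thread: the type
names (v8si, v4si) and the helpers are made up for illustration, and it
only shows that adding two 256-bit vectors as a whole is the same as
adding their 128-bit halves separately.  The half at offset 0 corresponds
to the BIT_FIELD_REF <..., 128, 0> above; the other half would
presumably sit at offset 128.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical types: a 256-bit vector and its 128-bit half.  */
typedef int32_t v8si __attribute__ ((vector_size (32)));
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Whole-vector add, i.e. what the source-level function does.  */
static v8si
add_whole (v8si a, v8si b)
{
  return a + b;
}

/* Extract the lower (upper == 0) or upper 128-bit half of V.
   This mirrors a 128-bit BIT_FIELD_REF at offset 0 or 128.  */
static v4si
half (v8si v, int upper)
{
  v4si h;
  memcpy (&h, (const char *) &v + (upper ? sizeof h : 0), sizeof h);
  return h;
}

/* The same add done on the two halves independently, analogous to
   two LMUL1 vadd.vv instructions on adjacent registers.  */
static v8si
add_halves (v8si a, v8si b)
{
  v4si lo = half (a, 0) + half (b, 0);
  v4si hi = half (a, 1) + half (b, 1);
  v8si r;
  memcpy (&r, &lo, sizeof lo);
  memcpy ((char *) &r + sizeof lo, &hi, sizeof hi);
  return r;
}

/* Returns 1 if both variants produce identical results.  */
int
halves_match_whole (void)
{
  v8si a = { 1, 2, 3, 4, 5, 6, 7, 8 };
  v8si b = { 10, 20, 30, 40, 50, 60, 70, 80 };
  v8si w = add_whole (a, b);
  v8si h = add_halves (a, b);
  return memcmp (&w, &h, sizeof w) == 0;
}
```

The extract and recombine steps in add_halves are pure data movement,
which is why I'd expect them to disappear entirely when each half
already lives in its own register.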

Is this what you intended as well or did I misunderstand?

--
Regards
Robin
