I am not sure whether passing arguments in m2 while limiting operations to
m1 is worth spending time on, because I think it's hard to prevent the
arguments from being spilled to the stack and reloaded into registers.
Code gen of that quality would more or less make the VLS CC useless.
Ideally we'd just have
vadd.vv v8,v8,v10
vadd.vv v9,v9,v11
(plus vsetvl) in the function if LMUL1 is preferred I guess?
Right now we unnecessarily create
_6 = BIT_FIELD_REF <vec1_1(D), 128, 0>;
_7 = BIT_FIELD_REF <vec2_2(D), 128, 0>;
to access the upper/lower halves. I suppose that already happens in vec
lowering. Those become vec_extracts but should just be nops, since with
LMUL1 we can access the individual registers directly. I haven't looked
into what would need changing, though.
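
For concreteness, a minimal sketch of the kind of function I have in mind.
The names v8si and add_v8si and the 32-bit element type are made up for
illustration, and the actual test case may differ. It assumes a 256-bit
vector with a 128-bit ABI_VLEN, so that each argument occupies two vector
registers (v8/v9 and v10/v11):

```c
#include <stdint.h>

/* Hypothetical 256-bit vector type: eight 32-bit ints, declared with the
   generic GCC/Clang vector extension.  */
typedef int32_t v8si __attribute__ ((vector_size (32)));

/* Hypothetical example function.  Assumption: when built for RISC-V with
   the VLS calling convention enabled and ABI_VLEN = 128, vec1 would arrive
   in v8-v9 and vec2 in v10-v11.  No target-specific attribute is used
   here so that the snippet compiles on any target.  */
v8si
add_v8si (v8si vec1, v8si vec2)
{
  /* Element-wise addition; with LMUL1 this would ideally become one
     vadd.vv per 128-bit half.  */
  return vec1 + vec2;
}
```

Each 128-bit half of the arguments already sits in its own register on
entry, so no extraction should be necessary to operate on it.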
Is this what you intended as well or did I misunderstand?
--
Regards
Robin