>> All that's missing is a (reinterpreting) vtype change to Pmode-sized
>> elements before. I quickly hacked something together (without the proper
>> mode change) and the resulting code looks like:
>>
>>   vsetvli  zero, 8, e8, ...
>>   vmv.v.x  v1,a5
>>   # missing vsetivli zero, 1, e64, ... or something
>>   vmv.x.s  a0,v1
This issue has been addressed by this patch:
[PATCH V3] RISC-V: Fix ICE in get_avl_or_vl_reg (gnu.org)

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-15 05:06
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move [PR111391]

> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so... not sure whether we should
> try to implement that within the mov pattern?
>
> I guess we need some inputs from Jeff.

Sorry for the late response. I have also been thinking about this and it
feels a bit like a band-aid to me. Usually register-class moves like this
are performed by reload (which consults register_move_costs among other
things) and we are working around it.

The situation is that we move a vec_duplicate of QImodes into a vector
register. Then we want to use this as a scalar call argument, so we need
to transfer it back to a DImode register.

One maybe more typical solution would be to allow small VLS vector modes
like V8QI in GPRs (via hard_regno_mode_ok) until reload, so we could have
a (set (reg:V8QI a0) (vec_duplicate:V8QI ...)).

The next step would be to have a mov<mode> expander with a target "r"
constraint (and source "vr") that performs the actual move. This is where
Juzhe's mov code could fit in (without the subreg handling). If I'm not
mistaken, vmv.x.s without a slidedown should be sufficient for our case,
as we'd only want to use the whole thing when the full vector fits into
a GPR.

All that's missing is a (reinterpreting) vtype change to Pmode-sized
elements before. I quickly hacked something together (without the proper
mode change) and the resulting code looks like:

  vsetvli  zero, 8, e8, ...
  vmv.v.x  v1,a5
  # missing vsetivli zero, 1, e64, ... or something
  vmv.x.s  a0,v1

Now, whether that's efficient (and desirable) is a separate issue and
should probably be defined by register_move_costs as well as instruction
costs. I wasn't actually aware of this call/argument optimization that
uses vec_duplicate, and I haven't checked what costing (if at all) it
uses.

Regards
 Robin
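
To make the hard_regno_mode_ok idea above a bit more concrete, here is a
minimal, GCC-style sketch of what such a check could look like. This is
not the actual riscv.cc change: the function name is hypothetical, and
riscv_v_ext_vls_mode_p is assumed to be available as the backend's
predicate for fixed-length (VLS) vector modes.

  /* Sketch: additionally accept a small VLS mode (e.g. V8QI) in a
     general-purpose register, so that
       (set (reg:V8QI a0) (vec_duplicate:V8QI ...))
     can survive until reload.  */
  static bool
  vls_mode_in_gpr_ok_sketch (unsigned int regno, machine_mode mode)
  {
    /* Only x-registers, only fixed-length vector modes, and only when
       the whole vector fits into a single GPR.  */
    return (GP_REG_P (regno)
            && riscv_v_ext_vls_mode_p (mode)
            && known_le (GET_MODE_SIZE (mode), UNITS_PER_WORD));
  }

Larger VLS modes would still be rejected in GPRs, and the actual
vector-to-GPR transfer would be left to the mov<mode> expander (vmv.x.s)
that Robin describes.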