> > And that code gen will kind of make VLS CC become useless due to the
> > code gen quality.
>
> Ideally we'd just have
>
> vadd.vv v8,v8,v10
> vadd.vv v9,v9,v11
>
> (plus vsetvl) in the function if LMUL1 is preferred I guess?
>
> Right now we unnecessarily create
>
> _6 = BIT_FIELD_REF <vec1_1(D), 128, 0>;
> _7 = BIT_FIELD_REF <vec2_2(D), 128, 0>;
>
> to access the upper/lower halves.  That happens in vec lowering already I
> suppose.  Those are vec_extracts but should just be nops as we can just
> access the individual registers with LMUL1.  I haven't looked into what
> would need changing, though.
>
> Is this what you intended as well or did I misunderstand?
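For reference, the kind of function I read this as being about is a 256-bit fixed-length vector add whose arguments are passed by the VLS calling convention.  A minimal sketch with GNU vector extensions (an assumption on my side, the actual testcase may differ; the argument names just mirror the vec1/vec2 SSA names above):

#include <stdint.h>

/* Hypothetical testcase: 8 x int32 = 256 bits, so the whole value needs
   an LMUL=2 register group on a VLEN=128 machine.  Built with something
   like -march=rv64gcv and the VLS calling convention enabled. */
typedef int32_t v8si __attribute__ ((vector_size (32)));

v8si
vec_add (v8si vec1, v8si vec2)
{
  return vec1 + vec2;
}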
The layout will be different between VLEN=128 and VLEN=256 (and also any
larger VLEN).

To give a practical example: if vec1 is allocated into v8 and v9, the
register layout will be:

VLEN=128:
v8 = [0, 1, 2, 3]
v9 = [4, 5, 6, 7]

VLEN=256:
v8 = [0, 1, 2, 3, 4, 5, 6, 7]
v9 = [?, ?, ?, ?, ?, ?, ?, ?]

Then you could imagine

vsetivli zero,8,e32,m2,ta,ma
vadd.vv v8, v8, v10

works on any machine with VLEN >= 128.

But if we try to split that into LMUL=1 operations without spill and reload:

vsetivli zero,4,e32,m1,ta,ma
vadd.vv v8, v8, v10
vadd.vv v9, v9, v11

then it can only work as expected on a machine with VLEN = 128.

And then this needs a spill and reload to fix that:

vsetivli zero,8,e32,m2,ta,ma
vse32.v v8, (sp)
...
vsetivli zero,4,e32,m1,ta,ma
# The reg layout will be the same on VLEN=128 or any larger VLEN
vle32.v v8, (sp)
addi t0, sp, 16     # second half starts 16 bytes (4 x e32) into the spill slot
vle32.v v9, (t0)
...
vadd.vv v8, v8, v10
vadd.vv v9, v9, v11
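The same point can be made at the source level.  A hedged sketch with the RVV intrinsics (a hypothetical helper, not what the compiler actually emits for the VLS CC case): splitting the 256-bit value into two LMUL=1 halves through memory gives the same element layout on any VLEN, which is exactly what the spill/reload above achieves.

#include <riscv_vector.h>
#include <stdint.h>

/* Hypothetical helper: each 256-bit value is assumed to live in memory as
   eight int32_t.  Going through memory makes the LMUL=1 split
   layout-independent: the "low" register always gets elements 0-3 and the
   "high" one elements 4-7, regardless of VLEN. */
void
vls_add_split (int32_t *dst, const int32_t *a, const int32_t *b)
{
  size_t vl = 4;  /* four e32 elements per LMUL=1 operation */
  vint32m1_t a_lo = __riscv_vle32_v_i32m1 (a, vl);
  vint32m1_t a_hi = __riscv_vle32_v_i32m1 (a + 4, vl);
  vint32m1_t b_lo = __riscv_vle32_v_i32m1 (b, vl);
  vint32m1_t b_hi = __riscv_vle32_v_i32m1 (b + 4, vl);
  __riscv_vse32_v_i32m1 (dst, __riscv_vadd_vv_i32m1 (a_lo, b_lo, vl), vl);
  __riscv_vse32_v_i32m1 (dst + 4, __riscv_vadd_vv_i32m1 (a_hi, b_hi, vl), vl);
}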