https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---

  vectp.4_188 = x_50(D);
  vect__1.5_189 = MEM <vector(8) int> [(int *)vectp.4_188];
  mask__2.6_190 = { 1, 1, 1, 1, 1, 1, 1, 1 } == vect__1.5_189;
  mask_patt_156.7_191 = VIEW_CONVERT_EXPR<vector(8) <signed-boolean:1>>(mask__2.6_190);
  _1 = *x_50(D);
  _2 = _1 == 1;
  vectp.9_192 = y_51(D);
  vect__3.10_193 = MEM <vector(8) short int> [(short int *)vectp.9_192];
  mask__4.11_194 = { 2, 2, 2, 2, 2, 2, 2, 2 } == vect__3.10_193;
  mask_patt_157.12_195 = mask_patt_156.7_191 & mask__4.11_194;
  vect_patt_158.13_196 = VEC_COND_EXPR <mask_patt_157.12_195, { 1, 1, 1, 1, 1, 1, 1, 1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
  vect_patt_159.14_197 = (vector(8) int) vect_patt_158.13_196;

This yields the following assembly:

  vsetivli zero,8,e32,m2,ta,ma
  vle32.v  v2,0(a0)
  vmv.v.i  v4,1
  vle16.v  v1,0(a1)
  vmseq.vv v0,v2,v4
  vsetvli  zero,zero,e16,m1,ta,ma
  vmseq.vi v1,v1,2
  vsetvli  zero,zero,e32,m2,ta,ma
  vmv.v.i  v2,0
  vmand.mm v0,v0,v1
  vmerge.vvm v2,v2,v4,v0
  vse32.v  v2,0(a0)

Apart from CSE'ing v4, this looks pretty good to me. My connection is really poor at the moment, so I cannot quickly compare what aarch64 does for this example.