https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 16 Oct 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
>
> --- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---
>   vectp.4_188 = x_50(D);
>   vect__1.5_189 = MEM <vector(8) int> [(int *)vectp.4_188];
>   mask__2.6_190 = { 1, 1, 1, 1, 1, 1, 1, 1 } == vect__1.5_189;
>   mask_patt_156.7_191 = VIEW_CONVERT_EXPR<vector(8) <signed-boolean:1>>(mask__2.6_190);
>   _1 = *x_50(D);
>   _2 = _1 == 1;
>   vectp.9_192 = y_51(D);
>   vect__3.10_193 = MEM <vector(8) short int> [(short int *)vectp.9_192];
>   mask__4.11_194 = { 2, 2, 2, 2, 2, 2, 2, 2 } == vect__3.10_193;
>   mask_patt_157.12_195 = mask_patt_156.7_191 & mask__4.11_194;
>   vect_patt_158.13_196 = VEC_COND_EXPR <mask_patt_157.12_195, { 1, 1, 1, 1, 1, 1, 1, 1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
>   vect_patt_159.14_197 = (vector(8) int) vect_patt_158.13_196;
>
> This yields the following assembly:
>         vsetivli        zero,8,e32,m2,ta,ma
>         vle32.v         v2,0(a0)
>         vmv.v.i         v4,1
>         vle16.v         v1,0(a1)
>         vmseq.vv        v0,v2,v4
>         vsetvli         zero,zero,e16,m1,ta,ma
>         vmseq.vi        v1,v1,2
>         vsetvli         zero,zero,e32,m2,ta,ma
>         vmv.v.i         v2,0
>         vmand.mm        v0,v0,v1
>         vmerge.vvm      v2,v2,v4,v0
>         vse32.v         v2,0(a0)
>
> Apart from CSE'ing v4 this looks pretty good to me.  My connection is
> really poor at the moment so I cannot quickly compare what aarch64 does
> for that example.

That looks reasonable.  Note this then goes through vectorizable_assignment
as a no-op move.  The question is whether we can arrive here with
signed bool : 2 vs. _Bool : 2 somehow (I wonder how we arrive with
signed bool : 1 here - that's from pattern recog, right?  Why didn't that
produce a COND_EXPR for this?).

I think for more thorough testing the condition should change to

          /* But a conversion that does not change the bit-pattern is ok.  */
          && !(INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
               && INTEGRAL_TYPE_P (TREE_TYPE (op))
               && (((TYPE_PRECISION (TREE_TYPE (scalar_dest))
                     > TYPE_PRECISION (TREE_TYPE (op)))
                    && TYPE_UNSIGNED (TREE_TYPE (op)))
                   || (TYPE_PRECISION (TREE_TYPE (scalar_dest))
                       == TYPE_PRECISION (TREE_TYPE (op))))))

rather than just doing >=, which would be odd (why allow skipping
sign-extending from the unsigned MSB but not allow skipping zero-extending
from it?).
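For context, the condition being modified is the bit-precision rejection in
vectorizable_assignment (tree-vect-stmts.cc).  From memory it currently looks
roughly like the sketch below - paraphrased, not an exact quote of trunk - so
the proposal above only replaces the final "bit-pattern is ok" clause:

  /* We do not handle bit-precision changes.  */
  if ((CONVERT_EXPR_CODE_P (code)
       || code == VIEW_CONVERT_EXPR)
      && INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
      && (!type_has_mode_precision_p (TREE_TYPE (scalar_dest))
          || !type_has_mode_precision_p (TREE_TYPE (op)))
      /* But a conversion that does not change the bit-pattern is ok.  */
      && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
            > TYPE_PRECISION (TREE_TYPE (op)))
           && TYPE_UNSIGNED (TREE_TYPE (op))))
    /* The real code emits a missed-optimization dump note here.  */
    return false;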
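For anyone trying to reproduce this locally: judging from the vectorized
GIMPLE quoted above (int loads compared against 1, short loads compared
against 2, the combined mask stored back through the int pointer), a loop
along the following lines should exercise the same path.  This is a guess at
a minimal testcase, not necessarily the one attached to the PR:

  void
  foo (int *x, short *y)
  {
    for (int i = 0; i < 8; i++)
      /* Comparisons on mixed element widths (int vs. short) give masks of
         different types that the vectorizer has to convert and AND.  */
      x[i] = (x[i] == 1) && (y[i] == 2);
  }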