https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98537
--- Comment #4 from prathamesh3492 at gcc dot gnu.org ---
Hi,
It seems to work on my machine for x86_64.
Compiling with -O3 (or -O2), .optimized dump shows:

v4si foo (v4si b, v4si a)
{
  v4si c;
  vector(4) <signed-boolean:32> _1;

  <bb 2> [local count: 1073741824]:
  _1 = a_2(D) == b_3(D);
  c_4 = VIEW_CONVERT_EXPR<v4si>(_1);
  return c_4;

}

I tried on top of af362af18f405c34840d820143aa3a94f72fce4d.

Btw, on ARM it seems to "scalarize" the code, .optimized dump shows:

  _6 = BIT_FIELD_REF <a_2(D), 32, 0>;
  _7 = BIT_FIELD_REF <b_3(D), 32, 0>;
  _8 = _6 == _7 ? -1 : 0;
  _9 = BIT_FIELD_REF <a_2(D), 32, 32>;
  _10 = BIT_FIELD_REF <b_3(D), 32, 32>;
  _11 = _9 == _10 ? -1 : 0;
  _12 = BIT_FIELD_REF <a_2(D), 32, 64>;
  _13 = BIT_FIELD_REF <b_3(D), 32, 64>;
  _14 = _12 == _13 ? -1 : 0;
  _15 = BIT_FIELD_REF <a_2(D), 32, 96>;
  _16 = BIT_FIELD_REF <b_3(D), 32, 96>;
  _17 = _15 == _16 ? -1 : 0;
  c_4 = {_8, _11, _14, _17};
  return c_4;

Thanks,
Prathamesh
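
For reference, here is a minimal sketch of the kind of source that could give dumps like the two above. The testcase is not quoted in this comment, so the typedef and the body of foo are assumptions reconstructed from the dumped signature "v4si foo (v4si b, v4si a)", and foo_scalar is a hypothetical helper added only to spell out what the scalarized ARM dump computes lane by lane.

```c
/* Assumed reconstruction: the original testcase is not shown in this comment. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Vector form. With GCC's vector extensions, a == b yields a vector whose
   lanes are -1 where the inputs compare equal and 0 otherwise.  */
v4si
foo (v4si b, v4si a)
{
  v4si c = (a == b);
  return c;
}

/* Hypothetical helper mirroring the scalarized dump: one compare and one
   select per 32-bit lane, then the four results are collected into c.  */
v4si
foo_scalar (v4si b, v4si a)
{
  v4si c;
  for (int i = 0; i < 4; i++)
    c[i] = (a[i] == b[i]) ? -1 : 0;
  return c;
}
```

Both functions should return the same value for any inputs; the difference between the two dumps is only whether the comparison stays a single vector operation or is split into four scalar ones.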