https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79938
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2021-08-02
             Status|UNCONFIRMED                 |NEW
          Component|target                      |tree-optimization
     Ever confirmed|0                           |1

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think we could do better than what we are producing on the trunk:
  _1 = BIT_FIELD_REF <a_49(D), 8, 0>;
  _3 = BIT_FIELD_REF <a_49(D), 8, 8>;
  _6 = BIT_FIELD_REF <a_49(D), 8, 16>;
  _8 = BIT_FIELD_REF <a_49(D), 8, 24>;
  _13 = BIT_FIELD_REF <a_49(D), 8, 32>;
  _15 = BIT_FIELD_REF <a_49(D), 8, 40>;
  _18 = BIT_FIELD_REF <a_49(D), 8, 48>;
  _20 = BIT_FIELD_REF <a_49(D), 8, 56>;
  _25 = BIT_FIELD_REF <a_49(D), 8, 64>;
  _27 = BIT_FIELD_REF <a_49(D), 8, 72>;
  _30 = BIT_FIELD_REF <a_49(D), 8, 80>;
  _32 = BIT_FIELD_REF <a_49(D), 8, 88>;
  _37 = BIT_FIELD_REF <a_49(D), 8, 96>;
  _63 = {_1, _13, _25, _37};
  vect__2.10_22 = (vector(4) int) _63;
  _39 = BIT_FIELD_REF <a_49(D), 8, 104>;
  _29 = {_3, _15, _27, _39};
  vect__4.11_60 = (vector(4) int) _29;
  _69 = vect__2.10_22 + vect__4.11_60;
  _42 = BIT_FIELD_REF <a_49(D), 8, 112>;
  _10 = {_6, _18, _30, _42};
  vect__7.9_17 = (vector(4) int) _10;
  _44 = BIT_FIELD_REF <a_49(D), 8, 120>;
  _5 = {_8, _20, _32, _44};
  vect__9.8_66 = (vector(4) int) _5;
  _70 = vect__7.9_17 + vect__9.8_66;
  vect__11.14_57 = _69 + _70;
  _55 = VIEW_CONVERT_EXPR<__m128i>(vect__11.14_57);
------ CUT ----

We could produce a single shuffle from a_49(D) and then do extracts to get
_63, _29, _10, and _5, instead of sixteen separate byte extracts.

clang does:
        pshufb  .LCPI1_0(%rip), %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[3],zero,zero,zero,xmm2[2],zero,zero,zero
        pshufd  $238, %xmm2, %xmm0
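
To make the shuffle-then-extract suggestion concrete, here is a minimal scalar model in plain C. It is only a sketch of the data flow, not GCC's or clang's actual code generation. The function names, the `perm` table, and the assumption that the 8-bit elements are unsigned are all illustrative; the dump does not show the element type. The first function mirrors the sixteen BIT_FIELD_REF extracts, the second applies one byte permutation and then reads four contiguous 4-byte pieces, which play the role of _63, _29, _10, and _5.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference model: each 32-bit lane is the sum of four adjacent bytes,
   matching _63 + _29 + _10 + _5 in the dump.
   Assumption: the bytes are unsigned (not visible in the dump). */
void sum_groups_scalar(const uint8_t a[16], int32_t out[4])
{
    for (int lane = 0; lane < 4; lane++)
        out[lane] = a[4 * lane] + a[4 * lane + 1]
                  + a[4 * lane + 2] + a[4 * lane + 3];
}

/* Hypothetical permutation: gathers bytes {0,4,8,12}, {1,5,9,13},
   {2,6,10,14}, {3,7,11,15} into four contiguous 4-byte pieces. */
static const uint8_t perm[16] = {
    0, 4, 8, 12,   1, 5, 9, 13,   2, 6, 10, 14,   3, 7, 11, 15
};

/* Shuffle-then-extract model: one permutation of the input, then four
   contiguous extracts that are widened to int and accumulated. */
void sum_groups_shuffle(const uint8_t a[16], int32_t out[4])
{
    uint8_t shuffled[16];
    for (int i = 0; i < 16; i++)
        shuffled[i] = a[perm[i]];

    for (int lane = 0; lane < 4; lane++)
        out[lane] = 0;

    /* part 0..3 corresponds to _63, _29, _10, _5 respectively */
    for (int part = 0; part < 4; part++)
        for (int lane = 0; lane < 4; lane++)
            out[lane] += (int32_t)shuffled[4 * part + lane];
}

/* Checks that both models agree on an arbitrary test pattern. */
void self_check(void)
{
    uint8_t a[16];
    int32_t ref[4], alt[4];

    for (int i = 0; i < 16; i++)
        a[i] = (uint8_t)(i * 37 + 200);

    sum_groups_scalar(a, ref);
    sum_groups_shuffle(a, alt);
    assert(memcmp(ref, alt, sizeof ref) == 0);
}
```

Calling `self_check()` from a test harness confirms that the two formulations compute the same four sums. On a real target the permutation would map to a single byte shuffle and the extracts to cheap subvector operations, but that lowering is not modeled here.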