https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121284
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> trunk:
> _3 = {_8, _9, _10, _11, _12, _13, _14, _15};
> _16 = BIT_FIELD_REF <_3, 256, 0>;
> _17 = BIT_FIELD_REF <_3, 256, 256>;
> _18 = VEC_PACK_FIX_TRUNC_EXPR <_16, _17>;
>
> 13:
> _16 = {_8, _9, _10, _11};
> _17 = {_12, _13, _14, _15};
> _18 = VEC_PACK_FIX_TRUNC_EXPR <_16, _17>;
>
> Though I think the real issue:
> _1 = {x_4(D), { 0.0, 0.0 }};
> _2 = {y_5(D), { 0.0, 0.0 }};
> _3 = VEC_PERM_EXPR <_1, _2, { 0, 1, 4, 5 }>;
> _6 = .VEC_CONVERT (_3);
>
> is that vec perm should become:
> _3 = {x_4(D), y_5(D)};

I think, given the trunk IL, we should fold that to two 256-bit CTORs if the
BIT_FIELD_REFs are the only use.  forwprop has similar handling at least for
loads (it splits the load).

And I agree the permute should be folded as suggested, with or without a
single-use restriction.  It's hard to tell whether a target supports this
kind of vector CTOR efficiently, so we might want to canonicalize to perms
instead; another simplification would be

_3 = VEC_PERM_EXPR <x_4(D), y_5(D), { 0, 1, 2, 3 }>;
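For reference, a reduced testcase producing IL of the shape in the last quoted
snippet could look like the sketch below.  This is only a guess using GNU
vector extensions (not necessarily the PR's actual testcase; the trunk dump
above suggests the real one may involve wider vectors):

/* Hypothetical reproducer, assumed rather than taken from the PR: concatenate
   two v2df vectors and convert the result to v4si.  The shufflevector is
   likely lowered to two zero-padded v4df CTORs plus
   VEC_PERM_EXPR <_1, _2, { 0, 1, 4, 5 }>, i.e. the permute discussed above.
   Requires GCC 12+ for __builtin_shufflevector.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));
typedef int    v4si __attribute__ ((vector_size (16)));

v4si
foo (v2df x, v2df y)
{
  v4df t = __builtin_shufflevector (x, y, 0, 1, 2, 3);  /* { x[0], x[1], y[0], y[1] } */
  return __builtin_convertvector (t, v4si);             /* double -> int truncation */
}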