https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121284

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> trunk:
>   _3 = {_8, _9, _10, _11, _12, _13, _14, _15};
>   _16 = BIT_FIELD_REF <_3, 256, 0>;
>   _17 = BIT_FIELD_REF <_3, 256, 256>;
>   _18 = VEC_PACK_FIX_TRUNC_EXPR <_16, _17>;
> 
> 13:
>   _16 = {_8, _9, _10, _11};
>   _17 = {_12, _13, _14, _15};
>   _18 = VEC_PACK_FIX_TRUNC_EXPR <_16, _17>;
> 
> Though I think the real issue:
>   _1 = {x_4(D), { 0.0, 0.0 }};
>   _2 = {y_5(D), { 0.0, 0.0 }};
>   _3 = VEC_PERM_EXPR <_1, _2, { 0, 1, 4, 5 }>;
>   _6 = .VEC_CONVERT (_3);
> 
> is that vec perm should become:
>  _3 = {x_4(D), y_5(D)};

I think given the trunk IL we should fold that to two 256-bit CTORs if the
BIT_FIELD_REFs are the only uses.  forwprop has similar handling for loads
at least (it splits the load).  And I agree the permute should be folded as
suggested, with or without a single-use restriction.  It's hard to tell
whether a target supports this kind of vector CTOR efficiently, so we
might want to canonicalize to perms instead.  Another simplification would be

 _3 = VEC_PERM_EXPR <x_4(D), y_5(D), { 0, 1, 2, 3 }>;
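
To make the equivalences concrete, here is a toy Python model of the two
folds.  It is only a sketch: the helpers vec_perm and bit_field_ref are
hypothetical names for illustration, not GCC internals.  It assumes 64-bit
lanes (eight elements covered by two 256-bit BIT_FIELD_REFs) and that a
VEC_PERM_EXPR may take inputs narrower than its result.

```python
def vec_perm(a, b, sel):
    """Toy model of VEC_PERM_EXPR <a, b, sel>: each index picks a lane of a ++ b."""
    lanes = list(a) + list(b)
    return [lanes[i] for i in sel]


def bit_field_ref(vec, size_bits, offset_bits, lane_bits=64):
    """Toy model of BIT_FIELD_REF <vec, size, offset> for lane-aligned ranges.

    lane_bits=64 is an assumption inferred from the dump, not stated in it.
    """
    first = offset_bits // lane_bits
    return vec[first:first + size_bits // lane_bits]


x, y, zero = [1.0, 2.0], [3.0, 4.0], [0.0, 0.0]

# Original IL: zero-extend both inputs, then select lanes 0, 1, 4, 5.
padded = vec_perm(x + zero, y + zero, [0, 1, 4, 5])
# Suggested folds: a plain CTOR {x, y}, or a perm of the narrow inputs.
ctor = x + y
narrow_perm = vec_perm(x, y, [0, 1, 2, 3])

# Trunk IL: one 8-lane CTOR read back through two 256-bit BIT_FIELD_REFs.
wide = [8, 9, 10, 11, 12, 13, 14, 15]  # stand-ins for _8 .. _15
lo = bit_field_ref(wide, 256, 0)
hi = bit_field_ref(wide, 256, 256)
```

In this model all three permute forms yield the same four lanes, and the
two BIT_FIELD_REFs yield exactly the two 4-lane CTORs that GCC 13 emits.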
