https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2019-01-14
CC| |rguenth at gcc dot gnu.org
Ever confirmed|0 |1
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think there are related bugs. foo1 is optimized OK:
y_4 = BIT_INSERT_EXPR <x_2(D), f_3(D), 0 (32 bits)>;
return y_4;
while foo is expanded from
<bb 2> [local count: 1073741824]:
_1 = BIT_FIELD_REF <x_7(D), 32, 32>;
_2 = BIT_FIELD_REF <x_7(D), 32, 64>;
_3 = BIT_FIELD_REF <x_7(D), 32, 96>;
y_6 = {f_5(D), _1, _2, _3};
return y_6;
Tree forwprop contains code that pattern-matches vector CONSTRUCTORs;
I think it could be extended to handle this case. IIRC it can already
detect arbitrary two-vector permutes, so for the above we could go
through an intermediate
_1 = {f_5(D), f_5(D), ... };
y_6 = VEC_PERM <_1, x_7(D), { .... }>;
and recognize permutes that only replace a single vector element.
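At the source level, the equivalence between the CONSTRUCTOR form and the splat-plus-permute form can be sketched roughly as follows. This is only an illustration, assuming GCC's generic vector extensions and `__builtin_shuffle`; the typedefs `v4sf`/`v4si` and the function names `insert_ctor` and `insert_perm` are made up here and are not part of the bug report or of forwprop.

```c
/* Hypothetical local typedefs, standing in for __v4sf and a matching
   integer selector type. */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* CONSTRUCTOR form, matching the GIMPLE dump above:
   y = { f, x[1], x[2], x[3] }.  */
v4sf
insert_ctor (v4sf x, float f)
{
  return (v4sf) { f, x[1], x[2], x[3] };
}

/* Intermediate form: splat f, then a two-vector permute.
   Selector indices 0..3 pick from the first operand (the splat),
   4..7 pick from the second operand (x).  Only lane 0 comes from
   the splat, so the permute replaces a single vector element.  */
v4sf
insert_perm (v4sf x, float f)
{
  v4sf splat = { f, f, f, f };
  v4si sel = { 0, 5, 6, 7 };
  return __builtin_shuffle (splat, x, sel);
}
```

Both functions compute the same vector; the second simply spells out the permute whose selector takes every lane but one from x.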
So I think we should optimize
__v4sf
foo (__v4sf x, float f)
{
  __v4sf y = __extension__ (__v4sf) { f, x[2], x[1], x[3] };
  return y;
}
as well, first permuting x and then inserting f (at any position).
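The "permute first, then insert" decomposition can be written out by hand as a rough sketch, again assuming GCC's generic vector extensions and `__builtin_shuffle`. The typedefs and the names `foo_ctor` and `foo_perm_insert` are hypothetical and only serve to compare the two forms; this says nothing about what code GCC actually generates for either.

```c
/* Hypothetical local typedefs, standing in for __v4sf and a matching
   integer selector type. */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* The testcase as a plain CONSTRUCTOR: { f, x[2], x[1], x[3] }.  */
v4sf
foo_ctor (v4sf x, float f)
{
  return (v4sf) { f, x[2], x[1], x[3] };
}

/* The same result as a single-vector permute of x followed by
   inserting f into one lane.  Lane 0 of the permute result is
   overwritten, so its selector entry is arbitrary (0 here).  */
v4sf
foo_perm_insert (v4sf x, float f)
{
  v4si sel = { 0, 2, 1, 3 };
  v4sf y = __builtin_shuffle (x, sel);
  y[0] = f;   /* the single-element insert, here at position 0 */
  return y;
}
```

The insert position is independent of the permute, so the same shape works for f placed in any lane.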