https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|x86_64 i?86                 |x86_64-*-* i?86-*-*
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2020-12-07
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, vector lowering performs this optimization. But then in GIMPLE we have
__m128 f (__m128 a, __m128 b)
{
  vector(4) float _3;
  vector(4) float _5;
  vector(4) float _6;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_shufps (b_2(D), b_2(D), 0);
  _5 = __builtin_ia32_shufps (a_4(D), a_4(D), 0);
  _6 = _3 * _5;
  return _6;

}
so we don't actually see the operation. To rectify this the backend would
need to GIMPLE-fold those calls once the mask argument becomes constant,
that is, fold them to a VEC_PERM_EXPR of VIEW_CONVERTed operands.
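Something along these lines (a rough sketch only, not actual i386 backend
code: the helper name fold_shufps_to_vec_perm is made up, it assumes the
usual context of the target's gimple_fold_builtin hook with the relevant
headers already included, and the non-constant-mask and error paths are
omitted; for shufps specifically the operands are already V4SF, so no
VIEW_CONVERT_EXPR is needed here):

/* Hypothetical helper: fold __builtin_ia32_shufps (a, b, imm) with a
   constant immediate into a VEC_PERM_EXPR.  Assumes tree.h, gimple.h,
   gimple-iterator.h and tree-vector-builder.h are available.  */

static bool
fold_shufps_to_vec_perm (gimple_stmt_iterator *gsi)
{
  gimple *stmt = gsi_stmt (*gsi);
  tree arg0 = gimple_call_arg (stmt, 0);
  tree arg1 = gimple_call_arg (stmt, 1);
  tree imm = gimple_call_arg (stmt, 2);
  if (!tree_fits_uhwi_p (imm))
    return false;
  unsigned HOST_WIDE_INT sel = tree_to_uhwi (imm);

  /* shufps takes result lanes 0 and 1 from the first operand and
     lanes 2 and 3 from the second, two selector bits per lane;
     indices 4-7 in a VEC_PERM_EXPR selector refer to the second
     operand.  */
  tree sel_type = build_vector_type (unsigned_type_node, 4);
  tree_vector_builder builder (sel_type, 4, 1);
  builder.quick_push (build_int_cst (unsigned_type_node, (sel >> 0) & 3));
  builder.quick_push (build_int_cst (unsigned_type_node, (sel >> 2) & 3));
  builder.quick_push (build_int_cst (unsigned_type_node, 4 + ((sel >> 4) & 3)));
  builder.quick_push (build_int_cst (unsigned_type_node, 4 + ((sel >> 6) & 3)));

  tree lhs = gimple_call_lhs (stmt);
  gimple *g = gimple_build_assign (lhs, VEC_PERM_EXPR,
                                   arg0, arg1, builder.build ());
  gsi_replace (gsi, g, false);
  return true;
}

For the splat in this testcase (mask 0, both operands the same SSA name)
that would replace each call with something like
_3 = VEC_PERM_EXPR <b_2(D), b_2(D), { 0, 0, 4, 4 }>; which later passes
can actually reason about.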
Vector lowering doesn't perform generic permute optimizations; the vectorizer
does, but it doesn't touch existing code. I guess it could be done in some
new pass similar to backprop (but with dataflow going the other way around).