https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
Bug ID: 98167
Summary: [x86] Failure to optimize operation on identically
shuffled operands into a shuffle of the result of the
operation
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
#include <xmmintrin.h>

__m128 f(__m128 a, __m128 b) {
    return _mm_mul_ps(_mm_shuffle_ps(a, a, 0), _mm_shuffle_ps(b, b, 0));
}
This can be optimized to:
__m128 f(__m128 a, __m128 b) {
    __m128 tmp = _mm_mul_ss(a, b);
    return _mm_shuffle_ps(tmp, tmp, 0);
}
This transformation is done by LLVM, but not by GCC.
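
For reference, a minimal sketch of a check that the two forms agree. It assumes an x86 target with SSE enabled. The names f_shuffle_then_mul, f_mul_then_shuffle, forms_agree and check_example are made up for this illustration and are not part of the report. Both forms only ever multiply lane 0 of a by lane 0 of b, so the results should be bit-identical:

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>

/* Form from the report: broadcast lane 0 of each operand,
   then multiply all four lanes. */
__m128 f_shuffle_then_mul(__m128 a, __m128 b) {
    return _mm_mul_ps(_mm_shuffle_ps(a, a, 0), _mm_shuffle_ps(b, b, 0));
}

/* Proposed form: multiply lane 0 only, then broadcast the product.
   The upper lanes of tmp come from a, but the shuffle never reads them. */
__m128 f_mul_then_shuffle(__m128 a, __m128 b) {
    __m128 tmp = _mm_mul_ss(a, b);
    return _mm_shuffle_ps(tmp, tmp, 0);
}

/* Returns 1 when both forms produce bit-identical vectors for (a, b). */
int forms_agree(__m128 a, __m128 b) {
    __m128 x = f_shuffle_then_mul(a, b);
    __m128 y = f_mul_then_shuffle(a, b);
    return memcmp(&x, &y, sizeof x) == 0;
}

/* Example inputs chosen arbitrarily; _mm_setr_ps puts its first
   argument in lane 0. */
void check_example(void) {
    __m128 a = _mm_setr_ps(1.5f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(-2.0f, 7.0f, 8.0f, 9.0f);
    assert(forms_agree(a, b));
}
```

This only tests individual inputs and is not a proof. The expected benefit of the second form is one shuffle instead of two.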