https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98774
--- Comment #5 from Andrew Pinski ---
For the trunk (without -ffast-math), the perms are done too early:
vect__5.32_53 = VEC_PERM_EXPR ;
vect__5.33_54 = VEC_PERM_EXPR ;
vect__5.34_55 = VEC_PERM_EXPR ;
_3 = *mag_9(D);
_58 = {_3, _3};
v
--- Comment #4 from Ivan Sorokin ---
I retested the sample on GCC 11.2.
https://godbolt.org/z/xrarP3zbY
Compared to Clang 12.0.1, GCC still generates 6 more instructions in total and
uses 6 mulpd against Clang's 4.
--- Comment #3 from Ivan Sorokin ---
(In reply to Hongtao.liu from comment #1)
> It's fixed in current trunk https://godbolt.org/z/63576n
I can confirm that GCC now uses the packed multiplication mulpd, although it is
used somewhat inefficiently
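
For readers unfamiliar with the term, here is a minimal sketch of what "packed multiplication" means in this context. It is not the reproducer behind the godbolt links: the function names scale_scalar and scale_packed are hypothetical, and the code assumes the GCC/Clang vector_size extension. The broadcast {*mag, *mag} plays the same role as the `_58 = {_3, _3}` statement in the GIMPLE dump from comment #5. On x86-64 with SSE2, the packed form would typically be expected to compile to a single mulpd rather than two mulsd, though the exact output depends on the compiler version and flags.

```c
#include <string.h>

/* Two doubles in one 16-byte vector (GCC/Clang vector extension). */
typedef double v2df __attribute__((vector_size(16)));

/* Scalar form: two independent multiplies by the value behind mag. */
static void scale_scalar(double out[2], const double in[2], const double *mag)
{
    out[0] = in[0] * *mag;
    out[1] = in[1] * *mag;
}

/* Packed form: broadcast *mag into both lanes, then do one vector multiply. */
static void scale_packed(double out[2], const double in[2], const double *mag)
{
    v2df v;
    v2df m = {*mag, *mag};      /* broadcast, like _58 = {_3, _3} */

    memcpy(&v, in, sizeof v);   /* unaligned-safe load of both lanes */
    v = v * m;                  /* element-wise multiply of both lanes */
    memcpy(out, &v, sizeof v);  /* unaligned-safe store */
}
```

Both functions compute the same result; comparing their assembly at -O2 is one way to check whether the vectorizer turns the scalar form into the packed one.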
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98774
Richard Biener changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|
--- Comment #1 from Hongtao.liu ---
It's fixed in current trunk https://godbolt.org/z/63576n