https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
But 2 element construction _should_ be cheap.  What is missing is the move
cost from GPR to XMM regs (but we do not have a good idea whether the sources
are memory, so it's not as clear-cut here either).

IMHO a better approach might be to raise the unaligned vector store/load costs?

For the testcase at hand why does a throughput of 1 pose a problem?  There's
only one punpckldq instruction around?

Note that for non-loop vectorization of 'double', two-element vector CTORs
are common and important to handle cheaply.  See also the discussion in
PR98856.
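
To make the two cases above concrete, here is a minimal sketch of straight-line code where the non-loop (SLP) vectorizer may build a two-element vector from scalars. It is not the testcase from this PR; the struct and function names are hypothetical. Whether GCC actually vectorizes these, and with which instructions, depends on the GCC version, the flags (for example -O2 -ftree-vectorize), and the cost model under discussion.

```c
/* Hypothetical illustration only; not the PR's testcase. */

struct ipair { int a, b; };
struct pt    { double x, y; };

/* Two adjacent int stores fed from GPRs.  If vectorized, the pair {a, b}
   has to be moved from GPRs into an XMM register and combined (for
   example with a punpckldq) before a single 64-bit store.  */
void set_pair(struct ipair *p, int a, int b)
{
    p->a = a;
    p->b = b;
}

/* Two adjacent double computations.  The arguments already live in XMM
   registers under the x86_64 SysV ABI, so a two-element CTOR {a, b}
   followed by one vector add and one vector store can be cheap.  */
void make_pt(struct pt *p, double a, double b)
{
    p->x = a + 1.0;
    p->y = b + 2.0;
}
```

Comparing the assembly for both functions with and without -ftree-vectorize (using -S) is one way to see whether the CTOR is emitted on a given compiler.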
