[Bug target/99881] Regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 06 Apr 2021 00:48:39 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
But 2 element construction _should_ be cheap.  What is missing is the move
cost from GPR to XMM regs (but we do not have a good idea whether the sources
are memory, so it's not as clear-cut here either).

IMHO a better approach might be to up unaligned vector store/load costs?

For the testcase at hand why does a throughput of 1 pose a problem?  There's
only one punpckldq instruction around?

Note that for the case of non-loop vectorization of 'double' the two element
vector CTORs are common and important to handle cheaply.  See also all the
discussion in PR98856.
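
For illustration only, a minimal sketch of the kind of straight-line code
where non-loop (SLP) vectorization may build a two element vector from
scalars.  This is not the testcase from the PR; the function names and
shapes are hypothetical.  In the 'double' variant the scalars already arrive
in XMM registers under the x86-64 SysV ABI, so the CTOR needs no GPR to XMM
move.  In the 'int' variant they arrive in GPRs, which is where the move
cost mentioned above would come in.  Whether either function is actually
vectorized depends on the GCC version and the cost model.

```c
#include <stdint.h>

/* Hypothetical example, not the PR's testcase.  */

/* Two adjacent 'double' stores.  An SLP vectorizer may build { a, b }
   as a two element vector CTOR and use one unaligned vector load and
   one unaligned vector store.  */
void
add_pair_double (double *restrict out, const double *restrict in,
		 double a, double b)
{
  out[0] = in[0] + a;
  out[1] = in[1] + b;
}

/* Same shape with 'int'.  Here a and b are passed in GPRs, so a
   vector CTOR would first have to move them into XMM registers.  */
void
add_pair_int (int32_t *restrict out, const int32_t *restrict in,
	      int32_t a, int32_t b)
{
  out[0] = in[0] + a;
  out[1] = in[1] + b;
}
```

Compiling this with -O2 -S and again with -O2 -ftree-vectorize -S and
comparing the assembly shows whether the CTOR was formed on a given
compiler.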
