[Bug target/87599] Broadcasting scalar to vector uses stack unnecessarily on x86

2018-10-14 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 H.J. Lu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

2018-10-14 Thread hjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 --- Comment #9 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Oct 14 20:39:05 2018 New Revision: 265151 URL: https://gcc.gnu.org/viewcvs?rev=265151&root=gcc&view=rev Log: i386: Add register source to movddup Add register source to movddu

2018-10-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 --- Comment #8 from Alexander Monakov --- Never mind, I was misunderstanding the effect of your patch.

2018-10-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 --- Comment #7 from Alexander Monakov --- But note that even with -mavx, gcc still uses movddup, even though the second alternative has vpunpcklqdq with a register source.

2018-10-13 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 --- Comment #6 from H.J. Lu --- (In reply to Alexander Monakov from comment #5)
> I think we should use punpcklqdq here rather than movddup, because (at least
> on Intel) it has same latency, and same-or-better throughput. It may be ok
> to use m

2018-10-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 --- Comment #5 from Alexander Monakov --- I think we should use punpcklqdq here rather than movddup, because (at least on Intel) it has same latency, and same-or-better throughput. It may be ok to use movddup when broadcasting from a memory sourc
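
The two instruction choices weighed in this comment can be compared side by side with intrinsics. What follows is only an illustrative sketch, not code from the bug report: the helper names are hypothetical, `_mm_unpacklo_epi64` is the intrinsic associated with punpcklqdq and `_mm_movedup_pd` the one associated with movddup, and an optimizing compiler remains free to emit a different instruction for either.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtsi64_si128, _mm_unpacklo_epi64 */
#include <pmmintrin.h>  /* SSE3: _mm_movedup_pd */
#include <stdint.h>

/* Hypothetical helper: broadcast using the punpcklqdq form (SSE2). */
static __m128i broadcast_unpack(int64_t x)
{
    __m128i v = _mm_cvtsi64_si128(x);   /* x in the low lane, zero in the high lane */
    return _mm_unpacklo_epi64(v, v);    /* interleave low lanes: {x, x} */
}

/* Hypothetical helper: broadcast using the movddup form (SSE3).
   The target attribute enables SSE3 for this one function, so the
   file does not need to be built with -msse3. */
static __attribute__((target("sse3"))) __m128i broadcast_movddup(int64_t x)
{
    __m128i v = _mm_cvtsi64_si128(x);
    /* movddup is defined on doubles, so cast the bits there and back. */
    return _mm_castpd_si128(_mm_movedup_pd(_mm_castsi128_pd(v)));
}

/* Hypothetical helper: read one 64-bit lane back out of a vector. */
static int64_t lane(__m128i v, int i)
{
    int64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

Both helpers produce the same vector with the scalar in both lanes, so the choice between them is purely a matter of latency, throughput, and encoding on the target core.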

2018-10-13 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 H.J. Lu changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed|

2018-10-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comm

2018-10-12 Thread vgatherps at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 --- Comment #2 from vgatherps at gmail dot com --- Thanks! That fixes the optimization. However, using something like -march=haswell or -march=corei7 does not result in this optimization being made, which as far as I know -march= would imply -mtun

2018-10-12 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599 --- Comment #1 from Andrew Pinski --- Try with -mtune=intel. AMD cores are faster when the move between the GPR and SSE register sets goes via memory rather than direct
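
The reporter's test case is not included in this listing, so the following is only an assumed shape for the kind of function the bug title describes: a 64-bit scalar arriving in a general-purpose register and being copied into both lanes of a 128-bit vector. The type name `v2di` and the function name `broadcast` are hypothetical; the code uses the GCC/Clang vector extension rather than any particular intrinsic.

```c
#include <stdint.h>

/* GCC/Clang vector extension: two 64-bit lanes in a 16-byte vector. */
typedef int64_t v2di __attribute__((vector_size(16)));

/* Hypothetical reproducer shape: the scalar x has to travel from a
   general-purpose register into an SSE register before it can be
   duplicated. Whether that move goes through the stack or directly
   between register files depends on the tuning options in effect. */
v2di broadcast(int64_t x)
{
    return (v2di){ x, x };
}
```

Compiling such a file with `gcc -O2 -S` and again with `gcc -O2 -mtune=intel -S` and comparing the generated assembly is one way to see the effect of the suggestion above.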