https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599
H.J. Lu changed:
           What    |Removed      |Added
         Status    |NEW          |RESOLVED
     Resolution    |---          |FIXED
--- Comment #9 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Sun Oct 14 20:39:05 2018
New Revision: 265151
URL: https://gcc.gnu.org/viewcvs?rev=265151&root=gcc&view=rev
Log:
i386: Add register source to movddup
Add register source to movddup
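For context, a minimal sketch of the kind of testcase this PR concerns: broadcasting a 64-bit GPR value into both lanes of an XMM register. The actual reproducer attached to the bug is not quoted in this thread, and the function name here is illustrative.

#include <emmintrin.h>

/* Broadcast x into both 64-bit lanes of an XMM register.  Per the
   commit title above, the movddup pattern previously lacked a
   register-source alternative, so with -msse3 the value could be
   bounced through the stack; with a register alternative GCC can
   instead emit movq %rdi, %xmm0 followed by movddup %xmm0, %xmm0.  */
__m128i
broadcast64 (long long x)
{
  return _mm_set1_epi64x (x);
}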
--- Comment #8 from Alexander Monakov ---
Never mind, I was misunderstanding the effect of your patch.
--- Comment #7 from Alexander Monakov ---
But note that even with -mavx, gcc still uses movddup, even though the second
alternative has vpunpcklqdq with a register source.
--- Comment #6 from H.J. Lu ---
(In reply to Alexander Monakov from comment #5)
> I think we should use punpcklqdq here rather than movddup, because (at least
> on Intel) it has the same latency and same-or-better throughput. It may be ok
> to use movddup when broadcasting from a memory source.
--- Comment #5 from Alexander Monakov ---
I think we should use punpcklqdq here rather than movddup, because (at least on
Intel) it has the same latency and same-or-better throughput. It may be ok to
use movddup when broadcasting from a memory source.
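For reference, the two register-source broadcast idioms under discussion, written as intrinsics; the instructions named in the comments are what GCC typically emits for each.

#include <emmintrin.h>   /* SSE2: _mm_unpacklo_epi64 */
#include <pmmintrin.h>   /* SSE3: _mm_movedup_pd */

/* punpcklqdq %xmm0, %xmm0 - interleaves the low quadwords of its two
   operands; with the same register twice, lane 0 is duplicated.  */
static __m128i
dup_punpcklqdq (__m128i v)
{
  return _mm_unpacklo_epi64 (v, v);
}

/* movddup %xmm0, %xmm0 - duplicates the low double into both lanes
   (SSE3).  Per the comment above, same latency as punpcklqdq on Intel
   but same-or-worse throughput for the register form.  */
static __m128d
dup_movddup (__m128d v)
{
  return _mm_movedup_pd (v);
}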
H.J. Lu changed:
           What    |Removed      |Added
         Status    |UNCONFIRMED  |NEW
Last reconfirmed   |             |
Alexander Monakov changed:
           What    |Removed      |Added
             CC    |             |amonakov at gcc dot gnu.org
--- Comment #2 from vgatherps at gmail dot com ---
Thanks! That fixes the optimization. However, using something like
-march=haswell or -march=corei7 does not result in this optimization being
made, even though, as far as I know, -march= implies the corresponding
-mtune=.
--- Comment #1 from Andrew Pinski ---
Try with -mtune=intel. AMD cores are faster doing the move between the GPR and
SSE register sets via memory rather than with a direct move.
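To see the tuning effect Andrew describes, one can build a broadcast testcase both ways and compare the assembly. This is a sketch: the file name, flags, and the exact instruction sequences in the comments are illustrative and vary by GCC version.

/* t.c - compile two ways and diff the generated assembly:

     gcc -O2 -msse3 -S t.c                 tuning that avoids direct
                                           GPR<->SSE moves may go
                                           GPR -> stack -> XMM
     gcc -O2 -msse3 -mtune=intel -S t.c    direct movq %rdi, %xmm0,
                                           then movddup/punpcklqdq  */
#include <emmintrin.h>

__m128i
broadcast64 (long long x)
{
  return _mm_set1_epi64x (x);
}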