https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80820

--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
Also, the two directions are not symmetric.  On some CPUs, a store/reload
strategy for xmm->int might be better even if an ALU strategy for int->xmm is
best.
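
A minimal sketch of the asymmetry, assuming x86-64 with the SSE2 intrinsics from
<emmintrin.h>. The function names pack_alu, unpack_alu and unpack_store_reload
are hypothetical. It packs two 64-bit integers into an xmm register with the ALU
strategy (movq + punpcklqdq), then shows both ways back out: an ALU shuffle plus
movq, or one 16-byte store followed by two 8-byte reloads. The intrinsics only
express intent. The compiler may still pick different instructions depending on
-mtune and the enabled ISA extensions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (baseline on x86-64) */
#include <stdint.h>

/* int -> xmm, ALU strategy: two GPR->xmm moves plus an unpack.
 * Typically compiles to movq xmm,r64 / movq xmm,r64 / punpcklqdq. */
static __m128i pack_alu(int64_t lo, int64_t hi)
{
    __m128i vlo = _mm_cvtsi64_si128(lo);
    __m128i vhi = _mm_cvtsi64_si128(hi);
    return _mm_unpacklo_epi64(vlo, vhi);   /* lane 0 = lo, lane 1 = hi */
}

/* xmm -> int, ALU strategy: movq for the low half; for the high half,
 * punpckhqdq moves it down to lane 0 and then movq extracts it. */
static void unpack_alu(__m128i v, int64_t *lo, int64_t *hi)
{
    *lo = _mm_cvtsi128_si64(v);
    *hi = _mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));
}

/* xmm -> int, store/reload strategy: one 16-byte store, two 8-byte loads.
 * The loads are fully contained in the store, so they can be store-forwarded. */
static void unpack_store_reload(__m128i v, int64_t *lo, int64_t *hi)
{
    int64_t buf[2];
    _mm_storeu_si128((__m128i *)buf, v);   /* __m128i is may_alias in GCC/Clang */
    *lo = buf[0];
    *hi = buf[1];
}
```

Both unpack variants produce the same values; which one is faster is the
CPU-dependent part.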

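Also, the choice can depend on chunk size, since loads are cheap (2 per clock
on AMD since K8 and on Intel since SnB), and store-forwarding works, so the
reloads can take their data from the store buffer without waiting for the store
to commit.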

Doing the first one with movd and the next ones with store/reload might also be
good on some CPUs, especially if there's independent work that can start on the
movd result while the store/reload is still in flight.
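
The hybrid can be sketched the same way, again assuming x86-64 SSE2 intrinsics
and a hypothetical extract4_hybrid helper for four 32-bit lanes. Lane 0 comes
out with movd, so work that depends only on it doesn't wait for the store/reload
round trip. Lanes 1..3 share one 16-byte store and three 4-byte reloads. As
above, the compiler is free to rewrite this into a different instruction
sequence.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (baseline on x86-64) */
#include <stdint.h>

/* Hypothetical helper: copy the four 32-bit lanes of v into out[0..3].
 * Lane 0 uses the ALU path; lanes 1..3 use one store plus three reloads. */
static void extract4_hybrid(__m128i v, uint32_t out[4])
{
    uint32_t buf[4];

    /* movd r32, xmm: lane 0 is available without a trip through memory. */
    out[0] = (uint32_t)_mm_cvtsi128_si32(v);

    /* One 16-byte store; each 4-byte load below is fully contained in it,
     * so it can be satisfied by store-forwarding. */
    _mm_storeu_si128((__m128i *)buf, v);
    out[1] = buf[1];
    out[2] = buf[2];
    out[3] = buf[3];
}
```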

I also discussed some of this at the bottom of the first post in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833.
