https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #19 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 27 Jan 2022, hubicka at kam dot mff.cuni.cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
>
> --- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> > > According to znver2_cost
> > >
> > > the cost of sse_to_integer is a little bit less than fp_store; maybe
> > > increasing the sse_to_integer cost (above fp_store) can help the RA
> > > choose memory instead of a GPR.
> >
> > That sounds reasonable - GPR<->xmm is cheaper than GPR -> stack -> xmm,
> > but GPR<->xmm should be more expensive than GPR/xmm<->stack. As said
> > above, Zen2 can do reg -> mem, mem -> reg via renaming if 'mem' is
> > somewhat special, but modeling that doesn't seem to be necessary.
> >
> > We seem to have store costs of 8 and load costs of 6; I'll try bumping
> > the gpr<->xmm move cost to 8.
>
> I was simply following latencies here, so indeed the reg<->mem bypass is
> not really modelled. I recall doing a few experiments which were kind of
> inconclusive.

Yes, I think xmm->gpr->xmm vs. xmm->mem->xmm isn't really the issue here;
it's mem->gpr->xmm vs. mem->xmm, with all the constant-pool remats. Agner
lists a latency of 3 for gpr<->xmm and a latency of 4 for mem<->xmm, but
then there's forwarding (and the clever renaming trick), which likely makes
xmm->mem->xmm cheaper than 4 + 4, while xmm->gpr->xmm will really be 3 + 3
latency. gpr<->xmm also seems to be more resource constrained. In any case,
for moving xmm to gpr it doesn't make sense to go through memory, but it
doesn't seem worth spilling to xmm or gpr when we only use gpr / xmm later.
That leaves aside the odd and bogus code we generate for the .LC0
rematerialization; we should fix that, and fixing it will likely also fix
LBM.