https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #8)
> So w/ -Ofast -march=znver2 I get a runtime of 130 seconds, when I add
> -mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec then
> this improves to 114 seconds, with sink2 disabled I get 108 seconds
> and with the tune-ctrl ontop I get 113 seconds.
>
> Note that Zen2 is quite special in that it has the ability to handle
> load/store from the stack by mapping it to a register, effectively
> making them zero latency (zen3 lost this ability).
>
> So while moves between GPRs and XMM might not be bad anymore _spilling_
> to a GPR (and I suppose XMM, too) is still a bad idea and the stack
> should be preferred.
>

According to znver2_cost, the cost of sse_to_integer is slightly lower than
that of fp_store. Increasing the sse_to_integer cost so that it is higher
than fp_store may help the RA choose memory instead of a GPR.
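
To illustrate the idea, here is a rough sketch in C of how the relative order of the two costs could flip the spill choice. This is not GCC's allocator: the struct, the enum, the function choose_spill and the numbers used with it are all hypothetical, and the real RA weighs many more factors than a single comparison.

```c
/* Hypothetical, simplified cost table. The two field names echo the
   znver2_cost entries mentioned above; everything else is made up
   for illustration. */
struct spill_costs {
    int sse_to_integer; /* cost of moving an SSE register to a GPR */
    int fp_store;       /* cost of storing an SSE/FP value to memory */
};

enum spill_target { SPILL_TO_GPR, SPILL_TO_MEMORY };

/* Pick the cheaper destination for a spilled SSE value.
   A tie goes to memory (the stack). */
static enum spill_target choose_spill(const struct spill_costs *c)
{
    if (c->sse_to_integer < c->fp_store)
        return SPILL_TO_GPR;
    return SPILL_TO_MEMORY;
}
```

With illustrative costs of {6, 8} (sse_to_integer below fp_store) this picks the GPR, and with {10, 8} it picks memory, which is the effect the proposed cost change is hoping for.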