On Wed, 3 May 2023 at 08:18, Richard Henderson
<[email protected]> wrote:
>
> Use the fpu to perform 64-bit loads and stores.
>
> Signed-off-by: Richard Henderson <[email protected]>
> @@ -2091,7 +2095,20 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
> datalo = datahi;
> datahi = t;
> }
> - if (h.base == datalo || h.index == datalo) {
> + if (h.atom == MO_64) {
> + /*
> + * Atomicity requires that we use a single 8-byte load.
> + * For simplicity and code size, always use the FPU for this.
> + * Similar insns using SSE/AVX are merely larger.
I'm surprised there's no performance penalty for mixing old-school
x87 FPU insns into code that presumably otherwise uses only
modern SSE.
> + * Load from memory in one go, then store back to the stack,
> + * from whence we can load into the correct integer regs.
> + */
> + tcg_out_modrm_sib_offset(s, OPC_ESCDF + h.seg, ESCDF_FILD_m64,
> + h.base, h.index, 0, h.ofs);
> + tcg_out_modrm_offset(s, OPC_ESCDF, ESCDF_FISTP_m64, TCG_REG_ESP, 0);
> + tcg_out_modrm_offset(s, movop, datalo, TCG_REG_ESP, 0);
> + tcg_out_modrm_offset(s, movop, datahi, TCG_REG_ESP, 4);
> + } else if (h.base == datalo || h.index == datalo) {
> tcg_out_modrm_sib_offset(s, OPC_LEA, datahi,
> h.base, h.index, 0, h.ofs);
> tcg_out_modrm_offset(s, movop + h.seg, datalo, datahi, 0);
I assume the caller has arranged that the top of the stack
is trashable at this point?
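
To make the stack question concrete, here is a rough plain-C model of the data flow the patch comment describes: one 8-byte read, an 8-byte write to the slot at 0(%esp), then two 4-byte reads into the integer registers. It is only a sketch and not QEMU code. RegPair and load64_via_stack_slot are hypothetical names, the host is assumed to be little-endian (as i386 is), and memcpy models the byte movement only, not the single-access atomicity that fild gives.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the datalo/datahi register pair. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} RegPair;

/*
 * Model of the emitted sequence (data flow only, no atomicity):
 *   fild   m64        -> one 8-byte read from the guest address
 *   fistp  0(%esp)    -> 8-byte write that clobbers the top of stack
 *   mov    0(%esp), datalo
 *   mov    4(%esp), datahi
 * stack_slot stands in for the 8 bytes at 0(%esp).
 */
static RegPair load64_via_stack_slot(const void *guest_addr,
                                     unsigned char *stack_slot)
{
    uint64_t st0;
    RegPair r;

    assert(guest_addr != NULL && stack_slot != NULL);

    /* fild: fetch all 8 bytes in one step. */
    memcpy(&st0, guest_addr, sizeof(st0));

    /* fistp: whatever was in the slot before is overwritten here. */
    memcpy(stack_slot, &st0, sizeof(st0));

    /* Two 4-byte integer loads from the slot (little-endian layout). */
    memcpy(&r.lo, stack_slot, 4);
    memcpy(&r.hi, stack_slot + 4, 4);
    return r;
}
```

The write to stack_slot is the part my question is about: the sequence is only safe if nothing live is stored in those 8 bytes when it runs.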
Reviewed-by: Peter Maydell <[email protected]>
-- PMM