On 05/24/2014 02:55 AM, Paolo Bonzini wrote:
> On 14/05/2014 09:17, Richard Henderson wrote:
>> +    tcg_out_opc_imm(s, OPC_LW, TCG_REG_A0, TCG_REG_A0, add_off);
>> +    tcg_out_opc_reg(s, OPC_AND, TCG_REG_T0, TCG_REG_T0, addrl);
>> +
>> +    label_ptr[0] = s->code_ptr;
>>      tcg_out_opc_br(s, OPC_BNE, TCG_REG_T0, TCG_REG_AT);
>> -    tcg_out_nop(s);
>
> I don't remember mips very well, LW cannot be put in the delay slot?  This
> would let you fill both delay slots for the 64-bit case.  Or is it just that
> the code becomes harder to follow due to the TARGET_LONG_BITS == 64 "if"s?
>
> Alternatively, for 64-bit you could use OR+BNE instead of BNE+NOP+BNE.  Of
> course this can be done later, this patchset is already a big improvement.

It's MIPS I that had all sorts of problems with scheduling loads, including
requiring two cycles between load issue and use.  TCG doesn't handle any of
that; we require a fully interlocked pipeline.  Without looking it up, I'd
guess that was at least MIPS III (circa 1992?).

Mostly that nop is hard to fill because of the if's, and I wanted to fill the
last slot with the addition to make up the full host address.

OR+BNE doesn't help; you need 2 XORs and 1 OR to do a double-word equality
comparison.  Whether that beats BNE+NOP+BNE is something that might take a bit
of measurement to show it's worthwhile.


r~
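
For reference, the 2 XORs + 1 OR sequence can be sketched in plain C as
below.  This is only an illustration of the arithmetic, not code from the
patch; the names dword_ne and dword_ne_selftest are made up for the example,
and each 64-bit value is assumed to be split into two 32-bit halves as it
would be on a 32-bit host.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of TCG): returns nonzero iff the 64-bit
 * values (ah:al) and (bh:bl) differ.  Each statement corresponds to one
 * host ALU operation.
 */
static uint32_t dword_ne(uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint32_t lo = al ^ bl;   /* XOR #1: zero iff the low words match  */
    uint32_t hi = ah ^ bh;   /* XOR #2: zero iff the high words match */
    return lo | hi;          /* OR: zero iff both words match         */
}

/* Small sanity check of the helper above. */
static void dword_ne_selftest(void)
{
    assert(dword_ne(0x1000u, 0x2u, 0x1000u, 0x2u) == 0);  /* equal        */
    assert(dword_ne(0x1000u, 0x2u, 0x1001u, 0x2u) != 0);  /* low differs  */
    assert(dword_ne(0x1000u, 0x2u, 0x1000u, 0x3u) != 0);  /* high differs */
}
```

The result would then feed a single BNE against the zero register, so the
trade is three ALU ops plus one branch versus two branches and their delay
slots.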