On 10/25/2016 05:46 AM, Paolo Bonzini wrote:
> 
> 
> On 18/10/2016 17:10, Richard Henderson wrote:
>> +    case INDEX_op_extract_i32:
>> +        /* On the off-chance that we can use the high-byte registers.
>> +           Otherwise we emit the same ext16 + shift pattern that we
>> +           would have gotten from the normal tcg-op.c expansion.  */
>> +        tcg_debug_assert(args[2] == 8 && args[3] == 8);
>> +        if (args[1] < 4 && args[0] < 8) {
>> +            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
>> +        } else {
>> +            tcg_out_ext16u(s, args[0], args[1]);
>> +            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
>> +        }
> 
> Since the opcode is pretty rare, perhaps it's worth restricting the
> constraints to, respectively, a new constraint for 0xff ("R"?) and "Q"?
> It should generate slightly better code without constraining the
> register allocator too much.

I tried that, but since our allocator does no look-ahead to future uses, it
will only load a value directly into a Q register if this is the first use of
the value within the TB.  Otherwise it'll generate an extra move to satisfy the
constraint.

Given that movzwl can operate on any source, and can copy to another
destination at the same time, it's wasteful to force the register allocator to
generate the extra move.

This ext16u+shift form is what we'll generate without the special case here.
So if you prefer, I could drop the %[abcd]h special case entirely.
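
For illustration only, a minimal sketch in plain C (not the TCG backend code;
the function names are hypothetical) of why the ext16u + shift fallback
computes the same value as an extract of bits [8, 16):

```c
#include <assert.h>
#include <stdint.h>

/* Reference definition of extract(x, 8, 8): bits [8, 16) of x. */
uint32_t extract_8_8(uint32_t x)
{
    return (x >> 8) & 0xff;
}

/* Models the fallback path: zero-extend the low 16 bits (as
   tcg_out_ext16u would), then shift right by 8 (as tcg_out_shifti
   with SHIFT_SHR would). */
uint32_t ext16u_then_shr8(uint32_t x)
{
    uint32_t t = (uint16_t)x;   /* zero-extended low half */
    return t >> 8;
}

/* Spot-check that both forms agree on a few inputs. */
void check_extract_8_8(void)
{
    const uint32_t samples[] = { 0u, 0xffu, 0xff00u, 0x12345678u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(extract_8_8(samples[i]) == ext16u_then_shr8(samples[i]));
    }
}
```

The zero-extension already clears bits 16 and up, so no separate mask is
needed after the shift.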

The one that's particularly valuable is the 32-bit shift as extraction from a
64-bit input.  That turns out to happen a lot, e.g. for the ppc64abi32 guest.
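
A hedged sketch of that case in plain C, assuming it refers to extracts whose
field ends at bit 32 (pos + len == 32); the function names are hypothetical
and this is not the backend code itself:

```c
#include <assert.h>
#include <stdint.h>

/* Reference definition of a 64-bit extract; requires
   0 < len and pos + len <= 64. */
uint64_t extract64_generic(uint64_t x, unsigned pos, unsigned len)
{
    return (x >> pos) & (~0ULL >> (64 - len));
}

/* When the field ends at bit 32, a 32-bit logical right shift of the
   low half gives the same result: the upper 32 bits are discarded and
   the result is zero-extended.  Requires pos in 0..31. */
uint64_t extract64_via_shr32(uint64_t x, unsigned pos)
{
    uint32_t lo = (uint32_t)x;
    return lo >> pos;
}
```

For example, with x = 0x1122334455667788 and pos = 8, both functions return
0x556677.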


r~
