On 4/19/19 1:07 PM, Alex Bennée wrote:
>
> Richard Henderson <[email protected]> writes:
>
>> This is a case where we generate more than 64k code for a mere 231
>> guest instructions.
>
> I would like to know more! Are these unrolled vector ops or something else?

Yes.  E.g.

    ld4 { v0.16b - v3.16b }, [x0]

will generate 64 guest byte loads. Given the size of the code
generated for each guest memory operation, we should probably
change this to use 64-bit loads and dole out the bytes manually.
Even for linux-user, with direct host memory ops, this converts
to 1k of code.
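
A rough sketch of the "64-bit loads and dole out the bytes manually"
idea, written as plain host-side C rather than actual TCG code. The
function name and the v[4][16] register layout are made up for
illustration, and a little-endian host is assumed:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU code: emulate
 *     ld4 { v0.16b - v3.16b }, [x0]
 * with eight 64-bit loads instead of 64 byte loads.
 * Assumes a little-endian host, so byte k of the w-th loaded
 * word is the guest byte at offset 8 * w + k.
 */
static void ld4_16b_sketch(uint8_t v[4][16], const uint8_t *mem)
{
    for (int w = 0; w < 8; w++) {
        uint64_t q;

        /* One 64-bit load covers two 4-byte structures. */
        memcpy(&q, mem + 8 * w, sizeof(q));

        /* Dole out the bytes: memory byte n goes to v[n % 4][n / 4]. */
        for (int k = 0; k < 8; k++) {
            int n = 8 * w + k;
            v[n % 4][n / 4] = (uint8_t)(q >> (8 * k));
        }
    }
}
```

That is 8 memory operations instead of 64; the shifts and byte
extracts are cheap compared to a guest memory access.
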
r~