On Mon, Aug 26, 2013 at 03:34:15PM -0700, Richard Henderson wrote:
> On 08/26/2013 03:26 PM, Paolo Bonzini wrote:
> > Something that can be done on top of this patch: what about moving the
> > "-1" to helper_ret_*? It is common to pretty much all the targets
> > (except ARM has -2), and it would allow some simplifications.
>
> I suppose so, yes.
>
> > li rN, retaddr
> > mtlr rN
> > b st_trampoline[i]
> >
> > sequence instead of one of
> >
> > li rN, retaddr
> > mtlr rN
> > bl st_trampoline[i]
> > b retaddr
>
> This sort of thing is very difficult to evaluate, because of the
> cpu's return address prediction stack. I have so far avoided it.
>
> The only cpus that I believe can make good use of tail calls into
> the memory helpers are those with predicated stores and calls, i.e.
> arm and ia64.
>
On the other hand, calling the helper is the exception rather than the
rule (that's why these calls have been moved to the end of the TB), so we
should not look too much at generating fast code, but rather small code,
in order to use the caches (both TB and CPU caches) more efficiently.
Therefore even on x86, if we move the -1 to the helper level, it should
be possible to use a tail call for the stores, something like:

  mov    %r14,%rdi
  mov    %ebx,%edx
  xor    %ecx,%ecx
  lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
  pushq  %r8
  jmpq   0x7f25526757a0

Instead of:

  mov    %r14,%rdi
  mov    %ebx,%edx
  xor    %ecx,%ecx
  lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
  callq  0x7f25526757a0
  jmpq   0x7f2541a6f69b
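
To make the "-1 at the helper level" part concrete, here is a rough C
sketch of where the adjustment would live. The names (RETADDR_ADJ,
lookup_pc_adjusted_by_tb, lookup_pc_adjusted_by_helper) are made up for
illustration and are not existing QEMU symbols. The only assumption is
that the helper subtracts the adjustment itself, so the generated code
can hand over (or push) the real return address:

```c
#include <stdint.h>

/* Hypothetical per-host adjustment: 1 on most hosts, 2 on ARM
 * (values taken from the discussion above). */
#ifndef RETADDR_ADJ
#define RETADDR_ADJ 1
#endif

/* Current scheme: the generated code already passes retaddr - 1,
 * so the helper uses the value unchanged. */
static uintptr_t lookup_pc_adjusted_by_tb(uintptr_t retaddr_minus_adj)
{
    return retaddr_minus_adj;
}

/* Proposed scheme: the generated code passes the real return address
 * and the helper applies the adjustment itself. */
static uintptr_t lookup_pc_adjusted_by_helper(uintptr_t retaddr)
{
    return retaddr - RETADDR_ADJ;
}
```

Both variants end up with the same address for the lookup; only the
place where the subtraction happens changes. As for size, and assuming
the jump back needs a 32-bit displacement: pushq %r8 is 2 bytes and
callq/jmpq rel32 are 5 bytes each, so the tail-call tail is 7 bytes
where call + jmp is 10.
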
--
Aurelien Jarno GPG: 1024D/F1BCDB73
[email protected] http://www.aurel32.net