On Mon, Jun 19, 2017 at 7:51 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Mon, Jun 19, 2017 at 11:45:13AM -0600, Jeff Law wrote:
>> On 06/19/2017 11:29 AM, Jakub Jelinek wrote:
>> >
>> > Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>> > while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
>> > or movl $0, (%esp) ?
>> Florian raised this privately to me as well.  There's a couple issues.
>>
>> 1. Is there a performance penalty/gain for sub-word operations?  If not,
>>    we can improve things slightly there.  Even if it's performance
>>    neutral we can probably do better on code size.
>
> CCing Uros and Honza here, I believe there are at least on x86 penalties
> for 2-byte, maybe for 1-byte and then sometimes some stalls when you
> write or read in a different size from a recent write or read.

Don't use orq $0, (%rsp): it is a high-latency read-modify-write (RMW) insn.

movq $0x0, (%rsp) is fast, but it is also quite a long insn.

push $0x0 decrements the stack pointer by 4 or 8 bytes, depending on
the target word size. A push insn also updates the stack pointer
delta, so an extra update of SP (effectively, another ALU operation)
is required if SP is later referenced from an insn other than
push/pop/call/ret. There are no non-word-sized register pushes.

I think that for the purpose of a stack probe, we can write a single
byte to the end of the stack, so

movb $0x0, (%rsp).

This is a relatively short insn, and it operates in the same way on
32-bit and 64-bit targets. There are no issues with partial memory
stalls, since nothing immediately reads a different-sized value from
the written location.
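
To put rough numbers on the code-size side of this, here is a small
sketch that tabulates the encodings of the candidate probe insns. The
byte sequences are the standard encodings as I would expect an
assembler to emit them (the 64-bit forms carry a REX.W prefix, 0x48);
the names PROBES and encoded_length are made up for this illustration
and are not part of GCC. It says nothing about latency, only about
length.

```python
# Candidate stack-probe insns (AT&T syntax) and their machine encodings.
# PROBES and encoded_length are hypothetical names used only for this sketch.
PROBES = {
    "orq $0, (%rsp)":  bytes.fromhex("48830c2400"),        # REX.W 83 /1 ib
    "orl $0, (%esp)":  bytes.fromhex("830c2400"),          # 83 /1 ib
    "movq $0, (%rsp)": bytes.fromhex("48c7042400000000"),  # REX.W C7 /0 id
    "movl $0, (%esp)": bytes.fromhex("c7042400000000"),    # C7 /0 id
    "movb $0, (%rsp)": bytes.fromhex("c6042400"),          # C6 /0 ib
    "push $0":         bytes.fromhex("6a00"),              # 6A ib
}


def encoded_length(insn: str) -> int:
    """Return the encoded length in bytes of one of the insns in PROBES."""
    return len(PROBES[insn])


if __name__ == "__main__":
    # List the candidates from shortest to longest encoding.
    for insn, enc in sorted(PROBES.items(), key=lambda kv: len(kv[1])):
        print(f"{len(enc)} bytes  {enc.hex(' '):<24} {insn}")
```

The movb form needs no REX prefix, so the same four bytes serve both
32-bit and 64-bit targets, and it is half the length of movq. push is
shorter still, but it moves the stack pointer, as noted above.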

Uros.
