On Tue, Jun 20, 2017 at 10:03 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Mon, Jun 19, 2017 at 7:51 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>> On Mon, Jun 19, 2017 at 11:45:13AM -0600, Jeff Law wrote:
>>> On 06/19/2017 11:29 AM, Jakub Jelinek wrote:
>>> >
>>> > Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>>> > while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
>>> > or movl $0, (%esp) ?
>>> Florian raised this privately to me as well.  There's a couple issues.
>>>
>>> 1. Is there a performance penalty/gain for sub-word operations?  If not,
>>>    we can improve things slightly there.  Even if it's performance
>>>    neutral we can probably do better on code size.
>>
>> CCing Uros and Honza here, I believe there are at least on x86 penalties
>> for 2-byte, maybe for 1-byte and then sometimes some stalls when you
>> write or read in a different size from a recent write or read.
>
> Don't use orq $0, (%rsp), as this is a high latency RMW insn.

Well, but _maybe_ it's optimized because ORing 0 never changes anything?
At least it would be nice if it only triggered the page-fault side
effect and then did not consume other CPU resources.  I guess a
micro-benchmark plus performance counters might tell.

> movq $0x0, (%rsp) is fast, but also quite long insn.
>
> push $0x0 moves the stack pointer by 4 or 8 bytes, depending on
> target word size. Push insn also updates delta stack pointer, so
> update of SP is required (effectively, another ALU operation) if SP is
> later referenced from insn other than push/pop/call/ret. There are no
> non-word-sized register pushes.

I only suggested push $0x0 because that doesn't leave the window open
for the async signal where %rsp points somewhere we didn't probe yet.

> I think that for the purpose of stack probe, we can write a byte to
> the end of the stack, so
>
> movb $0x0, (%rsp).
>
> This is relatively short insn, and operates in the same way for 32bit
> and 64bit targets. There are no issues with partial memory stalls
> since nothing immediately reads a different sized value from the
> written location.
>
> Uros.
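
For what it's worth, the micro-benchmark mentioned above could be sketched
roughly as below. This is only an illustration under assumptions: the names
(probe_kind, probe, apply_probe, time_probes) are made up for this mail, the
probes hit a local scratch slot rather than the real (%rsp) so that nothing
live gets clobbered, and clock() only gives coarse CPU time, so it says
nothing about page-fault behaviour or latency hidden by out-of-order
execution.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical names for illustration; nothing here is GCC's own code. */
enum probe_kind { PROBE_ORQ, PROBE_MOVQ, PROBE_MOVB };

/* Touch *slot the way one probe variant would touch (%rsp). */
static void probe(enum probe_kind kind, volatile uint64_t *slot)
{
#if defined(__x86_64__) && defined(__GNUC__)
    switch (kind) {
    case PROBE_ORQ:   /* read-modify-write, value left unchanged */
        __asm__ volatile("orq $0, %0" : "+m"(*slot) : : "cc");
        break;
    case PROBE_MOVQ:  /* plain 8-byte store */
        __asm__ volatile("movq $0, %0" : "=m"(*slot));
        break;
    case PROBE_MOVB:  /* 1-byte store to the lowest address of the slot */
        __asm__ volatile("movb $0, %0" : "=m"(*(volatile uint8_t *)slot));
        break;
    }
#else
    /* Portable stand-ins so the sketch still builds on other targets. */
    switch (kind) {
    case PROBE_ORQ:  *slot |= 0; break;
    case PROBE_MOVQ: *slot = 0; break;
    case PROBE_MOVB: *(volatile uint8_t *)slot = 0; break;
    }
#endif
}

/* Apply one probe to a slot holding `initial` and return what is left. */
static uint64_t apply_probe(enum probe_kind kind, uint64_t initial)
{
    volatile uint64_t slot = initial;
    probe(kind, &slot);
    return slot;
}

/* CPU seconds spent issuing `iterations` back-to-back probes. */
static double time_probes(enum probe_kind kind, uint64_t iterations)
{
    volatile uint64_t slot = 0;
    clock_t start = clock();
    for (uint64_t i = 0; i < iterations; i++)
        probe(kind, &slot);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_probes() with each of the three kinds and a large iteration
count gives a first comparison; running the same binary under something like
perf stat would add the performance-counter side.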