Right now, there is a conversation going on in the bug-gnulib list, trying to determine if XSI supports the ability to distinguish between stack overflow and programmer error (or even intentional SEGV, such as when implementing user-space paging on top of mmap). This is certainly possible using non-POSIX extensions (for example, http://www.gnu.org/software/libsigsegv/ uses /proc/self/maps with a fallback to mincore() on Linux to distinguish between stack overflow and all other SEGV), but the question is whether we can stick to POSIX interfaces to accomplish the same thing.
Relevant quotes from the 5.1 draft:

The description of RLIMIT_STACK (line 35628) states: "If this limit is exceeded, SIGSEGV shall be generated for the thread. If the thread is blocking SIGSEGV, or the process is ignoring or catching SIGSEGV and has not made arrangements to use an alternate stack, the disposition of SIGSEGV shall be set to SIG_DFL before it is generated."

This makes it clear that stack overflow cannot be dealt with by the program unless it has also used sigaltstack() to install an alternate stack, as well as sigaction() to install an SA_ONSTACK handler for SIGSEGV. And since sigaltstack() is XSI, this also makes it clear that non-XSI systems are out of luck.

This information alone is sufficient to write an XSI program that handles stack overflow by gracefully printing an error message and calling _exit(), or even by using siglongjmp() to get back into the main processing loop. But without more information, the program can only assume that every SEGV is a stack overflow; it cannot tell a stack overflow apart from an intentional SEGV on a user-space mmap() page fault, or from an accidental SEGV due to programmer error, where a core dump would be nicer than a misleading error message about stack overflow.

sigaction() (line 60919) states that if the signal handler is additionally registered with SA_SIGINFO, the handler's "third argument can be cast to a pointer to an object of type ucontext_t to refer to the receiving thread's context that was interrupted when the signal was delivered."

With SA_SIGINFO, and using the second argument's si_addr field, an XSI application can also determine which address caused the SEGV. But we are still stuck with the issue of determining whether that address lies near the bounds of the primary stack. Is the above statement intended to require that the third argument's uc_stack member describes the stack that was interrupted (the primary stack) or the stack where the handler is executing (the alternate stack)?
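For illustration, the sigaltstack() plus SA_ONSTACK arrangement described above might be sketched roughly as follows. This is only a sketch under stated assumptions, not code from gnulib or libsigsegv: the names install_overflow_handler, run_overflow_in_child, and OVERFLOW_EXIT_STATUS are hypothetical, the 64 KiB alternate stack is assumed to be at least MINSIGSTKSZ, and the 1 MiB cap on the soft stack limit is there only so that the demonstration overflows quickly. Note that the handler treats every SIGSEGV as a stack overflow, which is exactly the limitation under discussion.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* sigaltstack() and SA_ONSTACK are XSI */
#endif
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

#define OVERFLOW_EXIT_STATUS 70   /* arbitrary, hypothetical value */

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void on_segv(int sig, siginfo_t *info, void *context)
{
    static const char msg[] = "SIGSEGV caught; assuming stack overflow\n";
    (void)sig; (void)info; (void)context;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) { /* nothing to do */ }
    _exit(OVERFLOW_EXIT_STATUS);
}

/* Registers an alternate stack and a SIGSEGV handler that runs on it.
   Returns 0 on success, -1 on failure. */
int install_overflow_handler(void)
{
    static char alt_stack[64 * 1024];   /* assumed >= MINSIGSTKSZ */
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_ONSTACK | SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Deliberately unbounded recursion; passing the local buffer to the
   callee keeps every frame alive so the stack really grows. */
static int recurse(volatile char *prev)
{
    volatile char pad[1024];
    pad[0] = prev ? prev[0] : 0;
    return recurse(pad) + pad[0];
}

/* Overflows the stack in a child process. Returns the child's exit
   status, or -1 if the child could not be run or did not exit normally. */
int run_overflow_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl;
        /* Cap the soft limit at 1 MiB so the overflow happens quickly. */
        if (getrlimit(RLIMIT_STACK, &rl) == 0 &&
            (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > 1024 * 1024)) {
            rl.rlim_cur = 1024 * 1024;
            setrlimit(RLIMIT_STACK, &rl);
        }
        if (install_overflow_handler() != 0)
            _exit(1);
        _exit(recurse(NULL) & 1);   /* not reached if the overflow is caught */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling run_overflow_in_child() from a test program should yield OVERFLOW_EXIT_STATUS when the handler was able to run on the alternate stack. Without the sigaltstack() call, the quoted RLIMIT_STACK text implies the child would instead be killed by the default SIGSEGV action.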
A related question: is the uc_link member supposed to be populated, or NULL?

One argument in favor of pointing uc_stack to the primary stack is that you can still use sigaltstack() to determine details about the alternate stack, including whether the current signal handler is executing on the alternate stack (even if it was not registered SA_ONSTACK, but occurred during the handling of another signal already on the alternate stack). But Linux (at least the 2.6.9 kernel that I was testing on) leaves uc_link NULL and populates the uc_stack member with details on the alternate stack, making uc_stack worthless for determining if si_addr fell within a page or so of the main stack. Is this a bug in the Linux kernel or the intended behavior of the standard?

Is there anything in POSIX that would be equivalent to using the non-standard mincore() to determine if the faulting si_addr lands near the mapped memory region that contains the primary stack?

Would using raise(SIGSEGV) with an SA_SIGINFO but non-SA_ONSTACK handler prior to sigaltstack() be sufficient to get the details about the primary stack in a standard-compliant manner? (In this instance, it should be possible to handle the SEGV on the primary stack.) Since the primary stack can automatically grow, those details are likely to differ from the eventual size of the stack at the time of stack overflow; but assuming we can even get a uc_stack describing the primary stack, use getrlimit() to determine how large things can grow, and probe to see the direction of stack growth, is that enough to safely determine at which address stack overflow will occur?

As a side note, I noticed that ucontext_t was promoted from XSI to Base as part of the draft; should we have also changed the signature of the three-argument handler to use ucontext_t * rather than void * now that all implementations are required to support ucontext_t?
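To make the getrlimit() and growth-direction idea concrete, here is a rough sketch of the arithmetic involved. It does not answer the question of whether this is sufficient. All of the function names are hypothetical, the direction probe is a heuristic that compares the addresses of locals in two nested calls (nothing in POSIX guarantees that this is meaningful), and "base" stands for the starting end of the primary stack, which is precisely the value that is hard to obtain portably.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <stdint.h>
#include <sys/resource.h>

/* Compares a local in the callee with a local in the caller. */
static int callee_is_below(volatile char *caller_local)
{
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Heuristic probe: returns 1 if a nested call's locals sit at lower
   addresses than the caller's, 0 otherwise. The volatile function
   pointer discourages the compiler from inlining the callee. */
int stack_grows_down(void)
{
    volatile char caller_local = 0;
    int (*volatile probe)(volatile char *) = callee_is_below;
    return probe(&caller_local);
}

/* Fetches the soft RLIMIT_STACK value; note that it may be
   RLIM_INFINITY. Returns 0 on success, -1 on failure. */
int stack_soft_limit(rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Given the starting end of the stack ("base", an assumed input), the
   growth direction, and a finite limit, estimates the address at which
   further growth would exceed the limit. */
uintptr_t estimate_overflow_address(uintptr_t base, int grows_down,
                                    rlim_t limit)
{
    return grows_down ? base - (uintptr_t)limit
                      : base + (uintptr_t)limit;
}

/* Returns 1 if addr lies within slop bytes of edge, else 0. */
int is_near_address(uintptr_t addr, uintptr_t edge, uintptr_t slop)
{
    uintptr_t distance = addr > edge ? addr - edge : edge - addr;
    return distance <= slop;
}
```

A SIGSEGV handler could then compare (uintptr_t)info->si_addr against the estimate with is_near_address(), using a page or so of slop. The sketch deliberately ignores RLIM_INFINITY, guard pages, and any stack space consumed before the base was measured, so the result is at best an approximation.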
Meanwhile, I don't see anything in the draft that describes using ucontext_t->uc_link (other than its definition on line 11082); in the 2001 edition, this was only covered in the (now withdrawn) getcontext(), which stated: "If the uc_link member of the ucontext_t structure pointed to by the ucp argument is equal to 0, then this context is the main context, and the thread shall exit when this context returns."

But you can argue that for the handling of a stack overflow SEGV, the behavior is undefined if the handler does not either _exit or siglongjmp; therefore, in the case of handling SEGV on the alternate stack, it is not clear whether the uc_link member needs to be populated, since the context doesn't ever really return.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]