sigaction and sigaltstack - is detecting stack overflow possible?

Eric Blake Fri, 06 Jun 2008 06:24:26 -0700

Right now, there is a conversation going on in the bug-gnulib list, trying
to determine if XSI supports the ability to distinguish between stack
overflow and programmer error (or even intentional SEGV, such as when
implementing user-space paging on top of mmap).  This is certainly
possible using non-POSIX extensions (for example,
http://www.gnu.org/software/libsigsegv/ uses /proc/self/maps with a
fallback to mincore() on Linux to distinguish between stack overflow and
all other SEGV), but the question is whether we can stick to POSIX
interfaces to accomplish the same thing.


Relevant quotes from the 5.1 draft:

The description of RLIMIT_STACK under getrlimit() (line 35628) states:
"If this limit is exceeded, SIGSEGV shall be generated for the thread. If
the thread is blocking SIGSEGV, or the process is ignoring or catching
SIGSEGV and has not made arrangements to use an alternate stack, the
disposition of SIGSEGV shall be set to SIG_DFL before it is generated."

This makes it clear that stack overflow cannot be dealt with by the
program unless it has also used sigaltstack() to install an alternate
stack, as well as sigaction() to install an SA_ONSTACK handler for
SIGSEGV.  And since sigaltstack is XSI, this also makes it clear that
non-XSI systems are out of luck.  Using just this information, it is
possible to write an XSI program that handles stack overflow by
gracefully printing an error message and calling _exit(), or even by using
siglongjmp() to return to the main processing loop.  But without more
information, the program can only assume that every SEGV is a stack
overflow; it cannot tell a stack overflow apart from an intentional SEGV
on a user-space mmap() page fault, or from an accidental SEGV due to
programmer error, where a core dump would be nicer than a misleading error
message about stack overflow.
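
A minimal sketch of the setup described above, assuming an XSI system:
sigaltstack() provides the alternate stack and sigaction() installs an
SA_ONSTACK handler that prints a message and calls _exit().  The names
on_segv and install_overflow_handler are made up for illustration, and
the handler treats every SEGV as a stack overflow, which is exactly the
limitation discussed above.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Runs on the alternate stack; restricted to async-signal-safe calls. */
static void on_segv(int sig)
{
    static const char msg[] = "fatal: SIGSEGV (assumed stack overflow)\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)sig;
    (void)n;
    _exit(EXIT_FAILURE);
}

/* Hypothetical helper: returns 0 on success, -1 on failure. */
int install_overflow_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    /* The alternate stack is kept for the life of the process. */
    ss.ss_sp = malloc(SIGSTKSZ);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_ONSTACK;   /* deliver SIGSEGV on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A program would call install_overflow_handler() once, early in main(),
before doing any deep recursion.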

sigaction() (line 60919) states that if the signal handler is additionally
registered with SA_SIGINFO:
the handler's "third argument can be cast to a pointer to an object of
type ucontext_t to refer to the receiving thread's context that was
interrupted when the signal was delivered."

With SA_SIGINFO, and using the second argument's si_addr field, an XSI
application can also determine which address caused the SEGV.  But we are
still stuck with the issue of determining whether that address occurs near
the bounds of the primary stack.  Is the above statement intended to
require that the third argument's uc_stack member describes the stack that
was interrupted (the primary stack) or the stack where the handler is
executing (the alternate stack)?  Also, is the uc_link member supposed to
be populated or NULL?  One argument in favor of pointing uc_stack to the
primary stack is that you can still use sigaltstack() to determine details
about the alternate stack, including whether the current signal handler is
executing on the alternate stack (even if it was not registered
SA_ONSTACK, but occurred during the handling of another signal already on
the alternate stack).

But Linux (at least the 2.6.9 kernel that I was testing on) leaves uc_link
NULL and populates the uc_stack member with details on the alternate
stack, making uc_stack worthless for determining if si_addr fell within a
page or so of the main stack.  Is this a bug in the Linux kernel or the
intended behavior of the standard?  Is there anything in POSIX that would
be equivalent to using the non-standard mincore() to determine if the
faulting si_addr lands near the mapped memory region that contains the
primary stack?

Would using raise(SIGSEGV) with an SA_SIGINFO but non-SA_ONSTACK handler
prior to sigaltstack() be sufficient to get the details about the primary
stack (in this instance, it should be possible to handle the SEGV on the
primary stack) in a standard-compliant manner?  Since the primary stack
can automatically grow, those details are likely to be different than the
eventual size of the stack at the time of stack overflow; but assuming we
can even get a uc_stack describing the primary stack, use getrlimit() to
determine how large things can grow, and probe to see the direction of
stack growth, is that enough to safely determine at which address stack
overflow will occur?
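
To make the last part of that question concrete, here is a rough sketch
of combining getrlimit() with a growth-direction probe.  It is a
heuristic under stated assumptions, not something POSIX guarantees: it
only considers the initial thread, it assumes the caller can supply an
address near the base of the primary stack (for example, the address of
a local variable in main()), and the names stack_direction and
estimate_overflow_address are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/resource.h>

/* Compare the address of a local in a deeper frame with one in the
   caller's frame. */
static int probe_inner(volatile char *outer)
{
    volatile char inner = 0;

    return ((uintptr_t)&inner < (uintptr_t)outer) ? -1 : 1;
}

/* Returns -1 if the stack appears to grow toward lower addresses,
   1 if it appears to grow toward higher addresses. */
int stack_direction(void)
{
    volatile char outer = 0;
    /* Calling through a volatile pointer discourages inlining. */
    int (*volatile fn)(volatile char *) = probe_inner;

    return fn(&outer);
}

/* Estimate the address at which the primary stack would overflow, given
   an address near its base.  Returns 0 and stores the estimate in
   *limit_out, or returns -1 if no estimate can be made (for example
   when RLIMIT_STACK is unlimited). */
int estimate_overflow_address(const void *near_base, uintptr_t *limit_out)
{
    struct rlimit rl;
    uintptr_t base = (uintptr_t)near_base;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;

    if (stack_direction() < 0) {
        if ((uintptr_t)rl.rlim_cur > base)
            return -1;              /* would wrap below address 0 */
        *limit_out = base - (uintptr_t)rl.rlim_cur;
    } else {
        *limit_out = base + (uintptr_t)rl.rlim_cur;
    }
    return 0;
}
```

A handler could then compare si_addr against the estimate, allowing a
page or so of slack, but whether the soft limit really bounds the region
measured from that base address is part of what I am asking.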

As a side note, I noticed that ucontext_t was promoted from XSI to Base as
part of the draft; should we have also changed the signature of the
three-argument handler to use ucontext_t * rather than void * now that all
implementations are required to support ucontext_t?  Meanwhile, I don't
see anything in the draft that describes using ucontext_t->uc_link (other
than its definition on line 11082); in the 2001 edition, this was only
covered in the (now withdrawn) getcontext(), which stated:
"If the uc_link member of the ucontext_t structure pointed to by the ucp
argument is equal to 0, then this context is the main context, and the
thread shall exit when this context returns."
But you can argue that for the handling of a stack overflow SEGV, the
behavior is undefined if the handler does not either _exit or siglongjmp;
therefore, in the case of handling SEGV on the alternate stack, it is not
clear whether the uc_link member needs to be populated, since the context
doesn't ever really return.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             [EMAIL PROTECTED]