> Date: Wed, 7 Oct 2015 11:53:32 +0100
> From: Stuart Henderson <s...@spacehopper.org>
>
> On 2015/10/07 11:52, Stuart Henderson wrote:
> > monitoring-plugins has a program that checks available space on partitions.
> > Before doing this it does a stat() to check that the requested directory
> > exists and is accessible. In their devel tree they have moved to doing
> > this stat() in a thread - commit log was "don't let check_disk hang on
> > hanging file systems". However this code doesn't work for us.
> >
> > I've attached a stripped-down test program based on their code that works
> > as expected on Linux but fails on OpenBSD. I can always patch to use the
> > non-pthread code, but I wondered if anyone has an idea what's up and
> > whether the bug is theirs or ours - is pthread_kill(thread, 0) working
> > as expected?
> >
> > $ make thread LDFLAGS=-lpthread
> > cc -O2 -pipe -lpthread -o thread thread.c
> >
> > $ ./thread
> > 4
> > child
> > 3
> > 2
> > 1
> > 0
> > child thread did not return within 5s

My reading of the POSIX standard is that our implementation of
pthread_kill(3) is correct and that the program's expectations are
wrong. POSIX says in the "General Information" section on threads:

    The lifetime of a thread ID ends after the thread terminates if it
    was created with the detachstate attribute set to
    PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join()
    has been called for that thread.

At the point where the program calls pthread_kill(), pthread_join()
has not been called yet. So the thread ID is still "alive".

The pthread_kill() page says:

    As in kill(), if sig is zero, error checking shall be performed
    but no signal shall actually be sent.

The only "shall fail" is EINVAL for passing a bogus signal number.
The "informative" RATIONALE section mentions an additional error
condition:

    If an implementation detects use of a thread ID after the end of
    its lifetime, it is recommended that the function should fail and
    report an [ESRCH] error.

But since the thread ID is still "alive", that doesn't apply.

Also note that POSIX explicitly mentions kill() here. And for kill()
POSIX says (again in the RATIONALE section):

    Existing implementations vary on the result of a kill() with pid
    indicating an inactive process (a terminated process that has not
    been waited for by its parent). Some indicate success on such a
    call (subject to permission checking), while others give an error
    of [ESRCH]. Since the definition of process lifetime in this
    volume of POSIX.1-2008 covers inactive processes, the [ESRCH]
    error as described is inappropriate in this case. In particular,
    this means that an application cannot have a parent process check
    for termination of a particular child with kill(). (Usually this
    is done with the null signal; this can be done reliably with
    waitpid().)

This strongly suggests that using pthread_kill(..., 0) on a
non-detached, unjoined thread should not return ESRCH.

> #include <sys/stat.h>
> #include <errno.h>
> #include <stdlib.h>
> #include <pthread.h>
> #include <stdio.h>
>
> void do_something();
> void *child(void *);
>
> int
> main(int argc, char **argv)
> {
> 	do_something();
> }
>
> void do_something()
> {
> 	pthread_t stat_thread;
> 	int done = 0;
> 	int timer = 5;
> 	struct timespec req, rem;
>
> 	req.tv_sec = 0;
> 	pthread_create(&stat_thread, NULL, child, NULL);
> 	while (timer-- > 0) {
> 		printf("%u\n", timer);
> 		req.tv_nsec = 10000000;
> 		nanosleep(&req, &rem);
> 		if (pthread_kill(stat_thread, 0)) {
> 			done = 1;
> 			break;
> 		} else {
> 			printf("e %u\n", errno);
> 			req.tv_nsec = 990000000;
> 			nanosleep(&req, &rem);
> 		}
> 	}
> 	if (done == 1) {
> 		pthread_join(stat_thread, NULL);
> 	} else {
> 		pthread_detach(stat_thread);
> 		printf("child thread did not return within 5s\n");
> 	}
> }
>
> void *child(void *in)
> {
> 	printf("child\n");
> }