> Date: Wed, 7 Oct 2015 11:53:32 +0100
> From: Stuart Henderson <s...@spacehopper.org>
> 
> On 2015/10/07 11:52, Stuart Henderson wrote:
> > monitoring-plugins has a program that checks available space on partitions.
> > Before doing this it does a stat() to check that the requested directory
> > exists and is accessible. In their devel tree they have moved to doing
> > this stat() in a thread - commit log was "don't let check_disk hang on
> > hanging file systems". However this code doesn't work for us.
> > 
> > I've attached a stripped-down test program based on their code that works
> > as expected on Linux but fails on OpenBSD. I can always patch to use the
> > non-pthread code, but I wondered if anyone has an idea what's up and
> > whether the bug is theirs or ours - is pthread_kill(thread, 0) working
> > as expected?
> > 
> > $ make thread LDFLAGS=-lpthread
> > cc -O2 -pipe   -lpthread -o thread thread.c
> > 
> > $ ./thread
> > 4
> > child
> > 3
> > 2
> > 1
> > 0
> > child thread did not return within 5s

My reading of the POSIX standard is that our implementation of
pthread_kill(3) is correct and that the program's expectations are
wrong.

POSIX says the following in the "General Information" section on threads:

  The lifetime of a thread ID ends after the thread terminates if it
  was created with the detachstate attribute set to
  PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has
  been called for that thread.

At the point where the program calls pthread_kill(), pthread_join()
has not been called yet.  So the thread ID is still "alive".

The pthread_kill() page says:

  As in kill(), if sig is zero, error checking shall be performed but
  no signal shall actually be sent.

The only "shall fail" error is [EINVAL], for an invalid or unsupported
signal number.
The "informative" RATIONALE section mentions an additional error
condition:

  If an implementation detects use of a thread ID after the end of its
  lifetime, it is recommended that the function should fail and report
  an [ESRCH] error.

But since the thread ID is still "alive", that doesn't apply.

Also note that POSIX explicitly mentions kill() here.  And for kill()
POSIX says (again in the RATIONALE section):

  Existing implementations vary on the result of a kill() with pid
  indicating an inactive process (a terminated process that has not
  been waited for by its parent). Some indicate success on such a call
  (subject to permission checking), while others give an error of
  [ESRCH]. Since the definition of process lifetime in this volume of
  POSIX.1-2008 covers inactive processes, the [ESRCH] error as
  described is inappropriate in this case. In particular, this means
  that an application cannot have a parent process check for
  termination of a particular child with kill(). (Usually this is done
  with the null signal; this can be done reliably with waitpid().)

This strongly suggests that pthread_kill(..., 0) on a non-detached,
unjoined thread should not return ESRCH either.

> #include <sys/stat.h>
> #include <errno.h>
> #include <stdlib.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <signal.h>
> #include <time.h>
> 
> void do_something();
> void *child(void *);
> 
> int
> main(int argc, char **argv)
> {
>       do_something();
> }
> 
> void do_something()
> {
>       pthread_t stat_thread;
>       int done = 0;
>       int timer = 5;
>       struct timespec req, rem;
> 
>       req.tv_sec = 0;
>       pthread_create(&stat_thread, NULL, child, NULL);
>       while (timer-- > 0) {
>               printf("%u\n", timer);
>               req.tv_nsec = 10000000;
>               nanosleep(&req, &rem);
>               if (pthread_kill(stat_thread, 0)) {
>                       done = 1;
>                       break;
>               } else {
>                       printf("e %u\n", errno);
>                       req.tv_nsec = 990000000;
>                       nanosleep(&req, &rem);
>               }
>       }
>       if (done == 1) {
>               pthread_join(stat_thread, NULL);
>       } else {
>               pthread_detach(stat_thread);
>               printf("child thread did not return within 5s\n");
>       }
> }
> 
> void *child(void *in)
> {
>       printf("child\n");
>       return NULL;
> }
