pthread_getspecific is too slow

John Carr Wed, 25 Sep 2013 10:13:45 -0700

Problem in a nutshell: pthread_getspecific serializes execution of
threads using it.  Without a better implementation my program doesn't
work on OpenBSD.


I am trying to port Cilk+ to OpenBSD (5.3).

Cilk+ is a multithreaded extension to C/C++.  It adds some bookkeeping
operations in the prologue of some functions.  In many Cilk programs
most function calls executed at run time do this bookkeeping (counting
calls dynamically, not call sites in the source).

The per-call bookkeeping calls pthread_getspecific.  pthread_getspecific
takes out a spinlock.  The lock is apparently needed in case of a race
with pthread_key_delete.  This is unlikely to happen, but I suppose it
is possible.  Every function call in this multithreaded program
serializes waiting on the lock.  Also, the cache line with the lock is
constantly moving between processors.
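
To make the access pattern concrete, here is a minimal sketch of what
every worker thread ends up doing.  The names (worker_state,
bookkeeping_prologue, run_workers) are made up for illustration and are
not the real Cilk+ runtime; the only real API calls are the pthread
ones.  Each simulated function call does one pthread_getspecific
lookup, so with a lock inside pthread_getspecific all threads contend
on every call.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-worker state; the real Cilk+ structure differs. */
struct worker_state {
    unsigned long calls;    /* bookkeeping calls seen by this thread */
};

struct worker_arg {
    struct worker_state state;
    unsigned long ncalls;   /* how many "function calls" to simulate */
};

static pthread_key_t worker_key;
static pthread_once_t worker_key_once = PTHREAD_ONCE_INIT;

static void make_worker_key(void)
{
    pthread_key_create(&worker_key, NULL);
}

/* Stand-in for the function prologue: one TSD lookup per call. */
static void bookkeeping_prologue(void)
{
    struct worker_state *w = pthread_getspecific(worker_key);

    if (w != NULL)
        w->calls++;
}

static void *worker_main(void *p)
{
    struct worker_arg *arg = p;

    pthread_setspecific(worker_key, &arg->state);
    for (unsigned long i = 0; i < arg->ncalls; i++)
        bookkeeping_prologue();
    return NULL;
}

/*
 * Run nthreads workers (1..16), each doing ncalls lookups.
 * Returns the total number of lookups performed, or 0 on bad input.
 */
unsigned long run_workers(int nthreads, unsigned long ncalls)
{
    struct worker_arg args[16];
    pthread_t tids[16];
    unsigned long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > 16)
        return 0;
    pthread_once(&worker_key_once, make_worker_key);
    for (int i = 0; i < nthreads; i++) {
        args[i].state.calls = 0;
        args[i].ncalls = ncalls;
        if (pthread_create(&tids[i], NULL, worker_main, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].state.calls;
    }
    return total;
}
```

Timing run_workers(1, n) against run_workers(4, n) for a large n is one
way to see whether the lookups scale with the number of threads or
serialize.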

This is worse than useless for Cilk.  You're much better off with a
single-threaded program.

An older version of Cilk used a thread local storage class (__thread).
If memory serves, the switch to pthread_getspecific was driven by a few
considerations:

1. Thread local variables don't get along well with shared libraries.

2. Thread local variables are less portable.  OpenBSD doesn't really
support them, for example; there they are emulated with
pthread_getspecific.

3. On Linux/x86 pthread_getspecific is very fast, essentially a move
instruction with a segment override.
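
For comparison, the two ways of reaching per-thread state look roughly
like this.  This is only a sketch: the function names are invented for
illustration, and __thread is a GCC/Clang extension, which is exactly
the portability problem in point 2.

```c
#include <pthread.h>
#include <stddef.h>

/* Variant 1: thread local storage class (GCC/Clang extension). */
static __thread void *tls_worker;

/* Variant 2: POSIX thread-specific data. */
static pthread_key_t tsd_key;
static pthread_once_t tsd_once = PTHREAD_ONCE_INIT;

static void tsd_make_key(void)
{
    pthread_key_create(&tsd_key, NULL);
}

/* Must be called once before the tsd accessor is used. */
void worker_storage_init(void)
{
    pthread_once(&tsd_once, tsd_make_key);
}

/* Record the current thread's worker pointer in both places. */
void set_worker(void *w)
{
    tls_worker = w;
    pthread_setspecific(tsd_key, w);
}

/* Where TLS is native, this is a direct load; no library call. */
void *get_worker_tls(void)
{
    return tls_worker;
}

/* Always a library call; its cost depends on the implementation. */
void *get_worker_tsd(void)
{
    return pthread_getspecific(tsd_key);
}
```

Both accessors return the same pointer on a given thread; the
difference is only in what each lookup costs on a given platform.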

It seems to me the implementation of pthread_getspecific doesn't need to
be as slow as it is.

It ought to be possible to have multiple readers be always nonblocking
as long as the key list doesn't change, and possibly even if it does
change.  At most, pthread_getspecific needs a read lock rather than an
exclusive lock.  The rwlock in librthread itself starts by taking a
spinlock, though, so it's not the answer.
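
One possible shape for a read path with no lock at all, as a rough
sketch and not a patch against librthread: give each key a generation
counter and tag each thread's stored value with the generation it was
set under.  All sketch_* names are hypothetical.  The _Thread_local
array is only a stand-in for a per-thread array that a real libc would
hang off its thread control block, since real TLS is the thing that is
missing here.  Destructors are left out entirely.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_KEYS_MAX 64

/* Global key table: a key is allocated while its generation is odd. */
static _Atomic unsigned sketch_key_gen[SKETCH_KEYS_MAX];

struct sketch_slot {
    unsigned gen;       /* key generation when the value was stored */
    void *value;
};

/* Stand-in for a per-thread array reachable from the thread itself. */
static _Thread_local struct sketch_slot sketch_slots[SKETCH_KEYS_MAX];

/* Allocate a free key; returns 0 on success, -1 if none is left. */
int sketch_key_create(unsigned *key)
{
    for (unsigned k = 0; k < SKETCH_KEYS_MAX; k++) {
        unsigned g = atomic_load(&sketch_key_gen[k]);

        /* even -> odd claims the key */
        if ((g & 1) == 0 &&
            atomic_compare_exchange_strong(&sketch_key_gen[k], &g, g + 1)) {
            *key = k;
            return 0;
        }
    }
    return -1;
}

/* odd -> even frees the key and makes every thread's old slot stale. */
void sketch_key_delete(unsigned key)
{
    if (key < SKETCH_KEYS_MAX)
        atomic_fetch_add(&sketch_key_gen[key], 1);
}

int sketch_setspecific(unsigned key, void *value)
{
    unsigned g;

    if (key >= SKETCH_KEYS_MAX)
        return -1;
    g = atomic_load_explicit(&sketch_key_gen[key], memory_order_acquire);
    if ((g & 1) == 0)
        return -1;              /* key is not allocated */
    sketch_slots[key].gen = g;
    sketch_slots[key].value = value;
    return 0;
}

/* Read path: one atomic load of shared data, no lock, no shared write. */
void *sketch_getspecific(unsigned key)
{
    unsigned g;

    if (key >= SKETCH_KEYS_MAX)
        return NULL;
    g = atomic_load_explicit(&sketch_key_gen[key], memory_order_acquire);
    return sketch_slots[key].gen == g ? sketch_slots[key].value : NULL;
}
```

The point of the design is that readers only ever read the shared
table, so its cache line can stay shared between processors instead of
bouncing.  A reader racing with sketch_key_delete may still see the old
value, but POSIX already leaves the use of a deleted key undefined.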

Any thoughts?
