Paul Eggert wrote:
> >    1. Install a toolchain for creating binaries linked against musl-libc:
> >       $ sudo apt install musl-gcc
> >    2. Create a testdir for the module 'pthread-rwlock'.
> >    3. Build it with CC="gcc". You should still see the test failure after
> >       10 minutes.
> >    4. Build it with CC="musl-gcc". Run
> >       $ time gltests/test-pthread-rwlock
> > 
> > If 4. takes more than 30 seconds, your OS is probably running on top of
> > a hypervisor, and that is causing the problem.
> > 
> > If 4. takes just a few seconds, it looks like a glibc bug.
> 
> 4. takes just a few seconds on my platform.

Thanks. So, it looks like a glibc bug.

I have investigated this from three different angles.

1) In maint-tools/ I have added a directory test-programs/pthread-rwlock/
   with a test program that analyzes how an implementation of POSIX rwlocks
   handles the wait queue.

   This is based on the same idea as m4/pthread_rwlock_rdlock.m4, but whereas
   that macro extracts only a single bit of information about one particular
   situation, the new test program exercises many situations, analyzes them,
   and produces a textual summary (a minimal sketch of the probing idea is
   included below).

   The results of this analysis are attached below.

   glibc ≥ 2.25 with the default pthread_rwlock_t initializer stands out:
   it is the *only* implementation for which the result contains the statement
     "This implementation always prefers readers."

2) I have redone the timings of most platforms, each in a VirtualBox VM with
   paravirtualization set to "minimal" (so as to avoid the bug mentioned in
   <https://lists.gnu.org/archive/html/bug-gnulib/2024-06/msg00318.html>).

   The timings are attached below.

   While there is sometimes a big difference between the columns "1 CPU" and
   "2 CPUs", it can be attributed to single-thread optimizations. Therefore,
   what we need to look at are the columns "2 CPUs", "4 CPUs", "8 CPUs".
   Some platforms (Cygwin, Solaris 11, gnulib on mingw) show a moderate
   increase of the times across this progression, but once again glibc ≥ 2.25
   with the default pthread_rwlock_t initializer stands out: it is the *only*
   implementation for which the times across this progression grow by a
   factor of more than 10.

So far, there is a correlation between the analysis results and the timings.
Is this correlation a causal relation?

3) To investigate this question, I modified the gnulib implementation on mingw
   to prefer readers. Namely, in lib/windows-rwlock.c and
   lib/windows-timedrwlock.c, in the function glwthread_[timed]rwlock_unlock,
   I changed the line

      if (lock->waiting_writers.count > 0)

   to

      if (lock->waiting_writers.count > 0 && lock->waiting_readers.count == 0)

   As expected, the analysis results are:

     "This implementation always prefers readers."

   And the timings are:

     gnulib on mingw, modified to prefer readers
               1 CPU   2 CPUs  4 CPUs  8 CPUs
               0.787   2.962 385.999   > 7200

   That is, a steep increase in running time from 2 CPUs to 8 CPUs.
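
   The mechanism behind such a blow-up is writer starvation: with several
   CPUs, read-side critical sections overlap, and an implementation that
   keeps granting new read attempts never reaches a moment where a queued
   writer can run. A standalone illustration (not part of the gnulib test
   suite; the thread count and durations are arbitrary choices):

      /* A few threads continuously hold overlapping read locks for ~2 seconds;
         measure how long a single wrlock attempt is delayed.  On a
         reader-preferring implementation the writer is typically delayed
         until the readers stop; otherwise it gets the lock almost at once.  */
      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>

      static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
      static atomic_int stop;

      static void *
      reader (void *arg)
      {
        while (!atomic_load (&stop))
          {
            pthread_rwlock_rdlock (&lock);
            struct timespec ts = { 0, 1000000 };    /* hold for 1 ms */
            nanosleep (&ts, NULL);
            pthread_rwlock_unlock (&lock);
          }
        return NULL;
      }

      static void *
      stopper (void *arg)
      {
        sleep (2);                      /* let the readers run for 2 seconds */
        atomic_store (&stop, 1);
        return NULL;
      }

      int
      main (void)
      {
        enum { NREADERS = 4 };
        pthread_t r[NREADERS], s;
        struct timespec t0, t1;
        int i;

        for (i = 0; i < NREADERS; i++)
          pthread_create (&r[i], NULL, reader, NULL);
        pthread_create (&s, NULL, stopper, NULL);
        usleep (100000);                /* let the readers get going */

        clock_gettime (CLOCK_MONOTONIC, &t0);
        pthread_rwlock_wrlock (&lock);  /* how long does this take? */
        clock_gettime (CLOCK_MONOTONIC, &t1);
        pthread_rwlock_unlock (&lock);
        printf ("wrlock acquired after %.3f s\n",
                (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9);

        for (i = 0; i < NREADERS; i++)
          pthread_join (r[i], NULL);
        pthread_join (s, NULL);
        return 0;
      }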

This provides enough justification for gnulib to override the glibc behaviour
here. I will therefore go ahead and override
  PTHREAD_RWLOCK_INITIALIZER
  pthread_rwlock_init
  pthread_rwlockattr_init
to use PREFER_WRITER or PREFER_WRITER_NONRECURSIVE, which both behave in a
sane way even in glibc ≥ 2.25.
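
For comparison, application code that wants to protect itself today can request
the writer-preferring behaviour from glibc explicitly, through the GNU extension
pthread_rwlockattr_setkind_np; presumably the gnulib override will arrange the
equivalent. A minimal sketch:

  /* Sketch: request writer preference explicitly on glibc.
     pthread_rwlockattr_setkind_np and
     PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP are GNU extensions
     and require _GNU_SOURCE.  */
  #define _GNU_SOURCE
  #include <pthread.h>

  static pthread_rwlock_t lock;

  static void
  init_lock (void)
  {
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init (&attr);
    pthread_rwlockattr_setkind_np (&attr,
                                   PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    pthread_rwlock_init (&lock, &attr);
    pthread_rwlockattr_destroy (&attr);
  }

Statically allocated locks can use the glibc initializer
PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP instead.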

We have known about the problem since 2018
<https://lists.gnu.org/archive/html/bug-gnulib/2018-06/msg00026.html>,
but had not realized the impact it has on systems with 8 or more CPUs.
Thanks for your report, Paul!

Bruno

How do the implementations work?

glibc/Linux <2.25 default, glibc/Linux PREFER_WRITER, AIX, Android
  When releasing the last reader lock:
    If at least one of the enqueued lock attempts is for reading, the
    first one of them is granted.
    Otherwise, the first of the waiting write attempts is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for writing, the
    first one of them is granted.
    Otherwise, one of the waiting read attempts is granted.
  This implementation does not globally prefer readers, only when releasing
  a reader lock.
  This implementation does not globally prefer writers, only when releasing
  a writer lock.

glibc/Linux PREFER_WRITER_NONRECURSIVE, FreeBSD, Cygwin
  When releasing the last reader lock:
    The first of the enqueued lock attempts is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for writing, the
    first one of them is granted.
    Otherwise, one of the waiting read attempts is granted.
  This implementation does not prefer readers.
  This implementation does not globally prefer writers, only when releasing
  a writer lock.

glibc/Linux ≥2.25 default
  When releasing the last reader lock:
    If at least one of the enqueued lock attempts is for reading, the
    first one of them is granted.
    Otherwise, the first of the waiting write attempts is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for reading, one of
    them is granted.
    Otherwise, the first of the waiting write attempts is granted.
  This implementation always prefers readers.
  This implementation does not prefer writers.

glibc/Hurd
  When releasing the last reader lock:
    If at least one of the enqueued lock attempts is for reading, the
    first one of them is granted.
    Otherwise, the latest (LIFO!) waiting write attempt is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for writing, the
    latest (LIFO!) one of them is granted.
    Otherwise, the latest (LIFO!) of the waiting read attempts is granted.
  This implementation does not globally prefer readers, only when releasing
  a reader lock.
  This implementation does not globally prefer writers, only when releasing
  a writer lock.
  This implementation is deterministic.

musl, OpenBSD
  When releasing the last reader lock:
    If at least one of the enqueued lock attempts is for reading, the
    first one of them is granted.
    Otherwise, the first of the waiting write attempts is granted.
  When releasing a writer lock:
    The first of the enqueued lock attempts is granted.
  This implementation does not globally prefer readers, only when releasing
  a reader lock.
  This implementation does not prefer writers.
  This implementation is deterministic.

macOS
  When releasing the last reader lock:
    The first of the enqueued lock attempts is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for writing, the
    first of them is granted.
    Otherwise, one of the waiting read attempts is granted.
  This implementation does not prefer readers.
  This implementation does not prefer writers.

NetBSD
  When releasing the last reader lock:
    The first of the enqueued lock attempts is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for writing, the
    first one of them is granted.
    Otherwise, the latest (LIFO!) of the waiting read attempts is granted.
  This implementation does not prefer readers.
  This implementation does not globally prefer writers, only when releasing
  a writer lock.
  This implementation is deterministic.

Solaris 10
  When releasing the last reader lock:
    If at least one of the enqueued lock attempts is for writing, the
    latest (LIFO!) one of them is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for writing, one of
    the waiting write attempts is granted (not necessarily the first one).
    Otherwise, one of the waiting read attempts is granted.
  This implementation does not prefer readers.
  This implementation always prefers writers.

Solaris 11
  When releasing the last reader lock:
    If at least one of the enqueued lock attempts is for writing, one
    of them is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for writing, one of
    the waiting write attempts is granted (not necessarily the first one).
    Otherwise, one of the waiting read attempts is granted.
  This implementation does not prefer readers.
  This implementation always prefers writers.

gnulib on native Windows
  When releasing the last reader lock:
    The first of the enqueued lock attempts is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for writing, the
    first one of them is granted.
    Otherwise, the first of the waiting read attempts is granted.
  This implementation does not prefer readers.
  This implementation does not globally prefer writers, only when releasing
  a writer lock.
  This implementation is deterministic.

gnulib on native Windows, modified to prefer readers
  When releasing the last reader lock:
    If at least one of the enqueued lock attempts is for reading, the
    first one of them is granted.
    Otherwise, the first of the waiting write attempts is granted.
  When releasing a writer lock:
    If at least one of the enqueued lock attempts is for reading, the
    first of them is granted.
    Otherwise, the first of the waiting write attempts is granted.
  This implementation always prefers readers.
  This implementation does not prefer writers.
  This implementation is deterministic.

Timings of test-pthread-rwlock, created from
  ./gnulib-tool --create-testdir --dir=../testdir1 --single-configure pthread-rwlock
in a VirtualBox VM (minimal paravirtualization) with 1, 2, 4, 8 CPUs.
Each value is the average of the real time of 3 runs of
"time ./test-pthread-rwlock", in seconds.


glibc/Linux <2.25 default (Debian 9.6)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          0.712   0.884   0.589   0.970

glibc/Linux PREFER_WRITER (Ubuntu 22.04)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          0.983   2.009   1.238   1.683


glibc/Linux PREFER_WRITER_NONRECURSIVE (Ubuntu 22.04)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          1.022   2.289   1.141   0.868

FreeBSD (FreeBSD 14.0)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          1.62    1.40    1.14    1.33

Cygwin (Cygwin 2.9.0)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          1.094  18.683  36.945  43.999


glibc/Linux ≥2.25 default (Ubuntu 22.04)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          1.032   1.998   3.515  67.596


glibc/Hurd
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          0.433   0.520   0.713   0.630


musl (Alpine Linux 3.20)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          1.633   2.413   1.390   0.843

OpenBSD (OpenBSD 7.5)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          0.833   2.837   2.757   2.853


NetBSD (NetBSD 10.0)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          6.600   7.297   6.993   7.967


Solaris 11 (Solaris 11.4)
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          1.997  19.697  22.924  28.039


gnulib on mingw
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          0.763   9.769  15.542  15.954


gnulib on mingw, modified to prefer readers
          1 CPU   2 CPUs  4 CPUs  8 CPUs
          0.787   2.962 385.999   > 7200
