On Wed, May 14, 2025 at 02:07:04PM +0200, Natanael Copa wrote:
> [Adding Rich Felker to CC]
>
> On Tue, 13 May 2025 18:05:50 +0200
> Bruno Haible <br...@clisp.org> wrote:
>
> > Natanael Copa wrote:
> > > > So, you could try to install a different scheduler by default and
> > > > repeat the test.
> > >
> > > It passed with chrt --fifo (I had to do it from outside the LXC
> > > container):
> > >
> > > # time chrt --fifo 10 ./test-pthread-rwlock
> > > Starting test_rwlock ... OK
> > > real    0m 33.00s
> > > user    6m 50.63s
> > > sys     0m 16.23s
> > >
> > > I also verified that it still times out from outside the LXC container
> > > with the default:
> > >
> > > # time ./test-pthread-rwlock
> > > Starting test_rwlock ...Command terminated by signal 14
> > > real    10m 0.01s
> > > user    1h 46m 24s
> > > sys     2m 59.39s
> > >
> > > # time chrt --rr 10 ./test-pthread-rwlock
> > > Starting test_rwlock ... OK
> > > real    0m 30.00s
> > > user    6m 2.07s
> > > sys     0m 19.16s
> > >
> > > # time chrt --rr 99 ./test-pthread-rwlock
> > > Starting test_rwlock ... OK
> > > real    0m 30.00s
> > > user    6m 9.40s
> > > sys     0m 13.37s
> > >
> > > So even if the CPU cores are slow, they appear to finish in ~30 sec.
> > >
> > > chrt --other and chrt --idle appear to trigger the deadlock.
> >
> > For comparison, some other data (below the details):
> >
> > * On x86_64 (glibc), I see essentially no influence of the scheduling
> >   policy on 'time ./test-pthread-rwlock'.
> >
> > * On x86_64 (Alpine Linux), the test performs about 25% faster
> >   under SCHED_FIFO and SCHED_RR.
> >
> > * On three other riscv64 systems, the test needs less than 4 seconds
> >   real time. Even on my QEMU-emulated riscv64 VM, it needs less than
> >   4 seconds.
> >
> > So, it seems that
> > 1) Your riscv64 system is generally slower than the cfarm* ones.
> > 2) The performance penalty of SCHED_OTHER compared to SCHED_FIFO and
> >    SCHED_RR exists also on x86_64, but not to such an extreme extent.
> >
> > AFAICS, there are three differences in your setup compared to what I
> > see in stock Ubuntu:
> > - Linux is of a PREEMPT_DYNAMIC flavour.
> > - musl libc.
> > - the LXC container.
>
> Note that there are 64 CPU cores. I have only tested with that many cores
> on aarch64.
>
> I don't think the LXC container should matter, nor should apps deadlock
> when running on PREEMPT_DYNAMIC.
>
> I'm not sure what the difference is in code paths compared to GNU libc.
>
> I also don't get a timeout on a HiFive Premier P550 system:
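As an aside, what chrt is doing in those runs is just selecting the
scheduling policy before the test starts; the per-thread equivalent from
inside a program is pthread_setschedparam(). A rough, untested sketch
(policy and priority copied from the quoted runs; SCHED_FIFO needs
CAP_SYS_NICE or root):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = 10;            /* same value as `chrt --fifo 10` */

    /* Switch the calling thread to SCHED_FIFO. */
    int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (err) {
        fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
        return 1;
    }

    int policy;
    if (pthread_getschedparam(pthread_self(), &policy, &sp) == 0)
        printf("policy=%d priority=%d\n", policy, sp.sched_priority);

    /* ... spawn the reader/writer threads from here ... */
    return 0;
}

Threads created after that point should inherit the policy, so doing it in
main() before spawning the workers ought to put the whole test under
SCHED_FIFO.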
Do you even know yet whether this is a deadlock, or just a timeout from the
test taking an inordinate amount of time? Watching whether the hung process
is still executing (e.g. even with strace) would be a good start. If it is
deadlocked, we really need to look at the deadlocked state in the debugger
(what all the threads are blocked waiting on, so at least a backtrace for
each thread) to determine where the fault lies.

> I suspect the deadlock happens when
>
> - musl libc systems
> - more than 10 cores(?)
> - CPU cores are slow(?)
>
> Not sure the exact codepath it takes on GNU libc systems. Is it the
> same as with musl libc?
>
> > Note: Most Gnulib applications don't use pthread_rwlock directly, but
> > the glthread_rwlock facility. On musl systems, it works around the
> > possible writer starvation by reimplementing read-write locks based
> > on condition variables. This may be slower for a single operation,
> > but it is guaranteed to avoid writer starvation and therefore is
> > preferable globally. This is why you don't see a timeout in
> > './test-lock', only in './test-pthread-rwlock'.
>
> Wait a second. The test does not run the gnulib locking? It just tests
> the system (musl libc) pthread rwlock, while the app (gettext) would
> use the gnulib implementation?
>
> I thought the test verified that production code (gettext in this case)
> works as intended. Does this test expose a deadlock that could happen
> in gettext in production?
>
> I'm confused.

AFAICT the code is in tests/test-lock.c from the gnulib repo and calls the
gl_rwlock_* functions, which should be using the gnulib condvar-based
implementation.

Rich
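For reference, the condvar-based scheme Bruno describes boils down to
something like the sketch below. This is not the actual gnulib code, only a
minimal illustration of the writer-preference idea: waiting writers hold
back new readers, so a continuous stream of readers cannot starve a writer.
Error handling and trylock/timed variants are omitted.

/* Read-write lock built from one mutex and two condition variables,
 * with writer preference.  Illustration only, not the gnulib code. */
#include <pthread.h>

struct cv_rwlock {
    pthread_mutex_t lock;
    pthread_cond_t  readers_ok;   /* signalled when readers may proceed  */
    pthread_cond_t  writer_ok;    /* signalled when a writer may proceed */
    unsigned        active_readers;
    unsigned        waiting_writers;
    int             writer_active;
};

#define CV_RWLOCK_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
      PTHREAD_COND_INITIALIZER, 0, 0, 0 }

static void cv_rdlock(struct cv_rwlock *rw)
{
    pthread_mutex_lock(&rw->lock);
    /* Writer preference: new readers wait while a writer is active or
       queued, so readers cannot starve writers. */
    while (rw->writer_active || rw->waiting_writers > 0)
        pthread_cond_wait(&rw->readers_ok, &rw->lock);
    rw->active_readers++;
    pthread_mutex_unlock(&rw->lock);
}

static void cv_wrlock(struct cv_rwlock *rw)
{
    pthread_mutex_lock(&rw->lock);
    rw->waiting_writers++;
    while (rw->writer_active || rw->active_readers > 0)
        pthread_cond_wait(&rw->writer_ok, &rw->lock);
    rw->waiting_writers--;
    rw->writer_active = 1;
    pthread_mutex_unlock(&rw->lock);
}

static void cv_rdunlock(struct cv_rwlock *rw)
{
    pthread_mutex_lock(&rw->lock);
    if (--rw->active_readers == 0 && rw->waiting_writers > 0)
        pthread_cond_signal(&rw->writer_ok);
    pthread_mutex_unlock(&rw->lock);
}

static void cv_wrunlock(struct cv_rwlock *rw)
{
    pthread_mutex_lock(&rw->lock);
    rw->writer_active = 0;
    if (rw->waiting_writers > 0)
        pthread_cond_signal(&rw->writer_ok);   /* hand off to next writer */
    else
        pthread_cond_broadcast(&rw->readers_ok);
    pthread_mutex_unlock(&rw->lock);
}

The obvious trade-off of a strict writer-preference sketch like this is the
opposite starvation: a steady stream of writers can hold off readers
indefinitely, which a production implementation has to weigh as well.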