[Adding Rich Felker to CC]

On Tue, 13 May 2025 18:05:50 +0200
Bruno Haible <br...@clisp.org> wrote:

> Natanael Copa wrote:
> > > So, you could try to install a different scheduler by default and repeat
> > > the test.  
> > 
> > It passed with chrt --fifo (I had to do it from outside the LXC container):
> > 
> > # time chrt --fifo 10 ./test-pthread-rwlock
> > Starting test_rwlock ... OK
> > real        0m 33.00s
> > user        6m 50.63s
> > sys 0m 16.23s
> > 
> > I also verified that it still times out from outside the LXC container with 
> > the default:
> > 
> > # time ./test-pthread-rwlock
> > Starting test_rwlock ...Command terminated by signal 14
> > real        10m 0.01s
> > user        1h 46m 24s
> > sys 2m 59.39s
> > 
> > 
> > # time chrt --rr 10 ./test-pthread-rwlock
> > Starting test_rwlock ... OK
> > real        0m 30.00s
> > user        6m 2.07s
> > sys 0m 19.16s
> > 
> > # time chrt --rr 99 ./test-pthread-rwlock
> > Starting test_rwlock ... OK
> > real        0m 30.00s
> > user        6m 9.40s
> > sys 0m 13.37s
> > 
> > So even if the CPU cores are slow, they appear to finish in ~30 sec.
> > 
> > chrt --other and chrt --idle appear to trigger the deadlock.
> 
> For comparison, some other data (details below):
> 
> * On x86_64 (glibc), I see essentially no influence of the scheduling
>   policy on 'time ./test-pthread-rwlock'.
> 
> * On x86_64 (Alpine Linux), the test performs about 25% faster
>   under SCHED_FIFO and SCHED_RR.
> 
> * On three other riscv64 systems, the test needs less than 4 seconds
>   real time. Even on my QEMU-emulated riscv64 VM, it needs less
>   than 4 seconds.
> 
> So, it seems that
>   1) Your riscv64 system is generally slower than the cfarm* ones.
>   2) The performance penalty of SCHED_OTHER compared to SCHED_FIFO and
>      SCHED_RR exists also on x86_64, but not to such an extreme extent.
> 
> AFAICS, there are three differences in your setup compared to what I
> see in stock Ubuntu:
>   - Linux is of a PREEMPT_DYNAMIC flavour.
>   - musl libc.
>   - the LXC container.

Note that the system in question has 64 CPU cores. I have only tested with
that many cores on aarch64.

I don't think the LXC container should matter, nor should apps deadlock
when running on PREEMPT_DYNAMIC.

I'm not sure what the difference is in codepaths compared to GNU libc.

I also don't get a timeout on a HiFive Premier P550 system:

alpine-p550:~/aports/main/gettext/src/gettext-0.24.1/gettext-tools/gnulib-tests$ time ./test-pthread-rwlock
Starting test_rwlock ... OK
real    0m 1.02s
user    0m 1.24s
sys     0m 2.80s


This has only 4 CPU cores though. Each core is faster than the cores on
the Sophgo system, but I don't think it is more than 2x faster per core.

$ uname -a
Linux alpine-p550 6.6.67-0-p550 #1-Alpine SMP PREEMPT_DYNAMIC 2024-12-28 06:23:29 riscv64 GNU/Linux
alpine-p550:~/aports/main/gettext/src/gettext-0.24.1/gettext-tools/gnulib-tests$ cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.22.0_alpha20250108
PRETTY_NAME="Alpine Linux edge"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"



And on banana pi f3 (with 8 cores):
alpine-bpi-f3:/var/home/ncopa/aports/main/gettext/src/gettext-0.24.1/gettext-tools/gnulib-tests$ lscpu
Architecture:           riscv64
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              0x710
  Model name:           Spacemit(R) X60
    CPU family:         0x8000000058000001
    Model:              0x1000000049772200
    Thread(s) per core: 1
    Core(s) per socket: 8
    Socket(s):          1
    CPU(s) scaling MHz: 100%
    CPU max MHz:        1600.0000
    CPU min MHz:        614.4000
Caches (sum of all):    
  L1d:                  256 KiB (8 instances)
  L1i:                  256 KiB (8 instances)
  L2:                   1 MiB (2 instances)
alpine-bpi-f3:/var/home/ncopa/aports/main/gettext/src/gettext-0.24.1/gettext-tools/gnulib-tests$ time ./test-pthread-rwlock
Starting test_rwlock ... OK
real    0m 5.36s
user    0m 17.25s
sys     0m 17.60s




I suspect the deadlock happens when:

- the system uses musl libc
- there are more than 10 CPU cores(?)
- the CPU cores are slow(?)

I'm not sure what codepath the test takes on GNU libc systems. Is it
the same as with musl libc?
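
If this is writer starvation in the pthread_rwlock implementation, the
pattern I have in mind is roughly the following. This is just a minimal
sketch for illustration, not the gnulib test itself; the thread and
iteration counts are arbitrary:

#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define NREADERS   64     /* arbitrary; think one reader per core */
#define ITERATIONS 2000   /* bounded so the sketch always terminates */

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

static void *
reader (void *arg)
{
  (void) arg;
  for (int i = 0; i < ITERATIONS; i++)
    {
      pthread_rwlock_rdlock (&lock);
      /* Hold the read lock briefly.  With many readers the hold periods
         overlap, so at any moment some reader owns the lock.  An
         implementation without writer preference keeps admitting new
         readers, and a waiting writer makes no progress.  */
      usleep (500);
      pthread_rwlock_unlock (&lock);
    }
  return NULL;
}

int
main (void)
{
  pthread_t readers[NREADERS];
  struct timespec t0, t1;

  for (int i = 0; i < NREADERS; i++)
    pthread_create (&readers[i], NULL, reader, NULL);

  usleep (100000);                 /* let the readers saturate the lock */

  clock_gettime (CLOCK_MONOTONIC, &t0);
  pthread_rwlock_wrlock (&lock);   /* how long does one writer wait? */
  clock_gettime (CLOCK_MONOTONIC, &t1);
  pthread_rwlock_unlock (&lock);

  printf ("writer waited %.3f s\n",
          (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

  for (int i = 0; i < NREADERS; i++)
    pthread_join (readers[i], NULL);
  return 0;
}

With enough readers whose hold periods overlap, an implementation that
keeps admitting new readers while a writer is queued can delay the
writer almost indefinitely; a writer-preferring one would not.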

> Note: Most Gnulib applications don't use pthread_rwlock directly, but
> the glthread_rwlock facility. On musl systems, it works around the
> possible writer starvation by reimplementing read-write locks based
> on condition variables. This may be slower for a single operation,
> but it is guaranteed to avoid writer starvation and therefore is
> preferable globally. This is why you don't see a timeout in
> './test-lock', only in './test-pthread-rwlock'.
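
(For my own understanding: a reader-writer lock built on a mutex and
condition variables with writer preference would look roughly like the
sketch below. This is my illustration of the idea, not the actual
glthread_rwlock code.)

#include <pthread.h>

/* Rough sketch of a condition-variable based rwlock that prefers
   writers, along the lines described above.  Not the gnulib code.  */

struct rwlock
{
  pthread_mutex_t mutex;
  pthread_cond_t readers_ok;    /* signalled when readers may proceed */
  pthread_cond_t writer_ok;     /* signalled when a writer may proceed */
  unsigned int active_readers;
  unsigned int waiting_writers;
  int writer_active;
};

#define RWLOCK_INIT \
  { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
    PTHREAD_COND_INITIALIZER, 0, 0, 0 }

void
rw_rdlock (struct rwlock *l)
{
  pthread_mutex_lock (&l->mutex);
  /* Writer preference: new readers wait not only while a writer is
     active but also while any writer is queued, so writers cannot be
     starved by a steady stream of readers.  */
  while (l->writer_active || l->waiting_writers > 0)
    pthread_cond_wait (&l->readers_ok, &l->mutex);
  l->active_readers++;
  pthread_mutex_unlock (&l->mutex);
}

void
rw_wrlock (struct rwlock *l)
{
  pthread_mutex_lock (&l->mutex);
  l->waiting_writers++;
  while (l->writer_active || l->active_readers > 0)
    pthread_cond_wait (&l->writer_ok, &l->mutex);
  l->waiting_writers--;
  l->writer_active = 1;
  pthread_mutex_unlock (&l->mutex);
}

void
rw_unlock (struct rwlock *l)
{
  pthread_mutex_lock (&l->mutex);
  if (l->writer_active)
    l->writer_active = 0;
  else
    l->active_readers--;
  if (l->waiting_writers > 0)
    /* A queued writer goes first; it re-checks active_readers itself.  */
    pthread_cond_signal (&l->writer_ok);
  else
    pthread_cond_broadcast (&l->readers_ok);
  pthread_mutex_unlock (&l->mutex);
}

That would explain why it is slower per operation but immune to the
starvation that the plain pthread_rwlock test runs into.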

Wait a second. So the test does not exercise the gnulib locking? It
just tests the system (musl libc) pthread rwlock, while the application
(gettext) would use the gnulib implementation?

I thought the test verified that production code (gettext in this case)
works as intended. Does this test expose a deadlock that could happen
in gettext in production?

I'm confused.

-nc

> 
> Bruno
> 
> ========================= DETAILS =======================
> 
> Ubuntu x86_64 (glibc):
> 
> # time ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    0m0,161s
> user    0m0,268s
> sys     0m1,047s
> 
> # time chrt --other 0 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    0m0,166s
> user    0m0,243s
> sys     0m1,046s
> # time chrt --fifo 10 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    0m0,161s
> user    0m0,249s
> sys     0m1,080s
> # time chrt --rr 10 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    0m0,164s
> user    0m0,307s
> sys     0m1,019s
> # time chrt --batch 0 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    0m0,151s
> user    0m0,217s
> sys     0m1,024s
> # time chrt --idle 0 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    0m0,195s
> user    0m0,264s
> sys     0m1,115s
> # time chrt --deadline 0 ./test-pthread-rwlock
> chrt: failed to set pid 0's policy: Invalid argument
> 
> 
> Alpine Linux 3.20 x86_64 in a VM:
> 
> 
> # time ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    1.54 s
> 
> # time chrt --other 0 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    1.56 s
> # time chrt --fifo 10 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    1.24 s
> # time chrt --rr 10 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    1.25 s
> # time chrt --batch 0 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    1.59 s
> # time chrt --idle 0 ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    1.59 s
> # time chrt --deadline 0 ./test-pthread-rwlock
> chrt: failed to set pid 0's policy: Invalid argument
> 
> 
> cfarm91 (glibc):
> 
> $ time ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    0m3,908s
> user    0m0,909s
> sys     0m6,863s
> 
> 
> cfarm94 (Alpine Linux):
> 
> $ time ./test-pthread-rwlock
> Starting test_rwlock ... OK
> real    0m 0.84s
> user    0m 0.60s
> sys     0m 2.77s
> 
> 
> cfarm95 (glibc):
> 
> $ time ./test-pthread-rwlock
> Starting test_rwlock ... OK
> 
> real    0m2,166s
> user    0m9,287s
> sys     0m3,145s
> 
> 
> 

