https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796
--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:61dfb0747afcece3b7a690807b83b366ff34f329

commit r15-9525-g61dfb0747afcece3b7a690807b83b366ff34f329
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Wed Apr 16 17:21:39 2025 +0200

    libatomic: Fix up libat_{,un}lock_n [PR119796]

    As mentioned in the PR (and I think in PR101075 too), we can run into
    deadlock with libat_lock_n calls with larger n.
    As mentioned in PR66842, we use multiple locks (normally 64 mutexes,
    one for each 64 byte cache line in a 4KiB page) and currently can
    lock more than one lock, in particular for n in [0, 64] a single
    lock, for n in [65, 128] 2 locks, for n in [129, 192] 3 locks etc.

    There are two problems with this:

    1) we can deadlock if there is some wrap-around, because the locks
       are always acquired in the order from addr_hash (ptr) up to
       locks[NLOCKS-1].mutex and then, if needed, from locks[0].mutex
       onwards; so if e.g. 2 threads perform libat_lock_n with
       n = 2048+64, one on a pointer starting at a page boundary and the
       other at page boundary + 2048 bytes, the first thread can lock
       the first 32 mutexes and the second thread the last 32 mutexes,
       after which the first thread waits for lock 32, held by the
       second thread, and the second thread waits for lock 0, held by
       the first thread; fixed below by always locking the locks in
       order of increasing index: if there is a wrap-around, lock in
       2 loops, the first locking some locks at the start of the array
       and the second the ones at the end of it

    2) the number of locks seems to be determined solely from the n
       value; I think that is wrong, we don't know the structure
       alignment on the libatomic side, it could very well be a 1 byte
       aligned struct, and so how many cache lines are actually (partly
       or fully) occupied by the atomic access depends not just on the
       size, but also on ptr % WATCH_SIZE, e.g. a 2 byte structure at
       address page_boundary+63 should IMHO lock 2 locks because it
       occupies the first and second cache line

    Note, before this patch it locked exactly one lock for n = 0, while
    with this patch it could lock either no locks at all (if the pointer
    is at a cache line boundary) or 1 (otherwise).  Dunno if the
    libatomic APIs can be called for zero sizes and whether we actually
    care that much how many mutexes are locked in that case, because one
    can't actually read/write anything into zero sized memory.  If you
    think it is important, I could add else if (nlocks == 0) nlocks = 1;
    in both spots.

    2025-04-16  Jakub Jelinek  <ja...@redhat.com>

            PR libgcc/101075
            PR libgcc/119796
            * config/posix/lock.c (libat_lock_n, libat_unlock_n): Start
            with computing how many locks will be needed and take into
            account ((uintptr_t)ptr % WATCH_SIZE).  If some locks from
            the end of the locks array and others from the start of it
            will be needed, first lock the ones from the start followed
            by ones from the end.
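For illustration, here is a minimal sketch of the scheme the commit message
describes.  PAGE_SIZE, WATCH_SIZE, NLOCKS, addr_hash and the locks array are
modeled on the description above and on what the ChangeLog says about
config/posix/lock.c; the actual definitions there may differ, so treat this
as a model of the fixed locking logic, not the committed source:

#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE  4096
#define WATCH_SIZE 64                       /* one mutex per 64-byte cache line */
#define NLOCKS     (PAGE_SIZE / WATCH_SIZE) /* 64 mutexes per 4KiB page */

/* The real table lives in config/posix/lock.c; the range designator
   below is a GNU C extension, fine for a libatomic-style sketch.  */
static struct { pthread_mutex_t mutex; } locks[NLOCKS]
  = { [0 ... NLOCKS - 1] = { PTHREAD_MUTEX_INITIALIZER } };

/* Map an address to the lock guarding its cache line within the page.  */
static inline uintptr_t
addr_hash (void *ptr)
{
  return ((uintptr_t) ptr / WATCH_SIZE) % NLOCKS;
}

static void
libat_lock_n_sketch (void *ptr, size_t n)
{
  uintptr_t h = addr_hash (ptr);
  /* Count the cache lines the access actually touches: the size plus
     the offset into the first line, rounded up.  A 2-byte object at
     page_boundary + 63 needs (2 + 63 + 63) / 64 == 2 locks; n == 0 at
     a cache line boundary needs none.  */
  size_t nlocks
    = (n + (uintptr_t) ptr % WATCH_SIZE + WATCH_SIZE - 1) / WATCH_SIZE;
  if (nlocks > NLOCKS)
    nlocks = NLOCKS;        /* never take more mutexes than exist */

  if (h + nlocks <= NLOCKS)
    {
      /* No wrap-around: lock [h, h + nlocks) in increasing order.  */
      for (size_t i = h; i < h + nlocks; ++i)
        pthread_mutex_lock (&locks[i].mutex);
    }
  else
    {
      /* Wrap-around: still acquire strictly in increasing index order,
         first the locks at the start of the array, then the ones at
         the end, so two overlapping callers can never deadlock.  */
      size_t nhead = h + nlocks - NLOCKS;   /* locks [0, nhead)   */
      for (size_t i = 0; i < nhead; ++i)
        pthread_mutex_lock (&locks[i].mutex);
      for (size_t i = h; i < NLOCKS; ++i)   /* locks [h, NLOCKS)  */
        pthread_mutex_lock (&locks[i].mutex);
    }
}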
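Under this model, the example from 1) no longer deadlocks: with n = 2048+64,
the thread at the page boundary gets h = 0 and takes locks 0 through 32 in
order, while the thread at page boundary + 2048 gets h = 32 and, because it
wraps, takes lock 0 first and only then locks 32 through 63.  Both threads
therefore contend on lock 0 up front instead of each holding half the array
and waiting on the other.  Release order in the matching libat_unlock_n
cannot cause a deadlock, since pthread_mutex_unlock never blocks.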