https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796
--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:61dfb0747afcece3b7a690807b83b366ff34f329

commit r15-9525-g61dfb0747afcece3b7a690807b83b366ff34f329
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Wed Apr 16 17:21:39 2025 +0200

    libatomic: Fix up libat_{,un}lock_n [PR119796]

    As mentioned in the PR (and I think in PR101075 too), we can run into
    deadlock with libat_lock_n calls with larger n.
    As mentioned in PR66842, we use multiple locks (normally 64 mutexes,
    one for each 64 byte cache line in a 4KiB page) and currently can
    lock more than one lock, in particular for n in [0, 64] a single
    lock, for n in [65, 128] 2 locks, for n in [129, 192] 3 locks etc.

    There are two problems with this:

    1) we can deadlock if there is some wrap-around, because the locks
       are always acquired in the order from addr_hash (ptr) up to
       locks[NLOCKS-1].mutex and then, if needed, from locks[0].mutex
       onwards; so if e.g. 2 threads perform libat_lock_n with
       n = 2048+64, one on a pointer starting at a page boundary and the
       other at page boundary + 2048 bytes, the first thread can lock
       the first 32 mutexes and the second thread the last 32 mutexes,
       after which the first thread waits for lock 32, held by the
       second thread, and the second thread waits for lock 0, held by
       the first thread; fixed below by always locking the locks in
       order of increasing index: if there is a wrap-around, lock in
       2 loops, the first locking some locks at the start of the array
       and the second the ones at the end of it

    2) the number of locks seems to be determined solely from the n
       value; I think that is wrong, we don't know the structure
       alignment on the libatomic side, it could very well be a 1 byte
       aligned struct, and so how many cache lines are actually (partly
       or fully) occupied by the atomic access depends not just on the
       size, but also on ptr % WATCH_SIZE, e.g. a 2 byte structure at
       address page_boundary+63 should IMHO lock 2 locks because it
       occupies the first and second cache line

    Note, before this patch it locked exactly one lock for n = 0, while
    with this patch it could lock either no locks at all (if the pointer
    is at a cache line boundary) or 1 (otherwise).  Dunno if the
    libatomic APIs can be called for zero sizes and whether we actually
    care that much how many mutexes are locked in that case, because one
    can't actually read/write anything into zero sized memory.  If you
    think it is important, I could add else if (nlocks == 0) nlocks = 1;
    in both spots.

    2025-04-16  Jakub Jelinek  <ja...@redhat.com>

            PR libgcc/101075
            PR libgcc/119796
            * config/posix/lock.c (libat_lock_n, libat_unlock_n): Start
            with computing how many locks will be needed and take into
            account ((uintptr_t)ptr % WATCH_SIZE).  If some locks from
            the end of the locks array and others from the start of it
            will be needed, first lock the ones from the start followed
            by ones from the end.
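For illustration, here is a minimal sketch of the scheme the commit message
describes.  PAGE_SIZE, WATCH_SIZE, NLOCKS, addr_hash and the locks array are
modeled on the description above and on what the ChangeLog says about
config/posix/lock.c; the actual definitions there may differ, so treat this
as a model of the fixed locking logic, not the committed source:

#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE  4096
#define WATCH_SIZE 64                       /* one mutex per 64-byte cache line */
#define NLOCKS     (PAGE_SIZE / WATCH_SIZE) /* 64 mutexes per 4KiB page */

/* The real table lives in config/posix/lock.c; the range designator
   below is a GNU C extension, fine for a libatomic-style sketch.  */
static struct { pthread_mutex_t mutex; } locks[NLOCKS]
  = { [0 ... NLOCKS - 1] = { PTHREAD_MUTEX_INITIALIZER } };

/* Map an address to the lock guarding its cache line within the page.  */
static inline uintptr_t
addr_hash (void *ptr)
{
  return ((uintptr_t) ptr / WATCH_SIZE) % NLOCKS;
}

static void
libat_lock_n_sketch (void *ptr, size_t n)
{
  uintptr_t h = addr_hash (ptr);
  /* Count the cache lines the access actually touches: the size plus
     the offset into the first line, rounded up.  A 2-byte object at
     page_boundary + 63 needs (2 + 63 + 63) / 64 == 2 locks; n == 0 at
     a cache line boundary needs none.  */
  size_t nlocks
    = (n + (uintptr_t) ptr % WATCH_SIZE + WATCH_SIZE - 1) / WATCH_SIZE;
  if (nlocks > NLOCKS)
    nlocks = NLOCKS;        /* never take more mutexes than exist */

  if (h + nlocks <= NLOCKS)
    {
      /* No wrap-around: lock [h, h + nlocks) in increasing order.  */
      for (size_t i = h; i < h + nlocks; ++i)
        pthread_mutex_lock (&locks[i].mutex);
    }
  else
    {
      /* Wrap-around: still acquire strictly in increasing index order,
         first the locks at the start of the array, then the ones at
         the end, so two overlapping callers can never deadlock.  */
      size_t nhead = h + nlocks - NLOCKS;   /* locks [0, nhead)   */
      for (size_t i = 0; i < nhead; ++i)
        pthread_mutex_lock (&locks[i].mutex);
      for (size_t i = h; i < NLOCKS; ++i)   /* locks [h, NLOCKS)  */
        pthread_mutex_lock (&locks[i].mutex);
    }
}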
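Under this model, the example from 1) no longer deadlocks: with n = 2048+64,
the thread at the page boundary gets h = 0 and takes locks 0 through 32 in
order, while the thread at page boundary + 2048 gets h = 32 and, because it
wraps, takes lock 0 first and only then locks 32 through 63.  Both threads
therefore contend on lock 0 up front instead of each holding half the array
and waiting on the other.  Release order in the matching libat_unlock_n
cannot cause a deadlock, since pthread_mutex_unlock never blocks.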