https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101075
Bug ID: 101075
Summary: libatomic's libat_lock_n can deadlock from inconsistent locking order
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgcc
Assignee: unassigned at gcc dot gnu.org
Reporter: rprichard at google dot com
Target Milestone: ---

Created attachment 51019
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51019&action=edit
deadlock-gcc-libatomic.c

libatomic's libat_lock_n[1] maps each 64-byte memory block to one of a fixed
table of mutexes. When the function acquires the locks for an atomic access,
it starts with the lock for the first 64-byte span of the access, then the
second span, and so forth. If it reaches the end of the lock table, it wraps
around to the start of the table. As a result, two concurrent atomic accesses
can lock the same mutexes in inconsistent orders and deadlock.

For example, suppose there are 4 threads, WATCH_SIZE is 64, NLOCKS is 64, and
each thread performs atomic accesses of size 4 KiB. Each access acquires all
64 locks, but starting at different indices, so if one thread acquires its
first lock while another thread is partway through acquiring its locks, each
thread eventually waits on a lock the other already holds, and the program
deadlocks. I attached a test program reproducing the deadlock.

Aside: I'm not sure why libatomic uses multiple locks for a single access.
This question came up before (Bug 66842). I can see the rationale for using
only the page offset of an address when computing the lock index, to handle
an object aliased from two virtual pages. (It seems that LLVM uses only one
lock per access, and it hashes the low 30 bits of the address into its lock
index, so I think it won't handle two aliased pages?[2])

[1]
https://github.com/gcc-mirror/gcc/blob/d9f1466f88abef7c814d02ba39a6ea5ef420aaec/libatomic/config/posix/lock.c#L82-L96
[2]
https://github.com/llvm/llvm-project/blob/b8919fb0eac15d13c5f56d3d30ce378a588dd78c/compiler-rt/lib/builtins/atomic.c#L108-L123
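
To make the inconsistent ordering concrete, here is a minimal sketch (not the
attached reproducer and not the real libatomic/config/posix/lock.c code) that
just prints the lock-index sequence a simplified libat_lock_n-style routine
would take. It assumes the WATCH_SIZE=64 / NLOCKS=64 configuration from the
example above; print_lock_order and the hash (addr / WATCH_SIZE) % NLOCKS are
illustrative stand-ins, not the actual libatomic implementation.

/* Hypothetical simplification of libatomic's lock-index computation.  */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define WATCH_SIZE 64
#define NLOCKS     64

/* Print the indices, in acquisition order, of the locks a simplified
   libat_lock_n would take for an n-byte access starting at addr.  */
static void print_lock_order (uintptr_t addr, size_t n)
{
  uintptr_t h = (addr / WATCH_SIZE) % NLOCKS;   /* starting lock index */
  size_t nspans = (addr % WATCH_SIZE + n + WATCH_SIZE - 1) / WATCH_SIZE;

  printf ("%zu-byte access at %#lx locks:", n, (unsigned long) addr);
  for (size_t i = 0; i < nspans && i < NLOCKS; i++)
    printf (" %lu", (unsigned long) ((h + i) % NLOCKS));  /* wraps around */
  printf ("\n");
}

int main (void)
{
  /* Two 4 KiB accesses whose starting addresses hash to different lock
     indices.  Both need all 64 locks, but in different cyclic orders
     (0,1,...,63 vs. 32,33,...,63,0,...,31).  If two threads interleave
     these acquisition sequences, each can end up waiting on a lock the
     other already holds -- the deadlock described above.  */
  print_lock_order (0x10000, 4096);
  print_lock_order (0x10800, 4096);
  return 0;
}

The two printed sequences contain the same 64 indices rotated by 32, which is
the classic condition for a lock-ordering deadlock when two threads run them
concurrently.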