https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101075

            Bug ID: 101075
           Summary: libatomic's libat_lock_n can deadlock from
                    inconsistent locking order
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rprichard at google dot com
  Target Milestone: ---

Created attachment 51019
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51019&action=edit
deadlock-gcc-libatomic.c

libatomic's libat_lock_n[1] maps each 64-byte span of memory to one of a fixed
table of mutexes. When the function acquires the locks for an atomic access, it
starts with the lock for the first 64-byte span of the access, then the second
span, and so forth. If it hits the end of the lock table, it wraps around to
the start of the table.
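
For context, here is a simplified paraphrase of that scheme (names and details
are illustrative, not the exact libatomic source; see [1] for the real code):

#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define WATCH_SIZE 64   /* bytes covered by one lock */
#define NLOCKS     64   /* number of mutexes in the table */

static pthread_mutex_t locks[NLOCKS];   /* assume each is initialized */

static void
lock_n (void *ptr, size_t n)
{
  /* Start at the lock for the first 64-byte span of the access.  */
  size_t h = ((uintptr_t) ptr / WATCH_SIZE) % NLOCKS;
  size_t i = 0;

  do
    {
      pthread_mutex_lock (&locks[h]);
      if (++h == NLOCKS)
        h = 0;            /* wrap around to the start of the table */
      i += WATCH_SIZE;
    }
  while (i < n);
}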

The result is that two concurrent atomic accesses can lock mutexes in an
inconsistent order, then deadlock.

For example, suppose we have 4 threads, WATCH_SIZE is 64, NLOCKS is 64, and the
threads are doing atomic accesses of size 4 KiB. Each access acquires all 64
locks, but the accesses start at different indices. So if one thread manages to
acquire a lock while another thread is still acquiring its own, each will
eventually block on a lock the other already holds, and the program deadlocks.

I attached a test program reproducing the deadlock.
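
For readers without the attachment, a minimal sketch along the same lines (this
is not the attachment itself and may differ from it in detail): each thread
does 4 KiB generic atomic loads starting at a different 64-byte offset, so each
libat_lock_n call starts at a different lock index.

/* Sketch of a reproducer, not the attached deadlock-gcc-libatomic.c.
   Build with: gcc repro-sketch.c -pthread -latomic  */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NTHREADS 4

struct big { char bytes[4096]; };   /* too large to be lock-free */

static char storage[sizeof (struct big) + NTHREADS * 64];

static void *
worker (void *arg)
{
  /* Give each thread a starting address at a distinct 64-byte offset.  */
  struct big *src = (struct big *) (storage + (uintptr_t) arg * 64);
  struct big dst;

  for (long i = 0; i < 1000000; i++)
    __atomic_load (src, &dst, __ATOMIC_SEQ_CST);  /* routed through libatomic */
  return NULL;
}

int
main (void)
{
  pthread_t threads[NTHREADS];

  for (uintptr_t i = 0; i < NTHREADS; i++)
    pthread_create (&threads[i], NULL, worker, (void *) i);
  for (int i = 0; i < NTHREADS; i++)
    pthread_join (threads[i], NULL);

  puts ("finished without deadlocking");
  return 0;
}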

Aside: I'm not sure why libatomic uses multiple locks for an access. This
question came up before (Bug 66842). I can see the rationale for deriving the
lock index only from the page offset of an address, so that an object aliased
at two different virtual addresses (the same physical page mapped twice) still
hits the same lock. (It seems that LLVM's compiler-rt uses only one lock for an
access, and it hashes the low 30 bits of the address into its lock index, so I
think it won't handle two aliased pages?[2])
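
To illustrate the aliasing point, here is a hypothetical comparison (neither
function is code from libatomic or compiler-rt): two virtual mappings of the
same physical page share an object's page offset but differ in the upper
address bits, so a page-offset-only index keeps the aliases on the same lock,
while an index that mixes in higher bits generally does not.

/* Hypothetical illustration only; the addresses and the hash are made up.  */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE  4096
#define WATCH_SIZE 64
#define NLOCKS     64

/* Key the lock only on the page offset: aliases agree.  */
static unsigned
lock_by_page_offset (uintptr_t addr)
{
  return (addr % PAGE_SIZE) / WATCH_SIZE;
}

/* Mix higher address bits into the index: aliases can land on different
   locks, so they would not exclude each other.  */
static unsigned
lock_by_hashed_bits (uintptr_t addr)
{
  return ((addr / WATCH_SIZE) ^ (addr / PAGE_SIZE)) % NLOCKS;
}

int
main (void)
{
  uintptr_t alias1 = 0x7f0000001040;  /* hypothetical: same physical page */
  uintptr_t alias2 = 0x7f0000042040;  /* mapped at two virtual addresses  */

  printf ("page-offset index: %u vs %u\n",
          lock_by_page_offset (alias1), lock_by_page_offset (alias2));
  printf ("hashed index:      %u vs %u\n",
          lock_by_hashed_bits (alias1), lock_by_hashed_bits (alias2));
  return 0;
}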

[1]
https://github.com/gcc-mirror/gcc/blob/d9f1466f88abef7c814d02ba39a6ea5ef420aaec/libatomic/config/posix/lock.c#L82-L96

[2]
https://github.com/llvm/llvm-project/blob/b8919fb0eac15d13c5f56d3d30ce378a588dd78c/compiler-rt/lib/builtins/atomic.c#L108-L123
