Hi All, Over the past few weeks, we've been doing some deadlock tracking and figured a couple of causes of deadlocks:
1) copy in two directions (user a => user b at the same time as user b => user a) 2) re-taking the user namespace lock while still holding a mailbox namelock The work so far is happening on the https://github.com/cyrusimap/cyrus-imapd/pull/5392 pull request, because some of the changes clash with it otherwise. I'll address each of these in turn. *Case 1: activity between two users:* We already had user namespace locking in order in mboxlist.c for mailbox renames, however it was missing for IMAP copy, and I suspect it might still be missing for JMAP. I fixed the IMAP version at least. Right now, we sort based on the mboxname of the users' INBOX folders, in intname order. This is not ideal, I'd prefer to sort on their actual username so the locks are alphabetical on disk! I think what I actually want is an API that takes two mboxnames or two userids and locks them in the correct order, potentially returning a single object which holds both locks. This object could allow its second argument to be NULL or identical, so you don't need to do too many checks, just call it to lock all the users you are touching. *Case 2: re-taking the namespace lock* This is much more pernicious. Many years ago I added the mboxname locks. At the time they were on the intname of the mailbox, and you could take them before a mailbox even existed in order to safely create it without any race conditions. This was good (modulo the above ordering issues). Later, we renamed mailboxes to be stored by uniqueid, and the namelock also became the uniqueid. Now it couldn't be created until we had decided on the uniqueid, but that's OK because that's a uuid and they don't clash. So the lock was just used for its other purpose, blocking repacking or indexes.or unlinking of emails while a slow reader was running (a POP3 or slow IMAP fetch). Even later, we added a user namespace lock, making it safe to create, rename, or delete mailboxes under that user's part of the namespace without race conditions. Indeed, you can't even open and operate on a mailbox without a namespacelock - either an EXCLUSIVE lock if doing any writing, or a SHARED lock if only reading. Anyway - there was an invariant we were supposed to be respecting. You have to take the namespace lock first. And we always do, except there were cases where we were still holding that shared mboxname lock and then trying to lock a mailbox in the same user to do something else, which created a lock inversion. I think I've caught and fixed all of them now. And added a test which will assert failure if we try to take any user namespace locks while any mailbox at all is open! So far so good. *Do we still need the mboxname lock?* I think, at this stage - we have too many locks. Writes to the cyrus.index are already protected by the user namespace lock. We can exclusively lock cyrus.index while we write as well, but we can ALSO just switch to SHARED lock on cyrus.index if we need to hold it for longer and use that to protect against repacks, which removes one whole class of locks. We'd be taking that shared lock under the protection of the user namespace lock, and we would only (and explicitly!) take it where we currently mailbox_unlock_index. So that's my plan. Maybe after this current set of changes lands - to remove that whole class of locks and just lock on the cyrus.index file. Cheers, Bron. -- Bron Gondwana, CEO, Fastmail Pty Ltd / Fastmail US LLC br...@fastmailteam.com ------------------------------------------ Cyrus: Devel Permalink: https://cyrus.topicbox.com/groups/devel/Tdedc3c55380044c9-M4f2fe5b433694fdb4b161faf Delivery options: https://cyrus.topicbox.com/groups/devel/subscription