Hi All,

Over the past few weeks, we've been doing some deadlock tracking and figured a 
couple of causes of deadlocks:

1) copy in two directions (user a => user b at the same time as user b => user 
a)
2) re-taking the user namespace lock while still holding a mailbox namelock

The work so far is happening on the 
https://github.com/cyrusimap/cyrus-imapd/pull/5392 pull request, because some 
of the changes clash with it otherwise.

I'll address each of these in turn.

*Case 1: activity between two users:*

We already had user namespace locking in order in mboxlist.c for mailbox 
renames, however it was missing for IMAP copy, and I suspect it might still be 
missing for JMAP.  I fixed the IMAP version at least.

Right now, we sort based on the mboxname of the users' INBOX folders, in 
intname order.  This is not ideal, I'd prefer to sort on their actual username 
so the locks are alphabetical on disk!  I think what I actually want is an API 
that takes two mboxnames or two userids and locks them in the correct order, 
potentially returning a single object which holds both locks.  This object 
could allow its second argument to be NULL or identical, so you don't need to 
do too many checks, just call it to lock all the users you are touching.

*Case 2: re-taking the namespace lock*

This is much more pernicious.  Many years ago I added the mboxname locks.  At 
the time they were on the intname of the mailbox, and you could take them 
before a mailbox even existed in order to safely create it without any race 
conditions.  This was good (modulo the above ordering issues).

Later, we renamed mailboxes to be stored by uniqueid, and the namelock also 
became the uniqueid.  Now it couldn't be created until we had decided on the 
uniqueid, but that's OK because that's a uuid and they don't clash.  So the 
lock was just used for its other purpose, blocking repacking or indexes.or 
unlinking of emails while a slow reader was running (a POP3 or slow IMAP fetch).

Even later, we added a user namespace lock, making it safe to create, rename, 
or delete mailboxes under that user's part of the namespace without race 
conditions.  Indeed, you can't even open and operate on a mailbox without a 
namespacelock - either an EXCLUSIVE lock if doing any writing, or a SHARED lock 
if only reading.

Anyway - there was an invariant we were supposed to be respecting.  You have to 
take the namespace lock first.  And we always do, except there were cases where 
we were still holding that shared mboxname lock and then trying to lock a 
mailbox in the same user to do something else, which created a lock inversion.

I think I've caught and fixed all of them now.  And added a test which will 
assert failure if we try to take any user namespace locks while any mailbox at 
all is open!  So far so good.

*Do we still need the mboxname lock?*

I think, at this stage - we have too many locks.  Writes to the cyrus.index are 
already protected by the user namespace lock.  We can exclusively lock 
cyrus.index while we write as well, but we can ALSO just switch to SHARED lock 
on cyrus.index if we need to hold it for longer and use that to protect against 
repacks, which removes one whole class of locks.  We'd be taking that shared 
lock under the protection of the user namespace lock, and we would only (and 
explicitly!) take it where we currently mailbox_unlock_index.

So that's my plan.  Maybe after this current set of changes lands - to remove 
that whole class of locks and just lock on the cyrus.index file.

Cheers,

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd / Fastmail US LLC
  br...@fastmailteam.com


------------------------------------------
Cyrus: Devel
Permalink: 
https://cyrus.topicbox.com/groups/devel/Tdedc3c55380044c9-M4f2fe5b433694fdb4b161faf
Delivery options: https://cyrus.topicbox.com/groups/devel/subscription

Reply via email to