Hi,

A while ago I implemented searching emails with Solr for my IMAP server (www.dovecot.org). It seems to work OK, but now I'm having a bit of trouble figuring out how to implement searching across multiple mailboxes efficiently. It would be great if someone had suggestions on how to do this better.
The main problem is that before doing the search, I first have to check whether there are any unindexed messages and then add them to Solr. This is done using a query like:

- fl=uid
- rows=1
- sort=uid desc
- q=uidv:<uidvalidity> box:<mailbox> user:<user>

So it returns the highest IMAP UID field (which is an always-ascending integer) for the given mailbox (you can ignore the uidvalidity). I can then add all messages with higher UIDs to Solr before doing the actual search.

When searching multiple mailboxes, the above query would have to be sent for every mailbox separately. That really doesn't seem like the best solution, especially when there are a lot of mailboxes. But I don't think Solr has a way to return "the highest uid field for each box:<mailbox>"? And is the above query even efficient for a single mailbox?

I did consider using separate documents for storing the highest UID of each mailbox, but that opens up annoying possibilities for desynchronization. This matters especially because currently I can just keep sending documents to Solr without locking and let it drop duplicates automatically (which should be rare). With per-mailbox highest-uid documents I can't really see a way to do this without either locking, or allowing duplicate fields to be added and later having some garbage collection delete all but the single highest value (annoyingly complex).

I could of course also keep track of what's indexed on Dovecot's side, but that could also lead to desynchronization issues, and I'd like to avoid them. I guess the ideal solution would be if it were somehow possible to create an SQL-like trigger that updates the per-mailbox highest-uid document whenever a new document with a higher UID value is added.
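To make the per-mailbox check concrete, here is a minimal Python sketch that only builds query parameters and parses a response dict (no HTTP is sent). It assumes the field names from the post (uid, uidv, box, user) and a hypothetical core URL. The single-mailbox query uses explicit AND so it does not depend on the schema's default operator. The multi-mailbox variant is an assumption worth testing rather than a confirmed answer: in Solr versions that support result grouping, group.field=box with group.sort=uid desc and group.limit=1 should return the top-uid document per mailbox in one request. Grouping needs box to be a single-valued indexed field, and its cost with many mailboxes is not measured here. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def quote(value: str) -> str:
    """Quote a value as a Lucene phrase so mailbox names with spaces work."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def highest_uid_params(uidvalidity: int, mailbox: str, user: str) -> dict:
    """The single-mailbox check from the post: the one doc with the largest uid."""
    return {
        # Explicit AND, so the result doesn't depend on the default operator.
        "q": f"uidv:{uidvalidity} AND box:{quote(mailbox)} AND user:{quote(user)}",
        "fl": "uid",
        "rows": 1,
        "sort": "uid desc",
        "wt": "json",
    }


def highest_uid_per_box_params(user: str, boxes: dict) -> dict:
    """Hypothetical multi-mailbox variant using result grouping.

    boxes maps mailbox name -> uidvalidity.
    """
    clauses = " OR ".join(
        f"(box:{quote(b)} AND uidv:{v})" for b, v in boxes.items()
    )
    return {
        "q": f"user:{quote(user)} AND ({clauses})",
        "fl": "uid,box",
        "group": "true",
        "group.field": "box",    # one group per mailbox
        "group.sort": "uid desc",  # highest uid first inside each group
        "group.limit": 1,        # keep only that top document
        "rows": len(boxes),      # with grouping, rows = number of groups
        "wt": "json",
    }


def highest_uids_from_grouped(response: dict, mailboxes: list) -> dict:
    """Map each mailbox to its highest indexed uid (0 if nothing is indexed)."""
    found = {}
    for group in response["grouped"]["box"]["groups"]:
        docs = group["doclist"]["docs"]
        if docs:
            found[group["groupValue"]] = docs[0]["uid"]
    return {box: found.get(box, 0) for box in mailboxes}


def uids_to_index(mailbox_uids: list, highest_indexed: int) -> list:
    """UIDs that still have to be added to Solr before searching."""
    return [uid for uid in mailbox_uids if uid > highest_indexed]


def select_url(base: str, params: dict) -> str:
    return base + "/select?" + urlencode(params)
```

A mailbox with no indexed documents simply produces no group, so the parser treats it as highest uid 0 and everything in it gets indexed.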