On Tue, 2006-05-16 at 08:42 +0200, Paul J Stevens wrote:
> The body[]<> (partial) message retrieval will always require message
> retrieval by nature.

No it wouldn't. Unless you really mean BODY[] and not things like
BODY[1] etc.

Split the message on its MIME boundaries on insert. We'd then need a
separate table for keeping the BODYSTRUCTURE, but it'd mean decoding
would only occur once.
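
A rough sketch of what "split on insert" could look like, using Python's
standard email package purely for illustration (dbmail itself is C, and
split_parts is a made-up name, not anything in dbmail). It parses the
message once and returns one row per leaf part, keyed by a dotted section
number. Nested message/rfc822 parts are not numbered the way IMAP wants
here, so treat the numbering as a simplification.

```python
from email import message_from_bytes


def split_parts(raw: bytes):
    """Parse once; return (section, content_type, payload) per leaf part.

    Hypothetical helper: the section is a dotted part number such as
    "1" or "2.1". message/rfc822 nesting is NOT handled the IMAP way.
    """
    msg = message_from_bytes(raw)
    rows = []

    def walk(part, prefix):
        if part.is_multipart():
            # Children of a multipart are numbered from 1.
            for i, sub in enumerate(part.get_payload(), start=1):
                walk(sub, prefix + [str(i)])
        else:
            # A non-multipart top-level message is addressed as part 1.
            section = ".".join(prefix) or "1"
            # Keep the payload as stored (no transfer decoding).
            rows.append((section, part.get_content_type(), part.get_payload()))

    walk(msg, [])
    return rows
```

Each row would become one entry in the blocks table, with the structure
kept separately so BODYSTRUCTURE never needs the parser again.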

I posted some notes about this earlier.

Since all parts come in order, what we'd include is an "offset" for each
block. That is, the blocks table would include each block's "base
offset". We'd simply search for the MAX() offset at or below our starting
position, and then take the entries from there up to the last one whose
offset is lower than our ending position.

If we arrange the offset as TWO values (one for the body part, and one
for BODY[]) then we can accelerate both extractions.
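
To make the offset lookup concrete, here is a minimal sketch in Python,
assuming the blocks are held as a sorted list of (base_offset, data)
pairs. The names blocks_for_range and read_range are invented for this
example; in the database the same thing would be a MAX()/range query on
the offset column, and it works the same on whichever of the two offset
values is being used.

```python
from bisect import bisect_left, bisect_right


def blocks_for_range(offsets, start, end):
    """Indices of blocks overlapping the byte range [start, end).

    offsets must be the sorted base offsets of the blocks.
    """
    if not offsets or end <= start:
        return []
    # Last block whose base offset is <= start (the MAX() lookup).
    first = max(bisect_right(offsets, start) - 1, 0)
    # First block whose base offset is >= end; it is excluded.
    last = bisect_left(offsets, end)
    return list(range(first, last))


def read_range(blocks, start, length):
    """Assemble a partial fetch from only the blocks that cover it."""
    offsets = [off for off, _ in blocks]
    end = start + length
    out = bytearray()
    for i in blocks_for_range(offsets, start, end):
        off, data = blocks[i]
        # Trim the first and last block to the requested window.
        out += data[max(start - off, 0):end - off]
    return bytes(out)
```

For example, with blocks at offsets 0, 5 and 10, a request for 6 bytes
starting at position 3 touches only the first two blocks.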


> The last _can_ be optimized but will require more extensive changes
> and a new daemon.

Re: <URL:http://www.dbmail.org/dokuwiki/doku.php?id=bodysearch>

Some points about this:

1. Indexing everything is slow. ZOE is pretty fast, but it can't seem to
do much better than 2 messages per second.

2. Thunderbird and Outlook do client-side filtering. Lots of IMAP
clients do (esp. Evolution). This means that messages showing up
initially are likely going to be deleted or copied [naively]
immediately. It also means that reindexing needs to be handled.

3. People seem fairly divided; some use their INBOX for everything, and
others keep it as empty as possible. Those that keep their INBOX for
everything tend to have some mental cutoff point where they stop
considering it current.

4. I can only search for things I know about. Usually this means I've
seen the message, which USUALLY means I've already read it. Sometimes,
however, it means that someone called me on the phone about a message I
had already received but ignored for some reason.

As a result, I don't think that attaching to dbmail-smtp/lmtpd is such a
good idea. I think having an agent regularly poll the mailboxes for
messages that:
        * haven't been \Recent for at least 5 minutes
        * were uploaded via APPEND
and generate the dbmail_messagewords records would probably be a better
idea.

dbmail-imapd would first scan the indexes, and then fall back on the
current behavior for "new messages" (i.e. ones that have been \Recent
for less than 5 minutes). This would give clients immediate access to
the entries in the index, and would only cause us to search the last 5
minutes of messages when an IMAP client is actually searching (and is
presumably done with filtering).
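
The index-then-fallback search could be sketched like this in Python.
Everything here is an assumption for illustration: the index is a plain
dict standing in for dbmail_messagewords, arrival time stands in for
"how long the message has been \Recent", and search is not an existing
dbmail function.

```python
from datetime import datetime, timedelta

# Messages newer than this are assumed not to be indexed yet.
CUTOFF = timedelta(minutes=5)


def search(term, index, messages, now):
    """Return ids of messages matching term.

    index:    dict mapping a lowercase word to a set of message ids
    messages: dict mapping a message id to (arrival datetime, body text)
    """
    word = term.lower()
    # Fast path: whatever the polling agent has already indexed.
    hits = set(index.get(word, ()))
    # Slow path: scan only the messages from the last 5 minutes.
    for msg_id, (arrived, body) in messages.items():
        if now - arrived < CUTOFF and word in body.lower():
            hits.add(msg_id)
    return hits
```

The slow path is bounded by how much mail arrives in 5 minutes, so the
cost of a search no longer grows with the size of the mailbox.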

-- 
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/
