Re: [Dbmail-dev] db schema changes discussion

Jesse Norell Tue, 8 Jul 2003 17:18:47 +0200 (CEST)

> > This would be ideal with a single table and one VARCHAR column 
> > for each _interesting_ header. How do we know what the clients 
> > think are interesting headers?
> 
> This is why I suggested the value pair design, since we won't always
> know what's interesting.


  Yep.  We need to be able to specify what is interesting (either
at compile time or preferably in a config file), which could
potentially vary from site to site, and also be able to turn it
off completely for performance (e.g. on a pop3-only site).


> > WE DO GAIN if the clients very often request headers from
> > multiple messages in the mailbox, but do the clients make these
> > requests often? Mozilla only does this when the mailbox cache is
> > non-existent.
> > 
> > I believe in splitting the message into two parts, header and body.

  Yeah, either in separate tables or as it is now, but add a flag
for the blocks which contain headers.  I don't know which would be
faster, or if there'd really be a huge difference.  Also, another
table for the (optional) common message headers.


> My real question is what queries are being done by the server.  When I
> use evolution and ask it to search the message body for a piece of text,
> does it send this request to the server? Or does it only search what is
> cached.

  Along with "what queries are being done by the server," make sure
we also look at whether these queries make sense.  I.e., after looking
at common requests from IMAP clients, can the current queries done by
the server be improved upon?

  I'm not at all familiar with IMAP either, but I know from previous
discussions that the header caching is meant to handle the most
common cases (e.g. searches within From/To/Subject), but not 100%
of them (e.g. full message searches).  And it will be even more of a
help to direct-access webmail applications (which typically read,
parse, and display From, Subject, Date, etc. every time you look at
the mailbox).


[From another message]
> >   An idea we've started (not completed) implementing in weDBmail
> > along these lines is dynamically parsing/caching the headers.  Any
> > time the message headers are requested, it'll check the "header cache"
> > table and use entries if found, but if not, it'll parse the headers
> > from messageblks and use the results while saving appropriate ones in
> > the cache.  Could make the pop3 and imap servers do that as well.
> 
> Could you expand a little on what you mean by "dynamically
> parsing/caching the headers"?  Do you mean writing it to the
> database, keeping them in application memory, etc.?  Would be
> interesting to hear this fleshed out a little.

  This is basically a workaround/workalike for what dbmail will
eventually do itself.  Just a separate table along the lines of
(message_idnr, hdr_name, header).  In weDBmail we make heavy use of
the From, To, Subject and Date headers, and have to parse them in PHP
for every message display, every folder view, etc.  Instead of the
current {read message header, parse, use data} we'll have a process
of {check header cache table, read message header, parse, save in
header cache table, use data} for a non-cached message, and a
process of {check header cache table, use data} for cached messages.

> The idea of doing this at message header request time somewhat defeats
> the purpose.  I think doing this work at message delivery time is
> optimal.  Yes it adds overhead, but messages arrive in a fairly steady
> stream thus distributing this load over time as opposed to having to do
> all this work while the user is waiting.  A little work at message
> delivery time to get the data into an "optimal" state (still to be
> determined what optimal really means) will certainly help response
> times, perhaps with a slight increase in overall server load as a
> consequence. 

  Right - we're working on the dynamic caching because with us being
too lazy to change dbmail to handle it itself, that's the only
practical option available.  The only reason to add that to the pop3/
imap daemons would be for seamless migration (all new messages get
cached at injection time, old messages get cached as needed).  It may
be easier just to write that into dbmail-maintenance though, and
when you update from the old non-header-caching setup to the new,
you just have to run dbmail-maintenance to fix things up.
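
  For what it's worth, a one-off backfill pass of that sort might look
roughly like the following.  This is only a sketch in Python against
SQLite; the header_cache table, the is_header flag on messageblks, and
the CACHED_HEADERS list are assumptions for the example, and nothing
here reflects what dbmail-maintenance actually does today.

```python
import sqlite3
from email.parser import HeaderParser

# Assumed list of "interesting" headers; in practice site-configurable.
CACHED_HEADERS = ("From", "To", "Subject", "Date")

def backfill_header_cache(db: sqlite3.Connection) -> int:
    """Cache headers for messages with no header_cache rows; return how many."""
    # is_header is the hypothetical flag marking the block holding the headers.
    pending = db.execute(
        "SELECT b.message_idnr, b.messageblk FROM messageblks b "
        "WHERE b.is_header = 1 AND NOT EXISTS ("
        "  SELECT 1 FROM header_cache c WHERE c.message_idnr = b.message_idnr)"
    ).fetchall()
    for message_idnr, block in pending:
        parsed = HeaderParser().parsestr(block)
        db.executemany(
            "INSERT INTO header_cache (message_idnr, hdr_name, header) "
            "VALUES (?, ?, ?)",
            [(message_idnr, name, parsed[name])
             for name in CACHED_HEADERS if parsed[name] is not None],
        )
    db.commit()
    return len(pending)
```

  Running it a second time should find nothing left to do, which is
what makes it safe to re-run after an interrupted upgrade.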


  One other thought; I'm not recommending the approach of splitting
messages into components (e.g. file attachments) and saving them as
discrete components, as I think the work involved and complexity
introduced may not be worth the benefits, but one of the benefits not
yet mentioned could be lower storage requirements for duplicate
attachments.  As I watched my wife enjoy a flash application that
was emailed to her, and knew I'd seen it in her inbox at least once
in the past, I thought, "we could save the md5 checksum of each
decoded message component and would only have to store a single copy
of any given file within the entire mail spool!"  That would
complicate matters even more for proper message reconstruction, but
is not entirely without appeal.
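
  To make the checksum idea concrete, here is a small sketch in Python.
It is illustration only: the function name store_components is made up,
and a plain dict stands in for whatever table would actually hold the
decoded components.  It also ignores the hard part, namely rebuilding
the original message from the stored pieces.

```python
import hashlib
from email import message_from_string

def store_components(raw_message: str, store: dict) -> list:
    """Store each decoded leaf part once, keyed by MD5; return keys in order."""
    msg = message_from_string(raw_message)
    keys = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.md5(payload).hexdigest()
        store.setdefault(digest, payload)  # keep only the first copy
        keys.append(digest)
    return keys
```

  Two messages carrying the same attachment would then share a single
stored copy, with each message keeping only its list of checksums.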

Jesse

--
Jesse Norell
jesse (at) kci.net
