Matthew T. O'Connor wrote:
On Tue, 2003-07-08 at 11:18, Jesse Norell wrote:

 One other thought: I'm not recommending the approach of splitting
message components (e.g. file attachments) and saving them as discrete
components, as I think the work involved and complexity introduced
may not be worth the benefits, but one benefit not yet
mentioned could be lower storage requirements for duplicate
attachments.  As I watched my wife enjoy a flash application that
was emailed to her, and knew I'd seen it in her inbox at least once
in the past, I thought, "we could save the md5 checksum of each
decoded message component and would only have to store a single copy
of any given file within the entire mail spool!"  That would
complicate matters even more for proper message reconstruction, but
is not entirely without appeal.


I agree this would be good to do, and it's one of the things I think
Exchange does well (I think...): if I send a 10M video to 20 coworkers,
the total data store on Exchange only increases by 10M, not 200M.  I don't
think we need to store anything outside the database to do this (not
sure if that is what you were saying).
Question: Right now in dbmail, if I copy a message from my inbox to a
saved folder, does dbmail also copy all of the message_blks?  Or does it
just make a new entry in the messages table that references the same
message_blks?  If it copies them, I think sharing them would be easy to
do, and since even moving a message in IMAP is actually a copy (I think),
this is probably a worthwhile optimization.

The database size will increase with the size of your copy.
There is nothing in the database structure that supports multiple entries for a single message. This also impairs the performance of the IMAP COPY command, as has previously been stated on this list.
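
For illustration only, the copy-by-reference idea Matthew asks about could look roughly like the sketch below. The schema is hypothetical and is not dbmail's actual table layout: the `bodies` table and the `body_id` columns are assumptions introduced here so that several `messages` rows can point at one set of `message_blks`. SQLite is used only to keep the example self-contained.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a HYPOTHETICAL schema (not dbmail's)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE bodies (body_id INTEGER PRIMARY KEY);
        CREATE TABLE message_blks (
            blk_id  INTEGER PRIMARY KEY,
            body_id INTEGER NOT NULL REFERENCES bodies(body_id),
            data    BLOB NOT NULL);
        CREATE TABLE messages (
            message_id INTEGER PRIMARY KEY,
            mailbox    TEXT NOT NULL,
            body_id    INTEGER NOT NULL REFERENCES bodies(body_id));
    """)
    return db


def deliver(db, mailbox, blocks):
    """Store the blocks once and create the first messages row for them."""
    body_id = db.execute("INSERT INTO bodies DEFAULT VALUES").lastrowid
    db.executemany(
        "INSERT INTO message_blks (body_id, data) VALUES (?, ?)",
        [(body_id, blk) for blk in blocks],
    )
    return db.execute(
        "INSERT INTO messages (mailbox, body_id) VALUES (?, ?)",
        (mailbox, body_id),
    ).lastrowid


def copy_message(db, message_id, target_mailbox):
    """Copy a message by adding a messages row; no block data is duplicated."""
    (body_id,) = db.execute(
        "SELECT body_id FROM messages WHERE message_id = ?", (message_id,)
    ).fetchone()
    return db.execute(
        "INSERT INTO messages (mailbox, body_id) VALUES (?, ?)",
        (target_mailbox, body_id),
    ).lastrowid
```

With a layout like this, a copy costs one small row instead of a full duplicate of the blocks. The price is that deleting a message must first check whether any other `messages` row still references the same body.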

I do not actually know where the sweet spot between disk usage and CPU usage lies.
The flow would be something like this:
1. Split the message.
2. Calculate the MD5 hash for each part GREATER than 10 kbyte (TBD).
   (This way we get NULL in the MD5 index for small chunks and
    the select speed will probably improve.)
3. Store the unique parts.
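
A minimal sketch of that flow, under stated assumptions: the `store` dictionary stands in for a database table keyed by the MD5 digest, the function names are made up for this example, and the 10 kbyte threshold is the TBD value from step 2. This is not dbmail code.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage

MIN_HASH_SIZE = 10 * 1024  # threshold from step 2; marked TBD in the thread


def store_message(raw, store):
    """Split a message, hash the large parts, keep each unique part once.

    `store` maps an MD5 hex digest to the decoded part bytes (a stand-in
    for a table). Returns one (digest, inline_payload) pair per leaf part:
    small parts get digest None and are kept inline, large parts get a
    digest and inline_payload None.
    """
    refs = []
    for part in message_from_bytes(raw).walk():      # step 1: split
        if part.is_multipart():
            continue                                  # containers hold no data
        payload = part.get_payload(decode=True) or b""
        if len(payload) <= MIN_HASH_SIZE:
            refs.append((None, payload))              # NULL hash for small parts
            continue
        digest = hashlib.md5(payload).hexdigest()     # step 2: hash large parts
        store.setdefault(digest, payload)             # step 3: store once
        refs.append((digest, None))
    return refs


def build_demo_message(body, attachment):
    """Build a small multipart message for trying out store_message."""
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content(body)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="demo.bin")
    return msg.as_bytes()
```

Storing two messages that carry the same large attachment leaves a single entry in `store`, while each message keeps its own small text part. Reconstructing the original message would additionally need the headers and part ordering, which this sketch does not keep.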

In organisations where people keep their mail in a sent folder and also distribute the same message to many colleagues, you will certainly save disk space, especially when they attach Word documents. But as Jesse stated while I was writing this mail, it is a nice feature that is quite troublesome to implement.

What are we aiming for?
- Minimal disk space
- Minimal CPU usage (this can actually be achieved with the scheme above, depending on the organisation's mail usage)
- Minimal problems during the development phase


/Magnus



Matthew
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev




