Matthew T. O'Connor wrote:
On Tue, 2003-07-08 at 11:18, Jesse Norell wrote:
One other thought; I'm not recommending the approach for splitting
message components (e.g. file attachments) and saving them as discrete
components, as I think the work involved and complexity introduced
may not be worth the benefits, but one of the benefits not yet
mentioned could be lower storage requirements for duplicate
attachments. As I watched my wife enjoy a flash application that
was emailed to her, and knew I'd seen it in her inbox at least once
in the past, I thought, "we could save the md5 checksum of each
decoded message component and would only have to store a single copy
of any given file within the entire mail spool!" That would
complicate matters even more for proper message reconstruction, but
is not entirely without appeal.
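A minimal sketch of the checksum idea, in C. Everything in it is hypothetical: `struct blob_store`, `store_part()` and the fixed-size table are illustrations, not dbmail code. FNV-1a stands in for MD5 only so the sketch needs no crypto library. Because FNV-1a is not collision-resistant, a hash hit is confirmed with `memcmp()` before the stored copy is shared.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-instance store for decoded message components.
 * FNV-1a is a dependency-free stand-in for MD5; every hash hit is
 * verified byte-for-byte before an existing copy is reused. */

#define MAX_BLOBS 64

struct blob {
    uint64_t hash;
    size_t len;
    unsigned char *data;
    unsigned refs;          /* how many message parts point at this copy */
};

struct blob_store {
    struct blob blobs[MAX_BLOBS];
    size_t count;
};

static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;               /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Returns the index of the stored copy, or -1 if the table is full or
 * memory runs out. Identical content is stored only once. */
static long store_part(struct blob_store *s, const void *data, size_t len)
{
    assert(s != NULL && (data != NULL || len == 0));
    const unsigned char *bytes = data;
    uint64_t h = fnv1a64(bytes, len);

    for (size_t i = 0; i < s->count; i++) {
        struct blob *b = &s->blobs[i];
        if (b->hash == h && b->len == len && memcmp(b->data, bytes, len) == 0) {
            b->refs++;                       /* duplicate: share the copy */
            return (long)i;
        }
    }
    if (s->count == MAX_BLOBS)
        return -1;
    unsigned char *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, bytes, len);
    s->blobs[s->count] = (struct blob){ h, len, copy, 1 };
    return (long)s->count++;
}
```

Storing the same decoded attachment twice leaves one copy in the store with a reference count of 2. In a real spool the table would be a database table with an index on the digest column.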
I agree this would be good to do, and it's one of the things I think
Exchange does well (I think...): if I send a 10 MB video to 20 coworkers,
the total data store on Exchange only increases by 10 MB, not 200 MB. I don't
think we need to store anything outside the database to do this (not
sure if that is what you were saying).
Question: Right now in dbmail, if I copy a message from my inbox to a
saved folder, does dbmail also copy all of the message_blks? Or does it
just make a new entry in the messages table that is also referenced by
by the message_blks? If not, I think this would be easy to do, and
since even moving a message in IMAP is actually a copy (I think), this
is probably a worthwhile optimization.
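One way to model the question above, assuming reference counting (an assumption, not how dbmail currently works): COPY adds a new message row pointing at the existing blocks and bumps a counter, and deleting a row frees the blocks only when the counter reaches zero. `struct message`, `struct blocks`, `copy_message()` and `delete_message()` are hypothetical names that only loosely mirror the messages and message_blks tables.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-memory model: a copied message gets its own row
 * (folder, flags, ...) but shares the block data instead of duplicating
 * it. Not dbmail's actual schema. */

struct blocks {               /* stands in for the message_blks rows */
    size_t size;
    unsigned refs;            /* number of message rows using these blocks */
};

struct message {              /* stands in for a messages row */
    int folder_id;
    struct blocks *blks;
};

/* IMAP COPY: new message row, shared blocks; no block data is copied. */
static struct message copy_message(const struct message *src, int dest_folder)
{
    assert(src->blks != NULL && src->blks->refs > 0);
    src->blks->refs++;
    return (struct message){ dest_folder, src->blks };
}

/* Expunge: drop one reference and free the blocks only when the last
 * message row using them goes away. Returns 1 if the blocks were freed. */
static int delete_message(struct message *m)
{
    assert(m->blks != NULL && m->blks->refs > 0);
    int freed = 0;
    if (--m->blks->refs == 0) {
        free(m->blks);
        freed = 1;
    }
    m->blks = NULL;
    return freed;
}
```

An IMAP move done as COPY plus deletion of the original then leaves the counter where it started and never duplicates block data.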
The database size will increase by the size of each copy.
There is nothing in the database structure that lets multiple
message entries share a single copy of the message data.
This also impairs the performance of the IMAP COPY command, as
has previously been stated on this list.
I do not actually know where the sweet spot between disk usage and
CPU usage lies.
The flow would be something like this:
1. Split the message into its parts.
2. Calculate the MD5 hash for each part GREATER than 10 KB (threshold TBD).
   (This way we get NULL in the MD5 index for small chunks, and
   the select speed will probably improve.)
3. Store the unique parts.
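The three steps can be sketched as follows, under several assumptions: step 1 (splitting the message) is taken as already done, `HASH_THRESHOLD` is the provisional 10 KB figure, `has_hash == 0` models the NULL MD5 column, and FNV-1a again replaces MD5 to keep the example dependency-free. `struct part_table` and `store_message_part()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of steps 2 and 3. Parts at or below the threshold get no hash
 * (a NULL digest) and are always stored; larger parts are hashed and
 * stored only if their content is not already present. */

#define HASH_THRESHOLD (10u * 1024u)   /* "10 KB (TBD)" */
#define MAX_PARTS 32

struct stored_part {
    int has_hash;                /* 0 models a NULL MD5 column */
    uint64_t hash;
    const unsigned char *data;   /* caller-owned; not copied in this sketch */
    size_t len;
};

struct part_table {
    struct stored_part rows[MAX_PARTS];
    size_t count;
};

static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;   /* stand-in for MD5 */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Returns the row index holding this part, or -1 if the table is full. */
static long store_message_part(struct part_table *t, const void *data, size_t len)
{
    assert(t != NULL && (data != NULL || len == 0));
    const unsigned char *bytes = data;
    int big = len > HASH_THRESHOLD;
    uint64_t h = big ? fnv1a64(bytes, len) : 0;   /* small parts: no hash */

    if (big) {
        for (size_t i = 0; i < t->count; i++) {
            const struct stored_part *r = &t->rows[i];
            if (r->has_hash && r->hash == h && r->len == len &&
                memcmp(r->data, bytes, len) == 0)
                return (long)i;                   /* already stored: reuse */
        }
    }
    if (t->count == MAX_PARTS)
        return -1;
    t->rows[t->count] = (struct stored_part){ big, h, bytes, len };
    return (long)t->count++;
}
```

Small parts, such as signatures, skip hashing entirely and are always stored. That is the trade-off described above: less CPU spent on hashing and fewer non-NULL entries in the digest index, at the cost of some duplicated small chunks.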
In organisations where people store their mail in a sent folder
and then distribute the message to a lot of people in their
organisation, you will certainly save disk space, especially when
they are attaching Word documents.
But, as Jesse pointed out while I was writing this mail, it
is a nice feature, but quite troublesome to implement.
What are we aiming for?
- Minimal disk space
- Minimal CPU usage; this can actually be achieved with the scheme
  above, depending on the organisation's mail usage
- Minimal problems during the development phase
/Magnus
Matthew
_______________________________________________
Dbmail-dev mailing list
http://twister.fastxs.net/mailman/listinfo/dbmail-dev