Re: [Dbmail] 2.0 and transactions

Magnus Sundberg Thu, 20 Nov 2003 18:38:46 +0100 (CET)

Eric Soroos wrote:

On Nov 20, 2003, at 8:28 AM, Magnus Sundberg wrote:
Eric Soroos wrote:
You can. That's effectively what I was doing in my message, except that you're not seeing it in the messageblk view. You probably don't want a unique constraint on messageblk, since the idea of the fingerprint is that it's a 1:1 mapping of the messageblk down to 128 bits.
I would agree with you if I had written "UNIQUE (fingerprint)", but my belief is that the combination of both keys should be unique.
It's redundant.
I agree about that, but my belief is that
"UNIQUE (messageblock, fingerprint)" is faster than
"UNIQUE (messageblock)"

Well, the first is something I found out earlier. Consider a table with the fields A and B and the following records:

For those records, you cannot add the constraint UNIQUE(A) or UNIQUE(B), but you can add the constraint UNIQUE(A,B).
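The records from the original message did not survive in this copy, so the following is only a minimal sketch with made-up rows, using SQLite through Python's standard `sqlite3` module rather than the databases discussed in the thread. The table name `t`, the sample rows, and the helper `can_add_unique` are all assumptions for illustration. It shows a table in which each column repeats a value, so neither single-column unique index can be created, while the two-column one can.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
# Hypothetical records: each column repeats a value, but no (a, b) pair repeats.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])


def can_add_unique(columns):
    """Return True if a unique index over `columns` can be created on t."""
    name = "uq_" + "_".join(columns)
    try:
        conn.execute(f"CREATE UNIQUE INDEX {name} ON t ({', '.join(columns)})")
        return True
    except sqlite3.DatabaseError:
        # SQLite refuses to build a unique index over existing duplicates.
        return False


results = {
    "UNIQUE(a)": can_add_unique(["a"]),
    "UNIQUE(b)": can_add_unique(["b"]),
    "UNIQUE(a, b)": can_add_unique(["a", "b"]),
}
print(results)
```

The same reasoning applies to (messageblock, fingerprint): the pair can be unique even where one of the columns alone is not.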

Well, as you have stated, there is a nonzero probability of getting the same fingerprint for two different messageblocks. Therefore you can use UNIQUE(messageblock, fingerprint) as a protection against duplicate messageblocks. I believe the query optimizer in the database will only use the fingerprint as index, since this would probably be much faster than searching on the messageblock itself. It is probably wrong to have the fingerprint as a bigint; it should probably only be 32 or 64 bits.
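One way to picture the collision argument is the lookup pattern below: narrow the search with an index on the fingerprint, then compare the full block before treating it as a duplicate. This is a sketch under stated assumptions, not dbmail code and not the composite constraint itself. The table layout, the `store` helper, and the deliberately weak 8-bit `fingerprint` function (chosen only so that collisions are guaranteed in a small test) are all hypothetical.

```python
import hashlib
import sqlite3


def fingerprint(block: bytes) -> bytes:
    # Deliberately weak 8-bit fingerprint (hypothetical) so collisions are easy to show.
    return hashlib.md5(block).digest()[:1]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messageblks ("
    " id INTEGER PRIMARY KEY,"
    " fingerprint BLOB NOT NULL,"
    " messageblk BLOB NOT NULL)"
)
# Non-unique index: a collision must not block the insert of a different block.
conn.execute("CREATE INDEX idx_fp ON messageblks (fingerprint)")


def store(block: bytes) -> int:
    """Return the id of `block`, inserting it only if no identical block exists."""
    fp = fingerprint(block)
    candidates = conn.execute(
        "SELECT id, messageblk FROM messageblks WHERE fingerprint = ?", (fp,)
    ).fetchall()
    for row_id, stored in candidates:
        if bytes(stored) == block:  # full comparison guards against collisions
            return row_id
    cur = conn.execute(
        "INSERT INTO messageblks (fingerprint, messageblk) VALUES (?, ?)",
        (fp, block),
    )
    return cur.lastrowid


# 300 distinct blocks but only 256 possible fingerprints, so some must collide.
blocks = [b"block-%d" % i for i in range(300)]
first_ids = [store(b) for b in blocks]
second_ids = [store(b) for b in blocks]  # storing again must not add rows
row_count = conn.execute("SELECT COUNT(*) FROM messageblks").fetchone()[0]
distinct_fps = len({fingerprint(b) for b in blocks})
```

With a real 128-bit fingerprint the candidate list would almost always hold at most one row, so the full comparison costs little.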

I don't understand. Can you explain your reasoning? Especially how it's different/faster/better than Unique(fingerprint).
That's the reason I put it together like that.
The fingerprint is going to be unique iff the messageblock is unique. If that wasn't the case, then there's no point in storing or calculating it. Unique constraints are generally done with indexes (not sure about mysql, but it's certainly the case with postgres), so you'd end up adding the entire messageblk to the index. And it's not a particularly useful index, since that's the only thing that index would be good for.
I agree with you on this point.
Is it possible to have some hash algorithm and just index the messageblock? I.e. let the database itself do the fingerprinting and hide it from us?
Yes, that was my proposal. I'm using md5 in my test case. The proposal requires a couple of rules, a stored procedure, and a view, so it's doubtful that it will work on mysql.
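The rules, stored procedure, and view from the proposal are not shown in this message, so the following does not reproduce them. It is only a rough sketch of the underlying idea, computing an md5 fingerprint and letting a unique index on that fingerprint reject duplicates, written against SQLite with Python's standard library. The schema and the `insert_block` helper are assumptions for illustration, and the fingerprinting happens in the application here rather than inside the database.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messageblks ("
    " id INTEGER PRIMARY KEY,"
    " fingerprint BLOB NOT NULL UNIQUE,"  # 16-byte md5 digest of messageblk
    " messageblk BLOB NOT NULL)"
)


def insert_block(block: bytes) -> int:
    """Insert `block` unless its fingerprint is already stored; return its row id."""
    fp = hashlib.md5(block).digest()
    # The unique index on fingerprint makes a duplicate insert a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO messageblks (fingerprint, messageblk) VALUES (?, ?)",
        (fp, block),
    )
    (row_id,) = conn.execute(
        "SELECT id FROM messageblks WHERE fingerprint = ?", (fp,)
    ).fetchone()
    return row_id


first = insert_block(b"Hello, world\r\n")
again = insert_block(b"Hello, world\r\n")  # same content, same row
other = insert_block(b"Another block\r\n")
row_count = conn.execute("SELECT COUNT(*) FROM messageblks").fetchone()[0]
digest_bits = len(hashlib.md5(b"").digest()) * 8  # md5 digests are 128 bits
```

Only the 16-byte fingerprint goes into the index, which is the point made earlier about not indexing the entire messageblk.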

I mean some kind of index that is invisible to us, one that only the database query engine uses, without us really knowing.

MySQL has some text search functions that could be useful.
You should just add it to the DB like INDEX(messageblock)

I am still wondering if this is possible or if it kills the DB. If the DB dies, then we would have to use a separate table/record with fingerprint records etc., since the DB can live with that.

Would this kill the database?
No, it seems to have about the same performance as the original schema. It appears that the md5 hash of the messageblk data is lost in the overhead of all the other work that's happening on insert.
eric

Magnus
