Re: [Dbmail] 2.0 and transactions

Eric Soroos Thu, 20 Nov 2003 19:03:53 +0100 (CET)

Well the first is something I have found out earlier, consider a tablewith the fields A and B and the following records:
A B
1 1
1 2
2 1
for that record, you can not add the constraint UNIQUE(A) norUNIQUE(B), but you can add the constraint UNIQUE(A,B)

Correct. But if we're starting with A and B= f(a) where f is 1:1 limemd5, then you will actually get a table like:

A B
1 aa...43
1 aa...43
2 48...6c

Well, as you have stated there is a non zero possibility to get thesame fingerprint for two different messageblocks.

Non-zero, but on the order of 1 in 2^127 chance, or virtually never inthe age of the universe, at least for md5. You can obviously getduplicates, since if you hash every 128 bit string into 128 bits,you're assured of getting a collision on the next string that you hash,if you haven't done so already. But as I've said, that's a major timeinvestment.

I beleive the query optimizer in the databsae will only use thefingerprint as index, since this probably would be much faster thansearching on the messageblock itself.It is probably wrong to have the fingerprint as a bigint, it shouldprobably only be 32 or 64 bits.

128 is probably the minimum that I've seen for a good hash, 160 bitsha1 is another possibility.

I mean some kind of for us invisible index, that only the databasequery engine uses, without us really knowing.

MySQL has some text search functions that could be useful.
You should just add it to the DB like INDEX(messageblock)

That would be a fulltext index on messageblock, and rather expensiveand wouldn't even help the uniqueness problem.

I am still wondering if this is possible or if it kills the DB. If theDB die, then we would have to use a separate table/record withfingerprint records etc. since the DB can live with it.


I don't see how storing fingerprints could kill a database.

eric

Re: [Dbmail] 2.0 and transactions

Reply via email to