On Nov 20, 2003, at 8:28 AM, Magnus Sundberg wrote:
Eric Soroos wrote:
You can. That's effectively what I was doing in my message, except
that you're not seeing it in the messageblk view. You probably
don't want a unique constraint on messageblk, since the idea of the
fingerprint is that it's a 1:1 mapping of the messageblk down to
128 bits.
I agree with you if I had written "UNIQUE (fingerprint)", but my
belief is that the combination of both keys should be unique.
It's redundant.
I agree about that, but I my belief is that
"UNIQUE (messageblock, fingerprint)" is faster than
"UNIQUE (messageblock)"
I don't understand. Can you explain your reasoning? Especially how
it's different/faster/better than Unique(fingerprint).
Thats the reason I put it together like that.
The fingerprint is going to be unique iff the messageblock is unique.
If that wasn't the case, then there's no point in storing or
calculating it. Unique constraints are generally done with indexes
(not sure about mysql, but it's certainly the case with postgres), so
you'd end up adding the entire messageblk to the index. And it's not
a particularly useful index, since that's the only thing that index
would be good for.
I agree with you on this point.
Is it possible to have some hash algoritms and just index the
messageblock? I.e. let the database itself do the fingerprinting and
hiding it for us?
Yes, that was my proposal. I'm using md5 in my test case. The proposal
requires a couple of rules, a stored procedure, and a view, so it's
doubtful that it will work on mysql.
Would this kill the database?
No, it seems to have about the same performance as the original schema.
It appears that the md5 hash of the messageblk data is lost in the
overhead of all the other work that's happening on insert.
eric