On Nov 20, 2003, at 8:28 AM, Magnus Sundberg wrote:

Eric Soroos wrote:

You can. That's effectively what I was doing in my message, except that you're not seeing it in the messageblk view. You probably don't want a unique constraint on messageblk, since the idea of the fingerprint is that it's a 1:1 mapping of the messageblk down to 128 bits.
I agree with you if I had written "UNIQUE (fingerprint)", but my belief is that the combination of both keys should be unique.
It's redundant.
I agree about that, but I my belief is that
"UNIQUE (messageblock, fingerprint)" is faster than
"UNIQUE (messageblock)"

I don't understand. Can you explain your reasoning? Especially how it's different/faster/better than Unique(fingerprint).


Thats the reason I put it together like that.
The fingerprint is going to be unique iff the messageblock is unique. If that wasn't the case, then there's no point in storing or calculating it. Unique constraints are generally done with indexes (not sure about mysql, but it's certainly the case with postgres), so you'd end up adding the entire messageblk to the index. And it's not a particularly useful index, since that's the only thing that index would be good for.

I agree with you on this point.


Is it possible to have some hash algoritms and just index the messageblock? I.e. let the database itself do the fingerprinting and hiding it for us?

Yes, that was my proposal. I'm using md5 in my test case. The proposal requires a couple of rules, a stored procedure, and a view, so it's doubtful that it will work on mysql.

Would this kill the database?

No, it seems to have about the same performance as the original schema. It appears that the md5 hash of the messageblk data is lost in the overhead of all the other work that's happening on insert.

eric

Reply via email to