On Mon, 17 Dec 2007 14:43:55 -0500
"Norskog, Lance" <[EMAIL PROTECTED]> wrote:

> We are using MD5 to generate our IDs. MD5s are 128 bits creating a very
> unique and very randomized number for the content. Nobody has ever
> reported two different data sets that create the same MD5.

Yup, we use two MD5s concatenated. The first part is the MD5 of a group name;
the second part is derived from the item in the group (the same item can be in
different groups, so this second part can repeat across IDs). Of course, an
item can appear only once in each group, so the combined ID is always unique.
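
A minimal Python sketch of this kind of ID, for illustration only. It assumes
the second half is the MD5 of some item key; the names `md5_hex`,
`make_doc_id`, and the sample group and item values are hypothetical and not
taken from our actual code.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 of `text` as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_doc_id(group_name: str, item_key: str) -> str:
    """Concatenate the group hash and the item hash into one 64-char ID.

    Assumption: the item part is the MD5 of an item key; the original
    message only says it is "related to the item".
    """
    return md5_hex(group_name) + md5_hex(item_key)


# The same item in two groups shares the second half but not the first.
id_news = make_doc_id("news", "item-42")
id_sports = make_doc_id("sports", "item-42")
print(id_news)
print(id_sports)
```

Because the group hash comes first, every ID in a group starts with the same
32 characters, and the (group, item) pair makes the full ID distinct.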

> 
> We use the standard (some RFC) text representation of 32 hex characters.
> This has the advantage that F* pulls 1/16 of the total index, with a
> completely randomized distribution, F**  1/256, etc.  This is very handy
> for data analysis and document extraction. 

Yup, and in our case the first half of the doc ID could be used to get all the
items in a group. But your example is a good one: I haven't used it for that
yet, but it's a simple and practical use of the doc ID :)
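
A rough sketch of both prefix tricks in Python, using an in-memory list
instead of a real index. The helper names and the sample groups are
hypothetical; in a search index the same selection would be a prefix query
like the `F*` pattern quoted above.

```python
import hashlib
from collections import Counter


def md5_hex(text: str) -> str:
    """Return the MD5 of `text` as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_doc_id(group_name: str, item_key: str) -> str:
    """Hypothetical composite ID: group hash followed by item hash."""
    return md5_hex(group_name) + md5_hex(item_key)


def ids_with_prefix(doc_ids, prefix):
    """Return the IDs starting with `prefix` (hexdigest output is lowercase)."""
    prefix = prefix.lower()
    return [doc_id for doc_id in doc_ids if doc_id.startswith(prefix)]


# Three made-up groups with 100 items each.
doc_ids = [
    make_doc_id(group, f"item-{i}")
    for group in ("news", "sports", "weather")
    for i in range(100)
]

# First half of the ID as a prefix: all items of one group.
news_items = ids_with_prefix(doc_ids, md5_hex("news"))
print(len(news_items))  # 100

# Plain MD5 IDs: the first hex character splits them into 16 buckets
# of roughly equal size, so one character selects about 1/16 of the set.
plain_ids = [md5_hex(f"doc-{i}") for i in range(16000)]
buckets = Counter(doc_id[0] for doc_id in plain_ids)
print(sorted(buckets.items()))
```

Note that with the composite IDs the first character depends only on the
group, so the 1/16 sampling applies to plain per-document MD5s, or to the
second half of our IDs.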

cheers,
B
_________________________
{Beto|Norberto|Numard} Meijome

"I was born not knowing and have had only a little time to change that here and 
there." 
  Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.
