On Mon, 17 Dec 2007 14:43:55 -0500 "Norskog, Lance" <[EMAIL PROTECTED]> wrote:
> We are using MD5 to generate our IDs. MD5s are 128 bits creating a very
> unique and very randomized number for the content. Nobody has ever
> reported two different data sets that create the same MD5.

Yup, we use two MD5s concatenated. The first part is the MD5 of a group name; the second part is related to the item in the group (the same item can be in different groups, so this second part can also be repeated). Of course, a given item can exist only once in each group, so the combined ID is always unique.

> We use the standard (some RFC) text representation of 32 hex characters.
> This has the advantage that F* pulls 1/16 of the total index, with a
> completely randomized distribution, F** 1/256, etc. This is very handy
> for data analysis and document extraction.

Yup, and in our case the first half of the doc ID could be used to get all items in a group. But your example is a good one. I haven't used it for that yet, but it's a simple and practical use of the doc ID :)

cheers,
B
_________________________
{Beto|Norberto|Numard} Meijome

"I was born not knowing and have had only a little time to change that here and there."
   Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
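
P.S. In case it helps, here is a rough sketch of the two-part ID idea in Python. It is only an illustration, not our actual code: the function names and the sample group and item keys are made up, and I am assuming the second half is simply the MD5 of some item key (in practice it only has to be "related to the item"). The prefix filter at the end stands in for what a prefix query like F* would do against the index.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the 32-character lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_doc_id(group: str, item_key: str) -> str:
    """Hypothetical helper: MD5(group name) followed by MD5(item key).

    Assumption: the item half is an MD5 of an item key; the real scheme
    may derive it differently.
    """
    return md5_hex(group) + md5_hex(item_key)


def ids_in_group(doc_ids, group: str):
    """Select every ID whose first half matches the group's MD5.

    This mimics a prefix query on the doc ID field.
    """
    prefix = md5_hex(group)
    return [doc_id for doc_id in doc_ids if doc_id.startswith(prefix)]


# Made-up sample data: the same item key appears in two groups.
doc_ids = [
    make_doc_id("news", "item-a"),
    make_doc_id("news", "item-b"),
    make_doc_id("sports", "item-a"),
]

news_ids = ids_in_group(doc_ids, "news")
```

The same item key gives the same second half in every group, so uniqueness comes from the pair, and the first 32 characters alone are enough to pull a whole group.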