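We use MD5 to generate our IDs. An MD5 digest is 128 bits, which gives an effectively unique and uniformly distributed number for the content. Deliberately constructed MD5 collisions have been published, but two ordinary, different documents hashing to the same MD5 by accident is not a practical concern.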
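For illustration, here is a minimal sketch of deriving such an ID from document content with the JDK's MessageDigest. The class and method names (Md5Id, hexId, longId) are hypothetical, and hashing the UTF-8 bytes, printing uppercase hex, and taking the first 8 bytes for the 64-bit variant are all assumptions rather than a description of our actual code. It also shows the hex-prefix bucket and the 64-bit truncation mentioned in this message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Id {
    // Raw 16-byte (128-bit) MD5 digest of the content's UTF-8 bytes (encoding is an assumption).
    static byte[] digest(String content) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // 32-character hex ID; uppercase is assumed so that a prefix such as F* matches.
    static String hexId(String content) {
        StringBuilder sb = new StringBuilder(32);
        for (byte b : digest(content)) {
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    // 64-bit variant: the first 8 digest bytes read as a big-endian long (assumed choice of bits).
    static long longId(String content) {
        return ByteBuffer.wrap(digest(content)).getLong();
    }

    public static void main(String[] args) {
        String id = hexId("abc");
        System.out.println(id);            // 900150983CD24FB0D6963F7D28E17F72
        System.out.println(id.charAt(0));  // first hex digit: one of 16 equally likely buckets
        System.out.println(longId("abc")); // may be negative, since Java longs are signed
    }
}
```

Truncating to 64 bits halves the storage for the key but raises the chance of an accidental collision, so it only makes sense for a small index.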
We use the standard text representation of 32 hex characters (the form shown in RFC 1321, which defines MD5). This has the advantage that a one-character prefix query such as F* pulls 1/16 of the total index, with an effectively random distribution, a two-character prefix such as FF* pulls 1/256, and so on. This is very handy for data analysis and document extraction. MD5 produces 128 bits, but if your index is small enough that you are willing to risk collisions, you could keep just 64 of those bits and store them in a Java long.

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Monday, December 17, 2007 8:15 AM
To: solr-user@lucene.apache.org
Subject: Re: retrieve lucene "doc id"

Yonik Seeley wrote:
> On Dec 17, 2007 1:40 AM, Ben Incani <[EMAIL PROTECTED]> wrote:
>> I have converted to using the Solr search interface and I am trying
>> to retrieve documents from a list of search results (where previously
>> I had used the doc id directly from the lucene query results) and the
>> solr id I have got currently indexed is unfortunately configured not be unique!
>
> Ouch... I'd try to make a unique Id then!
> Or barring that, just try to make the query match exactly the docs you
> want back (don't do the 2 phase thing).
>

In 1.3-dev, you can use UUIDField to have solr generate a UUID for each doc.

ryan