RE: retrieve lucene "doc id"

Norskog, Lance Mon, 17 Dec 2007 11:42:25 -0800

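We are using MD5 to generate our IDs. An MD5 is 128 bits, which gives a
well-randomized and, for practical purposes, unique number for the
content. Deliberately constructed MD5 collisions have been published
since 2004, but as far as I know nobody has reported two ordinary data
sets that accidentally produce the same MD5.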

We use the standard text representation of 32 hex characters (the form
printed by the test suite in RFC 1321, the MD5 spec). This has the
advantage that a prefix query such as F* pulls 1/16 of the total index,
with a completely randomized distribution, a two-character prefix such
as FF* pulls 1/256, etc. This is very handy for data analysis and
document extraction.
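
A minimal sketch of how such an ID could be generated, and of why a
one-character prefix selects about 1/16 of the documents. The class name
Md5Id, the method md5Hex, and the "doc-N" sample contents are made up
for illustration. The choice of UTF-8 and of uppercase hex (to match the
F* example) are assumptions, not a description of our actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a 32-hex-character ID from document content.
final class Md5Id {

    // Returns the MD5 of the content as 32 uppercase hex characters.
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                // %02X prints each byte as two unsigned uppercase hex digits.
                sb.append(String.format("%02X", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Count how many of 16000 made-up documents get an ID starting
        // with 'F'. With a uniform distribution this is about 1 in 16.
        int total = 16000;
        int startsWithF = 0;
        for (int i = 0; i < total; i++) {
            if (md5Hex("doc-" + i).startsWith("F")) {
                startsWithF++;
            }
        }
        System.out.println(startsWithF + " of " + total + " IDs start with F");
    }
}
```

Whichever case you pick for the hex digits, use it consistently at index
time and query time, since a prefix match on an untokenized string field
is case sensitive.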

MD5 creates 128 bits, but if your index is small enough that you are
willing to risk it, you could pick 64 bits and park them in a Java long.
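
One way the 64-bit variant might look, as a sketch only. The class name
Md5LongId and the method md5Long are hypothetical, and taking the first
8 bytes of the digest (big-endian) is an arbitrary choice; any fixed 8
bytes would do as well.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: truncates an MD5 digest to a 64-bit ID.
final class Md5LongId {

    // Returns the first 8 bytes of the MD5 digest as a big-endian long.
    // The result may be negative, because Java longs are signed.
    static long md5Long(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            // ByteBuffer is big-endian by default; getLong() reads bytes 0..7.
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(md5Long("abc")));
    }
}
```

As a rough guide to the risk: with n documents and 64-bit IDs, the
chance of at least one accidental collision is about n*n / 2^65, so it
stays tiny for millions of documents and becomes noticeable only as the
index approaches billions.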

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 17, 2007 8:15 AM
To: solr-user@lucene.apache.org
Subject: Re: retrieve lucene "doc id"

Yonik Seeley wrote:
> On Dec 17, 2007 1:40 AM, Ben Incani <[EMAIL PROTECTED]> wrote:
>> I have converted to using the Solr search interface and I am trying
>> to retrieve documents from a list of search results (where previously
>> I had used the doc id directly from the lucene query results) and the
>> solr id I have got currently indexed is unfortunately configured not
>> to be unique!
>
> Ouch... I'd try to make a unique Id then!
> Or barring that, just try to make the query match exactly the docs you
> want back (don't do the 2 phase thing).
>

In 1.3-dev, you can use UUIDField to have Solr generate a UUID for each
doc.

ryan
