Exactly. We have done some projects where we extract records en masse. With this technique we can make a query that fetches 3000 +/- 50 records, and then walk through them 50 records at a time, using the query as a filter. Works pretty well.

Lance

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 18, 2007 11:07 AM
To: solr-user@lucene.apache.org
Subject: Re: retrieve lucene "doc id"

Hi Lance,

You said:

> We use the standard (some RFC) text representation of 32 hex characters.
> This has the advantage that F* pulls 1/16 of the total index, with a
> completely randomized distribution, F** 1/256, etc. This is very handy
> for data analysis and document extraction.

Could you elaborate on the last sentence? Maybe give an example of what you have in mind? Are you thinking that this, because of the uniform distribution, lets you easily get a subset of documents of predictable size, so that you know a priori how large a data set you'll get and work with? Or something else?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: "Norskog, Lance" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, December 17, 2007 2:43:55 PM
Subject: RE: retrieve lucene "doc id"

We are using MD5 to generate our IDs. An MD5 is 128 bits, which gives an effectively unique and well-randomized number for the content. In practice, two different data sets will not accidentally produce the same MD5.

We use the standard (some RFC) text representation of 32 hex characters. This has the advantage that F* pulls 1/16 of the total index, with a completely randomized distribution, F** 1/256, etc. This is very handy for data analysis and document extraction.

MD5 creates 128 bits, but if your index is small enough that you are willing to risk it, you could pick 64 bits and park them in a Java long.

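The MD5-as-ID idea above can be sketched roughly as follows. This is only an illustration in Python, not the code used on the project: the function names, the uppercase hex form, and the synthetic 48000-document corpus are all assumptions made for the example. It shows the 32-character hex ID, the fraction of the index a hex prefix should select, and the 64-bit variant that would fit in a Java long.

```python
import hashlib
from collections import Counter


def md5_id(content: bytes) -> str:
    """Return the MD5 of the content as 32 hex characters.

    Uppercase is an assumption here, chosen to match the F* notation.
    """
    return hashlib.md5(content).hexdigest().upper()


def md5_id64(content: bytes) -> int:
    """Return the first 64 bits of the MD5 as a signed integer.

    The value is in the range of a Java long. Truncating raises the
    collision risk, so this only suits smaller indexes.
    """
    return int.from_bytes(hashlib.md5(content).digest()[:8], "big", signed=True)


def prefix_fraction(prefix_len: int) -> float:
    """Expected share of the index matched by a hex prefix of this length."""
    return 16.0 ** -prefix_len


# Synthetic corpus, purely for illustration.
ids = [md5_id(f"document {i}".encode()) for i in range(48000)]

# Count how many IDs fall under each leading hex digit.
buckets = Counter(doc_id[0] for doc_id in ids)

# Equivalent of a prefix query such as id:F* (assuming the field is named id).
f_bucket = [doc_id for doc_id in ids if doc_id.startswith("F")]

if __name__ == "__main__":
    print("expected share for F*: ", prefix_fraction(1))
    print("expected share for F**:", prefix_fraction(2))
    print("documents under F*:    ", len(f_bucket), "of", len(ids))
```

With 48000 documents, a one-character prefix should select about 48000 / 16 = 3000 of them, with a spread of a few dozen either way because the hash is close to uniform. That is the predictable subset size the thread is talking about.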
-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Monday, December 17, 2007 8:15 AM
To: solr-user@lucene.apache.org
Subject: Re: retrieve lucene "doc id"

Yonik Seeley wrote:
> On Dec 17, 2007 1:40 AM, Ben Incani <[EMAIL PROTECTED]> wrote:
>> I have converted to using the Solr search interface and I am trying
>> to retrieve documents from a list of search results (where previously
>> I had used the doc id directly from the lucene query results) and the
>> solr id I have got currently indexed is unfortunately configured not
>> to be unique!
>
> Ouch... I'd try to make a unique Id then!
> Or barring that, just try to make the query match exactly the docs you
> want back (don't do the 2 phase thing).
>

In 1.3-dev, you can use UUIDField to have solr generate a UUID for each doc.

ryan