Hi Ming,

Quoting my answer on a different thread
( http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201210.mbox/%3ccaonbidbuzzsaqctdhtlxlgeoori_ghrjbt-84bm0zb-fsps...@mail.gmail.com%3E ):
> > [code]
> > Directory indexDir = FSDirectory.open(new File(pathToDir));
> > IndexReader input = IndexReader.open(indexDir, true);
> >
> > FieldSelector fieldSelector = new SetBasedFieldSelector(
> >     null, // to retrieve all stored fields
> >     Collections.<String>emptySet());
> >
> > int maxDoc = input.maxDoc();
> > for (int i = 0; i < maxDoc; i++) {
> >   if (input.isDeleted(i)) {
> >     // deleted document found, retrieve it
> >     Document document = input.document(i, fieldSelector);
> >     // analyze its field values here...
> >   }
> > }
> > [/code]

Have a look here for the code of a complete standalone example. It does a
different thing with the Lucene index, so *do not* run it on your index.

Dmitry

On Mon, May 6, 2013 at 7:36 PM, Mingfeng Yang <mfy...@wisewindow.com> wrote:

> Hi Dmitry,
>
> My index is not sharded, and since its size is so big, sharding won't
> help much with the paging issue. Do you know of any API which can help
> read from the Lucene binary index directly? It will be nice if we can
> just scan through the docs directly.
>
> Thanks!
> Ming-
>
>
> On Mon, May 6, 2013 at 3:33 AM, Dmitry Kan <solrexp...@gmail.com> wrote:
>
> > Are you doing it once? Is your index sharded? If so, can you ask each
> > shard individually?
> > Another way would be to do it at the Lucene level, i.e. read from the
> > binary indices (an API exists).
> >
> > Dmitry
> >
> >
> > On Mon, May 6, 2013 at 5:48 AM, Mingfeng Yang <mfy...@wisewindow.com>
> > wrote:
> >
> > > Dear Solr Users,
> > >
> > > Does anyone know the best way to iterate through each document in a
> > > Solr index with a billion entries?
> > >
> > > I tried using select?q=*:*&start=xx&rows=500 to get 500 docs each
> > > time and then changing the start value, but it got very slow after
> > > getting through about 10 million docs.
> > >
> > > Thanks,
> > > Ming-
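For the scan Ming actually wants (all live documents, rather than the deleted
ones the quoted snippet above targets), a minimal sketch against the Lucene
3.x API current at the time could look as follows; the index path is a
placeholder:

[code]
import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LiveDocScanner {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
    IndexReader reader = IndexReader.open(dir, true); // open read-only
    try {
      int maxDoc = reader.maxDoc();
      for (int i = 0; i < maxDoc; i++) {
        if (reader.isDeleted(i)) {
          continue; // skip deleted slots
        }
        Document doc = reader.document(i); // loads all stored fields
        // process the stored field values here...
      }
    } finally {
      reader.close();
    }
  }
}
[/code]

This reads only the stored fields straight off disk, so it avoids the scoring
and collecting that make deep Solr queries expensive. On Lucene 4.x the
equivalents would be DirectoryReader.open and MultiFields.getLiveDocs.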
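As for the deep-paging slowness itself: a common workaround in Solr at the
time was keyset-style pagination, i.e. sort on a unique field and resume with
a range filter instead of a growing start offset. A sketch with SolrJ 4.x,
where the core URL and the unique string field "id" are assumptions:

[code]
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class KeysetPager {
  public static void main(String[] args) throws Exception {
    // assumed core URL and unique field name
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String lastId = null; // resume point; null means first page
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(500);
      q.setStart(0); // always 0: the range filter does the skipping
      q.setSortField("id", SolrQuery.ORDER.asc);
      if (lastId != null) {
        // exclusive lower bound resumes right after the last doc seen;
        // id values containing special characters would need escaping
        q.addFilterQuery("id:{" + lastId + " TO *]");
      }
      QueryResponse rsp = solr.query(q);
      if (rsp.getResults().isEmpty()) {
        break; // scanned everything
      }
      for (SolrDocument doc : rsp.getResults()) {
        lastId = (String) doc.getFieldValue("id");
        // process the document here...
      }
    }
    solr.shutdown();
  }
}
[/code]

With start fixed at 0, every page costs about the same as the first; a growing
start offset forces Solr to collect and throw away all preceding hits, which
is why the original approach degraded around 10 million docs.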