Hi Ming,

Quoting my answer on a different thread
( http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201210.mbox/%3ccaonbidbuzzsaqctdhtlxlgeoori_ghrjbt-84bm0zb-fsps...@mail.gmail.com%3E ):
> > [code]
> > Directory indexDir = FSDirectory.open(new File(pathToDir));
> > IndexReader input = IndexReader.open(indexDir, true);
> >
> > FieldSelector fieldSelector = new SetBasedFieldSelector(
> >     null, // to retrieve all stored fields
> >     Collections.<String>emptySet());
> >
> > int maxDoc = input.maxDoc();
> > for (int i = 0; i < maxDoc; i++) {
> >   if (input.isDeleted(i)) {
> >     // deleted document found, retrieve it
> >     Document document = input.document(i, fieldSelector);
> >     // analyze its field values here...
> >   }
> > }
> > [/code]

Have a look here for the code of a complete standalone example. It does a
different thing with the Lucene index, so *do not* run it on your index.

Dmitry

On Mon, May 6, 2013 at 7:36 PM, Mingfeng Yang <mfy...@wisewindow.com> wrote:

> Hi Dmitry,
>
> My index is not sharded, and since its size is so big, sharding won't
> help much with the paging issue. Do you know of any API which can help
> read from the Lucene binary index directly? It will be nice if we can
> just scan through the docs directly.
>
> Thanks!
> Ming-
>
>
> On Mon, May 6, 2013 at 3:33 AM, Dmitry Kan <solrexp...@gmail.com> wrote:
>
> > Are you doing it once? Is your index sharded? If so, can you ask each
> > shard individually?
> > Another way would be to do it at the Lucene level, i.e. read from the
> > binary indices (an API exists).
> >
> > Dmitry
> >
> >
> > On Mon, May 6, 2013 at 5:48 AM, Mingfeng Yang <mfy...@wisewindow.com>
> > wrote:
> >
> > > Dear Solr Users,
> > >
> > > Does anyone know the best way to iterate through each document in a
> > > Solr index with a billion entries?
> > >
> > > I tried using select?q=*:*&start=xx&rows=500 to get 500 docs each
> > > time and then changing the start value, but it got very slow after
> > > getting through about 10 million docs.
> > >
> > > Thanks,
> > > Ming-
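For the scan Ming actually wants (all live documents, rather than the deleted
ones the quoted snippet above targets), a minimal sketch against the Lucene
3.x API current at the time could look as follows; the index path is a
placeholder:

[code]
import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LiveDocScanner {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
    IndexReader reader = IndexReader.open(dir, true); // open read-only
    try {
      int maxDoc = reader.maxDoc();
      for (int i = 0; i < maxDoc; i++) {
        if (reader.isDeleted(i)) {
          continue; // skip deleted slots
        }
        Document doc = reader.document(i); // loads all stored fields
        // process the stored field values here...
      }
    } finally {
      reader.close();
    }
  }
}
[/code]

This reads only the stored fields straight off disk, so it avoids the scoring
and collecting that make deep Solr queries expensive. On Lucene 4.x the
equivalents would be DirectoryReader.open and MultiFields.getLiveDocs.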
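As for the deep-paging slowness itself: a common workaround in Solr at the
time was keyset-style pagination, i.e. sort on a unique field and resume with
a range filter instead of a growing start offset. A sketch with SolrJ 4.x,
where the core URL and the unique string field "id" are assumptions:

[code]
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class KeysetPager {
  public static void main(String[] args) throws Exception {
    // assumed core URL and unique field name
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String lastId = null; // resume point; null means first page
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(500);
      q.setStart(0); // always 0: the range filter does the skipping
      q.setSortField("id", SolrQuery.ORDER.asc);
      if (lastId != null) {
        // exclusive lower bound resumes right after the last doc seen;
        // id values containing special characters would need escaping
        q.addFilterQuery("id:{" + lastId + " TO *]");
      }
      QueryResponse rsp = solr.query(q);
      if (rsp.getResults().isEmpty()) {
        break; // scanned everything
      }
      for (SolrDocument doc : rsp.getResults()) {
        lastId = (String) doc.getFieldValue("id");
        // process the document here...
      }
    }
    solr.shutdown();
  }
}
[/code]

With start fixed at 0, every page costs about the same as the first; a growing
start offset forces Solr to collect and throw away all preceding hits, which
is why the original approach degraded around 10 million docs.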