It's a well-known limitation of search engines. This post gets into the core of the problem: http://www.searchworkings.org/blog/-/blogs/lucene-solr-and-deep-paging . It seems the solution has been contributed to Lucene, but not yet to Solr.
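For reference, here is a rough sketch of the searchAfter-style paging that, as far as I can tell, is what the linked post refers to on the Lucene side. The index path is a placeholder, and the class names (IndexReader.open, FSDirectory.open(File)) are the ones from the Lucene 3.x line this thread is about; treat it as an illustration, not the final API.

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SearchAfterExport {
    public static void main(String[] args) throws Exception {
        // "/path/to/index" is a placeholder -- point it at a real Lucene index
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
        IndexSearcher searcher = new IndexSearcher(reader);

        Query query = new MatchAllDocsQuery();
        int pageSize = 1000;
        ScoreDoc last = null; // cursor: last hit of the previous page

        while (true) {
            // searchAfter resumes collecting right after the cursor instead of
            // collecting start+rows hits and throwing the first 'start' away,
            // so late pages cost roughly the same as early ones
            TopDocs page = (last == null)
                    ? searcher.search(query, pageSize)
                    : searcher.searchAfter(last, query, pageSize);
            if (page.scoreDocs.length == 0) {
                break; // index exhausted
            }
            for (ScoreDoc sd : page.scoreDocs) {
                Document doc = searcher.doc(sd.doc);
                // process doc here
            }
            last = page.scoreDocs[page.scoreDocs.length - 1];
        }
        reader.close();
    }
}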
On Tue, Jan 15, 2013 at 6:36 PM, Upayavira <u...@odoko.co.uk> wrote:

> You are setting yourself up for disaster.
>
> If you ask Solr for documents 1000 to 1010, it needs to sort documents
> 1 to 1010 and discard the first 1000, which causes horrible performance.
>
> I'm curious to hear if others have strategies to extract content
> sequentially from an index. I suspect a new SearchComponent could really
> help here.
>
> I suspect it would work better if you don't sort at all, in which case
> you'll return the documents in index order. The issue is that a commit
> or a background merge could change index order, which would mess up your
> export.
>
> Sorry, no clearer answers.
>
> Upayavira
>
> On Tue, Jan 15, 2013, at 02:07 PM, elisabeth benoit wrote:
> > Hello,
> >
> > I have a Solr instance (solr 3.6.1) with around 3 000 000 documents.
> > I want to read (in a java test application) all my documents, but not
> > in one shot (because it takes too much memory).
> >
> > So I send the same request, over and over, with
> >
> > q=*:*
> > rows=1000
> > sort=id desc => to be sure I always get the same ordering
> > and the start parameter increased by 1000 at each iteration
> >
> > Checking the solr logs, I realized that the query response time
> > increases as the start parameter gets bigger.
> >
> > For instance:
> >
> > with start < 500 000, it takes about 500 ms
> > with start > 1 100 000 and < 1 200 000, it takes between 5000 and 5200 ms
> > with start > 1 250 000 and < 1 320 000, it takes between 6100 and 6400 ms
> >
> > Does someone have an idea how to optimize this query?
> >
> > Thanks,
> > Elisabeth

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
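For completeness, a minimal SolrJ sketch of the start/rows loop Elisabeth describes, just to make the pattern (and why it degrades) concrete. The server URL is an assumption, CommonsHttpSolrServer is the SolrJ 3.x client class, and the comments mark where the cost comes from.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class StartRowsExport {
    public static void main(String[] args) throws Exception {
        // URL is an assumption -- substitute your own host/core
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        int rows = 1000;
        int start = 0;
        while (true) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(rows);
            q.setStart(start); // grows by 1000 on every iteration
            q.setSortField("id", SolrQuery.ORDER.desc); // fixed sort for a stable ordering

            QueryResponse rsp = server.query(q);
            SolrDocumentList docs = rsp.getResults();
            if (docs.isEmpty()) {
                break;
            }
            for (SolrDocument doc : docs) {
                // process doc here
            }
            // Solr has to collect and sort start+rows documents for every request
            // and discard the first 'start' of them, which is why response times
            // climb as 'start' grows
            start += rows;
        }
    }
}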