It's a well-known limitation of search engines. This post gets into the core of the problem: http://www.searchworkings.org/blog/-/blogs/lucene-solr-and-deep-paging . It seems the solution has been contributed to Lucene, but not yet to Solr.
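For reference, here is a rough sketch of the searchAfter-style paging that, as far as I can tell, is what the linked post refers to on the Lucene side. The index path is a placeholder, and the class names (IndexReader.open, FSDirectory.open(File)) are the ones from the Lucene 3.x line this thread is about; treat it as an illustration, not the final API.

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SearchAfterExport {
    public static void main(String[] args) throws Exception {
        // "/path/to/index" is a placeholder -- point it at a real Lucene index
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
        IndexSearcher searcher = new IndexSearcher(reader);

        Query query = new MatchAllDocsQuery();
        int pageSize = 1000;
        ScoreDoc last = null; // cursor: last hit of the previous page

        while (true) {
            // searchAfter resumes collecting right after the cursor instead of
            // collecting start+rows hits and throwing the first 'start' away,
            // so late pages cost roughly the same as early ones
            TopDocs page = (last == null)
                    ? searcher.search(query, pageSize)
                    : searcher.searchAfter(last, query, pageSize);
            if (page.scoreDocs.length == 0) {
                break; // index exhausted
            }
            for (ScoreDoc sd : page.scoreDocs) {
                Document doc = searcher.doc(sd.doc);
                // process doc here
            }
            last = page.scoreDocs[page.scoreDocs.length - 1];
        }
        reader.close();
    }
}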
On Tue, Jan 15, 2013 at 6:36 PM, Upayavira <u...@odoko.co.uk> wrote:

> You are setting yourself up for disaster.
>
> If you ask Solr for documents 1000 to 1010, it needs to sort documents
> 1 to 1010 and discard the first 1000, which causes horrible performance.
>
> I'm curious to hear if others have strategies to extract content
> sequentially from an index. I suspect a new SearchComponent could really
> help here.
>
> I suspect it would work better if you don't sort at all, in which case
> you'll return the documents in index order. The issue is that a commit
> or a background merge could change index order, which would mess up your
> export.
>
> Sorry, no clearer answers.
>
> Upayavira
>
> On Tue, Jan 15, 2013, at 02:07 PM, elisabeth benoit wrote:
> > Hello,
> >
> > I have a Solr instance (solr 3.6.1) with around 3 000 000 documents.
> > I want to read (in a java test application) all my documents, but not
> > in one shot (because it takes too much memory).
> >
> > So I send the same request, over and over, with
> >
> > q=*:*
> > rows=1000
> > sort=id desc => to be sure I always get the same ordering
> > and the start parameter increased by 1000 at each iteration
> >
> > Checking the solr logs, I realized that the query response time
> > increases as the start parameter gets bigger.
> >
> > For instance:
> >
> > with start < 500 000, it takes about 500 ms
> > with start > 1 100 000 and < 1 200 000, it takes between 5000 and 5200 ms
> > with start > 1 250 000 and < 1 320 000, it takes between 6100 and 6400 ms
> >
> > Does someone have an idea how to optimize this query?
> >
> > Thanks,
> > Elisabeth

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
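For completeness, a minimal SolrJ sketch of the start/rows loop Elisabeth describes, just to make the pattern (and why it degrades) concrete. The server URL is an assumption, CommonsHttpSolrServer is the SolrJ 3.x client class, and the comments mark where the cost comes from.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class StartRowsExport {
    public static void main(String[] args) throws Exception {
        // URL is an assumption -- substitute your own host/core
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        int rows = 1000;
        int start = 0;
        while (true) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(rows);
            q.setStart(start); // grows by 1000 on every iteration
            q.setSortField("id", SolrQuery.ORDER.desc); // fixed sort for a stable ordering

            QueryResponse rsp = server.query(q);
            SolrDocumentList docs = rsp.getResults();
            if (docs.isEmpty()) {
                break;
            }
            for (SolrDocument doc : docs) {
                // process doc here
            }
            // Solr has to collect and sort start+rows documents for every request
            // and discard the first 'start' of them, which is why response times
            // climb as 'start' grows
            start += rows;
        }
    }
}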