: Then I remembered we currently don't allow deep paging in our current
: search indexes as performance declines the deeper you go. Is this still
: the case?
Coincidentally, I'm working on a new cursor-based API to make this much more feasible as we speak:

https://issues.apache.org/jira/browse/SOLR-5463

I did some simple perf testing of the strawman approach and posted the results last week:

http://searchhub.org/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

Current iterations on the patch are to eliminate the strawman code to improve performance even more and to beef up the test cases.

: If so, is there another approach to make all the data in a collection
: easily available for retrieval? The only thing I can think of is to ...
: Then I was thinking we could have a field with an incrementing numeric
: value which could be used to perform range queries as a substitute for
: paging through everything. Ie queries like 'IncrementalField:[1 TO
: 100]' 'IncrementalField:[101 TO 200]' but this would be difficult to
: maintain as we update the index unless we reindex the entire collection
: every time we update any docs at all.

As I mentioned in the blog above, as long as you have a uniqueKey field that supports range queries, bulk exporting of all documents is fairly trivial: sort on your uniqueKey field and use an fq that also filters on your uniqueKey field, modifying the fq each time to change the lower bound to match the highest ID you got on the previous "page".

This approach works really well in simple cases where you want to "fetch all" documents matching a query and then process/sort them by some other criteria on the client -- but it's not viable if it's important to you that the documents come back from Solr in score order before your client gets them, because you want to "stop fetching" once some criteria is met in your client.
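The fetch-all loop described above can be sketched in Python. The Solr call is simulated here with an in-memory stub (solr_select is a hypothetical stand-in for an HTTP request to the /select handler, not a real client API), so the surrounding plumbing is illustrative; the loop itself is the technique:

```python
# Sketch of bulk export by sorting on the uniqueKey field and raising the
# fq lower bound on each "page". In real use, solr_select() would be an
# HTTP request to Solr's /select handler; here it is stubbed with an
# in-memory index so the loop logic can be shown self-contained.

INDEX = [{"id": "doc%04d" % i} for i in range(250)]  # pretend collection

def solr_select(q, fq, sort, rows):
    # Stand-in for: GET /select?q=...&fq=...&sort=id+asc&rows=...
    # Only understands the two fq patterns used below.
    lower = fq.split("{", 1)[1].split(" TO", 1)[0] if "{" in fq else None
    docs = [d for d in INDEX if lower is None or d["id"] > lower]
    docs.sort(key=lambda d: d["id"])  # sort=id asc
    return docs[:rows]

def fetch_all(q="*:*", rows=100):
    """Yield every matching doc exactly once, in uniqueKey order."""
    last_id = None
    while True:
        # Exclusive lower bound on the uniqueKey from the previous page;
        # curly brace = exclusive bound in Solr range-query syntax.
        fq = "id:{%s TO *]" % last_id if last_id else "id:[* TO *]"
        page = solr_select(q=q, fq=fq, sort="id asc", rows=rows)
        if not page:
            return
        for doc in page:
            yield doc
        last_id = page[-1]["id"]

all_ids = [d["id"] for d in fetch_all()]
```

Because the lower bound is exclusive and the sort is on the uniqueKey, each document is returned exactly once even though the loop never uses start/rows offsets.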
Example: you have billions of documents matching a query, you want to fetch them all sorted by score desc and crunch them on your client to compute some stats, and once your client-side stat crunching tells you you have enough results (which might be after the 1000th result, or might be after the millionth result) then you want to stop.

SOLR-5463 will help even in that latter case. The bulk of the patch should be easy to use in the next day or so (having other people try it out and test it in their applications would be *very* helpful) and will hopefully show up in Solr 4.7.

-Hoss
http://www.lucidworks.com/
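For that score-ordered stop-early case, the client-side loop under SOLR-5463 would look roughly like the sketch below. It assumes a cursorMark request parameter and a nextCursorMark response field as described in the JIRA issue and blog post above; the "server" is an in-memory stub standing in for Solr (a real cursor mark is an opaque token, not an offset), so treat this as a shape sketch rather than the final API:

```python
# Sketch of client-side cursor iteration in the style proposed by
# SOLR-5463: pass an opaque cursorMark with each request, and stop when
# the server hands back the same mark (no more results) or when the
# client's own stopping criteria is met.

import json

DOCS = sorted(
    [{"id": "doc%03d" % i, "score": (i * 7) % 50 / 10.0} for i in range(120)],
    key=lambda d: (-d["score"], d["id"]),  # sort=score desc, id asc
)

def solr_query(cursor_mark, rows):
    # Stand-in for: GET /select?q=...&sort=score+desc,id+asc
    #                          &rows=...&cursorMark=...
    # A real mark encodes the last sort values; this stub just encodes
    # an offset to keep the example self-contained.
    start = 0 if cursor_mark == "*" else json.loads(cursor_mark)
    page = DOCS[start:start + rows]
    next_mark = json.dumps(start + len(page)) if page else cursor_mark
    return {"docs": page, "nextCursorMark": next_mark}

def crunch_until_done(stop_after):
    """Fetch score-ordered results, stopping once the client has enough."""
    seen, mark = [], "*"
    while True:
        resp = solr_query(cursor_mark=mark, rows=25)
        for doc in resp["docs"]:
            seen.append(doc)
            if len(seen) >= stop_after:     # client-side stopping criteria
                return seen
        if resp["nextCursorMark"] == mark:  # mark unchanged: exhausted
            return seen
        mark = resp["nextCursorMark"]

top = crunch_until_done(stop_after=60)
```

The point of the cursor is that each request is cheap regardless of how deep you are, so the client can keep pulling pages until its own "enough results" test fires, whether that happens after the 1000th document or the millionth.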