: Thanks for your help. I found a workaround for this use case, which is to
: avoid using a shards query and just asking each shard for a dump of the

that would be (step #1 in) the method i would recommend for your use case of
"check what's in the entire index", because it drastically reduces the amount
of work needed in each query -- you're just talking to one node at a time,
not doing any multiplexing and merging of results from all the nodes.
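(to make that concrete, a single-core "dump" request might look something
like the sketch below.  this is just python using the "requests" library;
the host, core name, field list, and row count are placeholders you'd adjust
for your setup, and step #2 further down covers actually walking through all
of the docs.)

  import requests

  # hit one core's /select handler directly -- no "shards" param, so
  # nothing gets fanned out to or merged from the other nodes, and *:*
  # means there is no real ranking work to do
  resp = requests.get(
      "http://shard1:8983/solr/core1/select",            # placeholder host/core
      params={"q": "*:*", "fl": "*", "rows": 100000, "wt": "json"},
  )
  docs = resp.json()["response"]["docs"]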
: do any ranking or sorting. What I am now seeing is that qtimes have gone
: up from about 5 seconds per request to nearly a minute as the start
: parameter gets higher. I don't know if this is actually because of the
: start parameter or if something is happening with memory use and/or caching

it's because in order to give you results 36000000-37000000 it has to collect
all the results from 1-37000000 in order to then pull out the last 1000000
(or to put it another way: the request for start=36000000 doesn't know what
the 36000000 docs it already gave you were, it has to figure that out again).

step #2 in the method i would use to deal with your situation would be to not
use "start" at all -- sort the docs on your uniqueKey field, make rows as big
as you are willing to handle in a single request, and then instead of
incrementing "start" on each request, add an fq to each subsequent query
after the first one that filters the results to docs with a uniqueKey value
greater than the last one seen in the previous response.  (there's a rough
sketch of that loop at the bottom of this mail.)

this is similar to what a lot of REST APIs seem to do (twitter comes to mind)
to avoid the problem of dealing with deep paging efficiently or trying to
keep track of "cursor" reservations on the server side -- they just don't
offer either, and instead they let the client keep track of the state
(ie: "max_id") between requests.

-Hoss
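(in case it helps, here's the rough sketch of that step #2 loop i mentioned --
again just python/requests against a single core.  the core url, the
uniqueKey field name ("id"), and the page size are all assumptions you'd
adjust for your setup, and it assumes the id values are simple tokens that
don't need any escaping inside a range query.)

  import requests

  SHARD_URL = "http://shard1:8983/solr/core1/select"   # placeholder host/core
  UNIQUE_KEY = "id"        # whatever your schema's uniqueKey field is
  PAGE_SIZE = 100000       # as big as you're willing to handle per request

  def dump_core(url):
      last_id = None
      while True:
          params = {
              "q": "*:*",
              "sort": UNIQUE_KEY + " asc",   # stable order, no "start" needed
              "rows": PAGE_SIZE,
              "fl": "*",
              "wt": "json",
          }
          if last_id is not None:
              # instead of bumping "start", only ask for docs whose id comes
              # after the last one already seen (exclusive lower bound)
              params["fq"] = "%s:{%s TO *}" % (UNIQUE_KEY, last_id)
          docs = requests.get(url, params=params).json()["response"]["docs"]
          if not docs:
              break                          # this core is exhausted
          for doc in docs:
              yield doc
          last_id = docs[-1][UNIQUE_KEY]

  for doc in dump_core(SHARD_URL):
      pass   # process / dump each doc here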