: Thanks for your help.   I found a workaround for this use case, which is to
: avoid using a shards query and just asking each shard for a dump of the

that would be (step #1 in) the method i would recommend for your use case 
of "check what's in the entire index" because it drastically reduces the 
amount of work needed in each query -- you're just talking to one 
node at a time, not doing multiplexing and merging of results from all 
the nodes.
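
something like this is what i mean for step #1 -- strictly a sketch: the 
host/port/core names here are made up, and it assumes you can hit each 
core's /select handler directly with distrib=false so the query stays 
local to that core instead of fanning back out to the other shards:

import requests

# hypothetical shard cores -- substitute your own hosts/core names
shard_cores = [
    "http://solr1:8983/solr/core1",
    "http://solr2:8983/solr/core2",
]

for core in shard_cores:
    resp = requests.get(core + "/select", params={
        "q": "*:*",
        "fl": "id",          # just the fields you actually need
        "rows": 1000,
        "wt": "json",
        "distrib": "false",  # don't re-distribute the query to other shards
    })
    docs = resp.json()["response"]["docs"]
    # ... do something with docs ...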

: do any ranking or sorting.   What I am now seeing is that qtimes have gone
: up from about 5 seconds per request to nearly a minute as the start
: parameter gets higher.  I don't know if this is actually because of the
: start parameter or if something is happening with memory use and/or caching

it's because in order to give you results 36000000-37000000 it has to 
collect all the results from 1-37000000 in order to then pull out the 
last 1000000 (or to put it another way: the request for start=36000000 
doesn't know what the 36000000 it already gave you were, it has to figure 
it out again)

step #2 in the method i would use to deal with your situation would be to 
not use "start" at all -- sort the docs on your uniqueKey field, make rows 
as big as you are willing to handle in a single request, and then instead 
of incrementing "start" on each request, add an fq to each subsequent 
query after the first one that filters the results to docs with a 
uniqueKey value greater than the last one seen in the previous response.
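
in code it would look roughly like this, against a single core as in 
step #1 -- again just a sketch: i'm assuming the uniqueKey field is named 
"id" and that simple string ordering on it is what you want (adjust the 
field name / range syntax for your schema, and escape the value if your 
ids contain query syntax characters):

import requests

core = "http://solr1:8983/solr/core1"   # hypothetical core url
rows = 1000000                          # as big as you can handle
last_id = None

while True:
    params = {
        "q": "*:*",
        "sort": "id asc",    # sort on the uniqueKey field
        "fl": "id",
        "rows": rows,
        "wt": "json",
        "distrib": "false",  # keep the query local, as in step #1
    }
    if last_id is not None:
        # only docs with a uniqueKey strictly greater than the last one seen
        params["fq"] = "id:{%s TO *}" % last_id
    docs = requests.get(core + "/select",
                        params=params).json()["response"]["docs"]
    if not docs:
        break
    last_id = docs[-1]["id"]
    # ... process docs ...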

this is similar to what a lot of REST APIs seem to do (twitter comes to 
mind) to avoid the problem of dealing with deep paging efficiently or 
trying to keep track of "cursor" reservations on the server side -- 
they just don't offer either, and instead let the client keep 
track of the state (ie: "max_id") between requests.


-Hoss
