RE: Need Debug Direction on Performance Problem

Toke Eskildsen Sun, 18 Jan 2015 02:51:07 -0800

Naresh Yadav [nyadav....@gmail.com] wrote:
> In both setups, we are reading in batches of 50k and each batch taking
> Setup1  : approx 7 seconds and for completing all batches of total 10 lakh
> results takes 1 to 2 minutes.
> Setup2 : approx 2-3 minutes and for completing all batches of total 10 lakh
> results  takes 114 minutes.


Deep paging across shards without cursors means that for each request, every 
shard must return its full result set up to that point (start + rows entries). 
The deeper the page, the longer each request takes, so the total time grows 
faster than linearly with the number of results. If you only extracted 500K 
results instead of the 1M in setup 2, it would likely take a lot less than 
114/2 minutes.
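
To put rough numbers on that, here is a small sketch of the arithmetic. It 
assumes a simplified cost model where the work per request is proportional to 
start + rows on each shard; the function name is made up for illustration and 
real timings will depend on many other factors.

```python
def entries_requested_per_shard(total_results, page_size):
    """Sum of (start + rows) over all pages under the simplified model.

    Page 1 asks each shard for page_size entries, page 2 for
    2 * page_size, and so on up to total_results.
    """
    pages = total_results // page_size
    return sum(page * page_size for page in range(1, pages + 1))


full = entries_requested_per_shard(1_000_000, 50_000)  # 20 batches
half = entries_requested_per_shard(500_000, 50_000)    # 10 batches

print(full)         # 10500000
print(half)         # 2750000
print(half / full)  # roughly 0.26, well below 0.5
```

Under this model, stopping at 500K results costs about a quarter of the full 
export rather than half of it.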

Since you are exporting the full result set, you should be using a cursor:
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
This should make your extraction time linear in the number of documents and 
hopefully a lot faster than your current setup.
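
A minimal sketch of a cursor-based export loop, in Python, might look like the 
following. The cursorMark and nextCursorMark parameters and the requirement 
that the sort includes the uniqueKey field come from the page linked above; 
the base URL, the uniqueKey name "id", and the helper names solr_fetch and 
export_all are assumptions for illustration, not your actual setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_fetch(base_url, params):
    """Send one select request and return the decoded JSON response."""
    with urlopen(base_url + "?" + urlencode(params)) as resp:
        return json.load(resp)


def export_all(base_url, query, rows=50000, sort="id asc", fetch=solr_fetch):
    """Yield every matching document, one cursor page at a time."""
    cursor = "*"  # "*" tells Solr to start a new cursor
    while True:
        data = fetch(base_url, {
            "q": query,
            "rows": rows,
            "sort": sort,          # must include the uniqueKey (assumed "id")
            "cursorMark": cursor,  # do not combine with a non-zero "start"
            "wt": "json",
        })
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means we are done
            break
        cursor = next_cursor
```

Usage would be something like 
for doc in export_all("http://localhost:8983/solr/collection1/select", "*:*"), 
where the URL is a placeholder. The fetch parameter is only there so the loop 
can be tried against a stub without a running Solr.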

Also, please refrain from using regional units such as "lakh" in an 
international forum. It requires some readers (me, for example) to perform a 
search in order to be sure of what you are talking about.

- Toke Eskildsen
