Naresh Yadav [nyadav....@gmail.com] wrote:
> In both setups, we are reading in batches of 50k and each batch taking
> Setup1 : approx 7 seconds and for completing all batches of total 10 lakh
> results takes 1 to 2 minutes.
> Setup2 : approx 2-3 minutes and for completing all batches of total 10 lakh
> results takes 114 minutes.

Deep paging across shards without cursors means that for each request, the full result set up to that point must be requested from each shard. The deeper your page, the longer each request takes. Because the cost grows with depth, the later pages are the most expensive ones: if you only extracted 500K results instead of the 1M in setup 2, it would likely take a lot less than 114/2 minutes.

Since you are exporting the full result set, you should be using a cursor:
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

This should make your extraction time linear in the number of documents and hopefully a lot faster than your current setup.

Also, please refrain from using regional units such as "lakh" in an international forum. It requires some readers (me, for example) to perform a search in order to be sure of what you are talking about.

- Toke Eskildsen
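
For illustration, the cursor-based export suggested above could look roughly like the sketch below. It is not taken from the original setup: the uniqueKey field name "id", the sort order and the helper names fetch_page and export_all are assumptions. It relies only on the documented cursor behaviour: start with cursorMark=*, sort on a clause that includes the uniqueKey field, pass each response's nextCursorMark into the next request, and stop when the cursor no longer changes.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Send one select request to Solr and return the parsed JSON response."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def export_all(base_url, query, rows=50000, fetch=fetch_page):
    """Yield every matching document using cursor paging.

    base_url is the select handler of the collection (hypothetical here).
    fetch can be swapped out, for example for testing without a server.
    """
    cursor = "*"  # "*" starts a new cursor
    while True:
        params = {
            "q": query,
            "rows": rows,
            # The sort must include the uniqueKey field; "id" is an assumption.
            "sort": "id asc",
            "cursorMark": cursor,
            "wt": "json",
        }
        data = fetch(base_url, params)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means the result set is exhausted.
            break
        cursor = next_cursor
```

Each request only asks the shards for the next batch after the cursor position, so no batch should cost noticeably more than the first one. The start parameter is not used together with cursorMark.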