Goutham, I suggest you read Hossman's excellent article on deep paging and why
returning rows=(some large number) is a bad idea. It provides a thorough
overview of the concept and will explain it better than I ever could
(https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#update_2013_12_18).
In short, if you want to extract that many documents from your corpus, use
cursorMark, streaming expressions, or Solr's Parallel SQL interface (which uses
streaming expressions under the hood):
https://lucene.apache.org/solr/guide/8_6/streaming-expressions.html.
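As a rough, untested sketch of the cursorMark approach with pysolr (the URL,
collection name, and the "id" uniqueKey field below are placeholders for your
own setup):

import pysolr

# Connect to the collection (URL and collection name are placeholders).
solr = pysolr.Solr("http://localhost:8983/solr/mycollection", timeout=120)

cursor = "*"  # "*" is the initial cursorMark
while True:
    # cursorMark requires a sort that includes the uniqueKey field
    # (assumed to be "id" here), and you must not send a "start" parameter.
    results = solr.search("*:*", **{
        "rows": 10000,
        "sort": "id asc",
        "cursorMark": cursor,
    })

    for doc in results.docs:
        pass  # process each document here

    next_cursor = results.nextCursorMark
    if next_cursor == cursor:
        break  # the cursor stops advancing once all documents have been returned
    cursor = next_cursor

With this pattern each request takes roughly constant time no matter how deep
you are into the result set, because Solr only has to collect the next rows
documents past the cursor instead of the first start+rows documents.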

Thanks,

Dwane
________________________________
From: Goutham Tholpadi <gtholp...@gmail.com>
Sent: Friday, 25 September 2020 4:19 PM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Solr queries slow down over time

Hi,

I have around 30M documents in Solr, and I am doing repeated *:* queries
with rows=10000, changing start to 0, 10000, 20000, and so on, in a loop in
my script (using pysolr).

At the start of the iteration, the calls to Solr were taking less than 1
sec each. After running for a few hours (with start at around 27M), I found
that each call was taking around 30-60 secs.

Any pointers on why the same fetch of 10000 records takes much longer now?
Does Solr need to load all 27M preceding records before returning the last 10000?
Is there a better way to do this operation using Solr?

Thanks!
Goutham
