On 1/13/2020 11:53 AM, Gael Jourdan-Weil wrote:
Just to clarify something: we are not returning 1000 docs per request, we are 
only returning 100.
We send 10 requests to Solr, querying for docs 1 to 100, then 101 to 200, and 
so on up to 901 to 1000.
But all of that happens within the same second.

But I understand that to retrieve docs 901 to 1000, Solr needs to first get and 
sort the first 900 docs, so the request for 901 to 1000 is as costly as asking 
for 1 to 1000 directly?
If the sort applies to an indexed field (isn't that mandatory?), why does Solr 
need to read the first 900 docs?

Solr cannot know which documents land in positions 901 to 1000 without first working out the sorted order of the 900 documents above them. So in order to get the 10th page, it must sort to determine the IDs for the top 1000, skip 900 of them, and then retrieve the last 100. The query portion (not counting document retrieval) for page 10 therefore has nearly the same cost as asking for all 1000 in the same request.

Asking for the first 100 involves only the top 100 documents. Then because the request for the next 100 must obtain the top 200, it is a little bit slower. The third request must obtain the top 300, so it's slower again. And so on.
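
As a rough illustration, here is a minimal SolrJ sketch of that pattern: 10 consecutive requests walking from start=0 to start=900 with rows=100. The Solr URL, collection name, and sort field below are assumptions, not taken from your setup.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PagedFetch {
  public static void main(String[] args) throws Exception {
    // Hypothetical URL and collection name -- adjust for your setup.
    try (SolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection").build()) {
      SolrQuery query = new SolrQuery("*:*");
      query.setRows(100);
      query.setSort(SolrQuery.SortClause.asc("id"));

      // Ten consecutive requests: start=0, 100, 200, ... 900.
      // For page N, Solr must internally collect and sort the top
      // (N+1)*100 matches before returning just 100 of them, which
      // is why each successive page is a little slower.
      for (int page = 0; page < 10; page++) {
        query.setStart(page * 100);
        QueryResponse rsp = client.query(query);
        System.out.printf("start=%d returned %d docs%n",
            page * 100, rsp.getResults().size());
      }
    }
  }
}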

Are those 10 requests happening simultaneously, or consecutively? If they are simultaneous, they won't benefit from Solr's caching. Because Solr can cache certain things, it would probably be faster to make 10 consecutive requests than 10 simultaneous ones.

What are you trying to accomplish when you make these queries? If we understand that, perhaps we can come up with something better.
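
For what it's worth, if the goal is simply to walk through a large result set in order, Solr's cursorMark (deep paging) feature is the commonly suggested alternative to ever-increasing start values: each request only has to track the top rows documents past the cursor, so the cost does not grow with page depth. Below is a minimal SolrJ sketch, again with a hypothetical URL, collection name, and uniqueKey field called "id".

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
  public static void main(String[] args) throws Exception {
    // Hypothetical URL and collection name -- adjust for your setup.
    try (SolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection").build()) {
      SolrQuery query = new SolrQuery("*:*");
      query.setRows(100);
      // cursorMark requires a sort that includes the uniqueKey field
      // (assumed here to be "id") as a tie-breaker.
      query.setSort(SolrQuery.SortClause.asc("id"));

      String cursorMark = CursorMarkParams.CURSOR_MARK_START;
      boolean done = false;
      while (!done) {
        query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse rsp = client.query(query);
        String nextCursorMark = rsp.getNextCursorMark();
        // ... process rsp.getResults() here ...
        if (cursorMark.equals(nextCursorMark)) {
          done = true;  // cursor did not advance: no more results
        }
        cursorMark = nextCursorMark;
      }
    }
  }
}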

Thanks,
Shawn
