On 1/13/2020 11:53 AM, Gael Jourdan-Weil wrote:
> Just to clarify something: we are not returning 1000 docs per request, we are
> only returning 100.
> We get 10 requests to Solr querying for docs 1 to 100, then 101 to 200, ...
> until 901 to 1000.
> But all of that happens within the same second.
> But I understand that to retrieve docs 901 to 1000, Solr needs to first get and
> sort the first 900 docs, so the request to get 901 to 1000 is as costly as
> asking for 1 to 1000 directly?
> If the sort applies to an indexed field (isn't that mandatory?), why does Solr
> need to read the first 900 docs?
To get the 10th page, Solr must sort to determine the IDs of the top
1000, skip the first 900 of them, and then retrieve the last 100. So the
query portion (not counting document retrieval) for page 10 has nearly
the same cost as asking for all 1000 in a single request.
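
For illustration, here's a minimal sketch in Python (the localhost URL,
the core name "mycore", and the sort field are just assumptions, not
your actual setup) showing the two requests side by side:

    import requests  # any HTTP client works; requests is just convenient

    SOLR = "http://localhost:8983/solr/mycore/select"  # hypothetical core

    # Page 10: Solr still has to build the sorted top 1000 internally,
    # then skip the first 900 before returning 100 docs.
    page_10 = requests.get(SOLR, params={
        "q": "*:*",
        "sort": "id asc",   # assumes some indexed sort field
        "start": 900,
        "rows": 100,
    }).json()

    # One request for the whole top 1000: nearly the same query cost;
    # only the document-retrieval portion differs.
    top_1000 = requests.get(SOLR, params={
        "q": "*:*",
        "sort": "id asc",
        "start": 0,
        "rows": 1000,
    }).json()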
Asking for the first 100 involves only the top 100 documents. The
request for the next 100 must obtain the top 200, so it is a little
slower. The third request must obtain the top 300, so it is slower
still. And so on.
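
Put another way, paging through the full set 100 docs at a time looks
like the loop below (same hypothetical endpoint and sort field as
above); each iteration forces Solr to collect a larger sorted top-N
(100, 200, ..., 1000) before it can skip ahead:

    import requests

    SOLR = "http://localhost:8983/solr/mycore/select"  # hypothetical core

    for page in range(10):
        resp = requests.get(SOLR, params={
            "q": "*:*",
            "sort": "id asc",       # assumed indexed sort field
            "start": page * 100,    # 0, 100, ..., 900
            "rows": 100,
        }).json()
        docs = resp["response"]["docs"]
        # each later page is a little slower than the one before it

Summed over all 10 requests, that is 100 + 200 + ... + 1000 = 5,500
sorted entries collected internally, versus 1,000 for a single
rows=1000 request.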
Are those 10 requests happening simultaneously, or consecutively? If
they run simultaneously, they won't benefit from Solr's caching.
Because Solr can cache certain things, it would probably be faster to
make 10 consecutive requests than 10 simultaneous ones.
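
To make the distinction concrete, here is a sketch of both patterns
(again using the assumed endpoint from above); only the consecutive
version gives Solr's caches a chance to help the later pages:

    import requests
    from concurrent.futures import ThreadPoolExecutor

    SOLR = "http://localhost:8983/solr/mycore/select"  # hypothetical core

    def fetch_page(page):
        return requests.get(SOLR, params={
            "q": "*:*",
            "sort": "id asc",
            "start": page * 100,
            "rows": 100,
        }).json()

    # Simultaneous: all 10 requests race each other, so none of them
    # can reuse work cached by an earlier page.
    with ThreadPoolExecutor(max_workers=10) as pool:
        simultaneous = list(pool.map(fetch_page, range(10)))

    # Consecutive: each request runs after the previous one finishes,
    # so later pages may be served partly from cache.
    consecutive = [fetch_page(page) for page in range(10)]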
What are you trying to accomplish when you make these queries? If we
understand that, perhaps we can come up with something better.
Thanks,
Shawn