Have you already seen Solr's deep paging support? https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
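In case it helps, here's a minimal SolrJ sketch of cursor-based iteration with cursorMark. The Solr URL, collection name, query, sort field and page size are placeholders, not something from this thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrQuery.SortClause;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorWalk {
        public static void main(String[] args) throws Exception {
            // URL and collection are placeholders
            try (HttpSolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycollection").build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(100);
                // cursorMark requires a fully deterministic sort, hence
                // the unique-key field as (or in) the sort
                q.setSort(SortClause.asc("id"));

                String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
                boolean done = false;
                while (!done) {
                    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                    QueryResponse rsp = solr.query(q);
                    rsp.getResults().forEach(d -> System.out.println(d.get("id")));
                    String next = rsp.getNextCursorMark();
                    // the cursor stops advancing once every result has been read
                    done = cursorMark.equals(next);
                    cursorMark = next;
                }
            }
        }
    }

Each request only ever asks for the next `rows` hits relative to the cursor, so the server never has to build the big start+rows priority queue described below.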
> On Tue, 14 Jan 2020 at 20:41, Erick Erickson <erickerick...@gmail.com> wrote:
> Conceptually, asking for docs 900-1000 works something like this. Solr (well,
> Lucene actually) has to keep a sorted list 1,000 items long of scores and doc
> IDs, because you can’t know whether doc N+1 will be in the list, or where. So
> the list manipulation is what takes the extra time. For even 1,000 docs, that
> shouldn’t be very much overhead; when it gets up into the 10s of K (or, I’ve
> seen, millions) it’s _very_ noticeable.
>
> With the example you’ve talked about, I doubt this is really a problem.
>
> FWIW,
> Erick
>
> > On Jan 14, 2020, at 1:40 PM, Gael Jourdan-Weil
> > <gael.jourdan-w...@kelkoogroup.com> wrote:
> >
> > OK, I understand better.
> > Solr does not "read" docs 1 to 900 to retrieve 901 to 1000, but it still
> > needs to compute some stuff (docset intersection or something like that,
> > right?) and sort, which is costly, and only then "read" the docs.
> >
> >> Are those 10 requests happening simultaneously, or consecutively? If
> >> it's simultaneous, then they won't benefit from Solr caching. Because
> >> Solr can cache certain things, it would probably be faster to make 10
> >> consecutive requests than 10 simultaneous.
> >
> > The 10 requests are simultaneous, which is, I think, an explanation of the
> > issues we encounter. If they were consecutive, I'd expect to benefit
> > from the cache indeed.
> >
> >> What are you trying to accomplish when you make these queries? If we
> >> understand that, perhaps we can come up with something better.
> >
> > Actually, we are exposing a search engine and it's a behavior coming from
> > some of our clients.
> > It's not a behavior we are deliberately doing or encouraging.
> > But before discussing it with them, we wanted to understand a bit better
> > what in Solr explains those response times.
> >
> > Regards,
> > Gaël
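To make Erick's point above concrete, here is a toy sketch of top-N collection (not Lucene's actual collector code): to serve start=900&rows=100 the collector must maintain a bounded min-heap of start + rows = 1000 hits for the whole match set, because any later doc may displace one, and the first 900 entries are then simply thrown away:

    import java.util.Arrays;
    import java.util.PriorityQueue;

    public class TopNSketch {
        // Returns the doc ids for the requested page, best score first.
        static int[] topPage(float[] scores, int start, int rows) {
            int n = start + rows; // the queue must hold this many hits
            // min-heap by score: the root is the weakest hit currently kept
            PriorityQueue<Integer> heap = new PriorityQueue<>(
                (a, b) -> Float.compare(scores[a], scores[b]));
            for (int doc = 0; doc < scores.length; doc++) {
                if (heap.size() < n) {
                    heap.add(doc);                       // queue not full yet
                } else if (scores[doc] > scores[heap.peek()]) {
                    heap.poll();                         // evict the weakest hit
                    heap.add(doc);
                }
            }
            int size = heap.size();
            int[] desc = new int[size];
            for (int i = size - 1; i >= 0; i--) {
                desc[i] = heap.poll(); // drain ascending, store descending
            }
            // everything before `start` was maintained only to be discarded
            return Arrays.copyOfRange(desc, Math.min(start, size),
                                      Math.min(start + rows, size));
        }

        public static void main(String[] args) {
            float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f};
            // page of 2 starting at rank 2: docs ranked 3rd and 4th by score
            System.out.println(Arrays.toString(topPage(scores, 2, 2))); // [2, 0]
        }
    }

The heap work grows with start + rows, not with rows alone, which is why start=100000 hurts so much more than start=900 even though both return the same number of docs.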