Don't spend your time on this; I've just found an answer in the documentation:

> One way to ensure that a document will never be returned more than once
> is to use the uniqueKey field as the primary (and therefore: only
> significant) sort criterion. In this situation, you will be guaranteed
> that each document is only returned once, no matter how it may be
> modified during the use of the cursor.

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

On Thu, Aug 3, 2017 at 12:47 PM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:

> Hi all,
>
> I have a collection that is frequently updated. Is it possible that a
> SolrCloud query returns duplicate documents while paginating?
>
> Just to be clear, there is a collection with about 3M documents, and a
> Solr query selects just 500K documents sorted by id, which are returned
> by simply paginating the results with the parameters start, rows and sort.
>
> The query is like this one:
>
> http://localhost:8983/solr/collection1/select?q=idCat:1&start=0&rows=20000&sort=id asc
>
> To be honest, I've not verified this personally, but the consumer of this
> query claims that after a few trials, duplicate documents were returned.
>
> Given that the collection is frequently updated, I suppose that adding a
> large batch of new documents during the pagination can affect the index
> and change the order of results.
>
> In other words, suppose I have 500K documents returned by 25 queries (20K
> documents for each request) and, during the iteration, 1000 new documents
> are inserted.
> Given that the query is sorted by id, I think it is possible that the
> documents returned reflect the new order, so a document returned by a
> previous query may also be present in the current results.
>
> Again, I'm trying to solve this problem using deep paging.
>
> I have read that "unlike basic pagination, Cursor pagination does not rely
> on using an absolute "offset" into the completed sorted list of matching
> documents. Instead, the cursorMark specified in a request encapsulates
> information about the relative position of the last document returned,
> based on the absolute sort values of that document. This means that the
> impact of index modifications is much smaller when using a cursor compared
> to basic pagination."
>
> What do you think, am I right? Can deep paging help to solve this
> problem?
>
> Best regards and thanks for your time,
> Vincenzo
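
For anyone finding this thread later, here is a rough sketch of what the cursor-based loop could look like in place of the start/rows pagination. It is only an illustration, not tested against a real cluster: the function names (solr_fetch, iter_cursor) are my own, and it assumes the uniqueKey field is "id" and that the select handler returns JSON. The cursorMark and nextCursorMark parameters, the "*" starting value, and the rule that the sort must include the uniqueKey field come from the documentation page linked above.

```python
import json
import urllib.parse
import urllib.request


def solr_fetch(base_url, params):
    """Run one select request and return the parsed JSON body.

    Hypothetical helper: base_url is the full select handler URL.
    """
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_cursor(fetch, query, sort="id asc", rows=1000):
    """Yield every matching document, following nextCursorMark.

    `fetch` is any callable that takes the request parameters as a dict
    and returns the decoded Solr response as a dict.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        body = fetch({
            "q": query,
            "sort": sort,          # must include the uniqueKey field
            "rows": rows,
            "cursorMark": cursor,  # no "start" parameter with cursors
            "wt": "json",
        })
        yield from body["response"]["docs"]
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark: nothing left to read
            return
        cursor = next_cursor


# Possible usage against the collection from the question (assumed URL):
#
#   from functools import partial
#   fetch = partial(solr_fetch, "http://localhost:8983/solr/collection1/select")
#   for doc in iter_cursor(fetch, "idCat:1", rows=20000):
#       print(doc["id"])
```

Passing the fetch function in as an argument keeps the loop itself independent of the HTTP client, so the stop condition can be checked with a fake response before pointing it at a live collection.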