Hi all, I have a collection that is frequently updated. Is it possible for a SolrCloud query to return duplicate documents while paginating?
Just to be clear, the collection holds about 3M documents, and a Solr query selects about 500K of them, sorted by id. The results are returned by simply paginating with the parameters start, rows and sort. The query looks like this one:

http://localhost:8983/solr/collection1/select?q=idCat:1&start=0&rows=20000&sort=id asc

To be honest, I have not verified this personally, but the consumer of this query claims that after a few trials duplicate documents were returned.

Given that the collection is frequently updated, I suppose that adding a large batch of new documents during the pagination can affect the index and change the order of the results. In other words, suppose the 500K documents are returned by 25 queries (20K documents per request) and, during the iteration, 1000 new documents are inserted. Since the query is sorted by id, I think it is possible that the results reflect the new order, so a document returned by a previous query may also be present in the current results.

So I'm trying to solve this problem using deep paging with cursors. I have read that "unlike basic pagination, Cursor pagination does not rely on using an absolute "offset" into the completed sorted list of matching documents. Instead, the cursorMark specified in a request encapsulates information about the relative position of the last document returned, based on the absolute sort values of that document. This means that the impact of index modifications is much smaller when using a cursor compared to basic pagination."

What do you think, am I right? Can deep paging help to solve this problem?

Best regards and thanks for your time,
Vincenzo
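
P.S. To make the scenario concrete, here is a minimal Python sketch of what I think is happening. It is only a toy in-memory simulation, not Solr code: the "index" is a sorted list of ids, and the names offset_pages, cursor_pages and crawl are made up for this example. It compares paging by absolute offset (like start/rows) with paging relative to the last id seen (which is my understanding of what a cursor does), while one document is inserted between the first and second page.

```python
from bisect import insort


def offset_pages(index, rows):
    """Toy model of start/rows paging: each request slices the live,
    sorted index at an absolute offset."""
    start = 0
    while True:
        page = index[start:start + rows]
        if not page:
            return
        yield page
        start += rows


def cursor_pages(index, rows):
    """Toy model of cursor paging: each request returns the next `rows`
    ids that sort strictly after the last id already returned."""
    last = None
    while True:
        page = [doc_id for doc_id in index if last is None or doc_id > last][:rows]
        if not page:
            return
        yield page
        last = page[-1]


def crawl(pager, rows=2):
    """Walk all pages; after the first page, simulate a concurrent update
    by inserting id 5, which sorts before everything already read."""
    index = [10, 20, 30, 40, 50, 60]  # hypothetical ids, kept sorted
    seen = []
    for page_no, page in enumerate(pager(index, rows)):
        seen.extend(page)
        if page_no == 0:
            insort(index, 5)  # shifts every later document by one position
    return seen


print(crawl(offset_pages))  # [10, 20, 20, 30, 40, 50, 60]  (id 20 is returned twice)
print(crawl(cursor_pages))  # [10, 20, 30, 40, 50, 60]      (no duplicates)
```

In this toy run the offset version returns id 20 twice because the insert shifted it into the second page, while the cursor version does not repeat anything (it also never sees id 5, since that id sorts before the cursor position). If I read the documentation correctly, the Solr equivalent would be to send cursorMark=* on the first request, keep a sort that includes the uniqueKey field, and pass the nextCursorMark value from each response as the cursorMark of the following request instead of increasing start.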