I am using Solr 4.6.1, and in my schema I have a date field that stores the time a document was added to Solr.
I have a utility program which:
- queries for all of the documents from the previous day, sorted by create date
- pages through the results, keeping track of the unique document ids
- compares the total number of unique doc ids to numFound to see if they match

I've noticed that if I use a page size larger than the number of documents for the given day (i.e. get everything in one query), then everything works as expected (results sorted correctly, unique doc ids size == numFound).

However, when I use a smaller page size, say 10 rows per page, I randomly see cases where the last document of a page is duplicated as the first document of the next page, even though the "start" and "rows" parameters increased correctly. So I might see something like numFound=100 but only 97 unique doc ids, and then I see three occurrences where the last doc id on a page was also the first on the next page.

It is not consistent between tests: the number of occurrences changes, and the locations of the occurrences can change as well. The larger the result set and the smaller the page size, the more frequent the occurrences are.

The only thing I have noticed is that if I change the sort of the initial query to use a non-date field, then this doesn't happen anymore.

Are there any known issues/limitations with sorting/paging on a date field? The only mention I can find is this thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200909.mbox/%3c57912a0644b6ab4381816de07cb1c38d02a00...@s2na1excluster.na1.ad.group%3E
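
For reference, the check described above boils down to roughly the following. This is only a minimal sketch of the logic, not the actual utility: `audit_paging` and `fetch_page` are hypothetical names, and `fetch_page(start, rows)` stands in for whatever issues the real Solr query and returns numFound plus the doc ids on that page.

```python
from typing import Callable, List, Tuple


def audit_paging(fetch_page: Callable[[int, int], Tuple[int, List[str]]], rows: int):
    """Page through a result set and report ids repeated across page boundaries.

    fetch_page(start, rows) is a hypothetical stand-in for the real query;
    it must return (numFound, list of doc ids on that page).
    """
    seen = set()
    boundary_dupes = []  # (start offset, doc id) where a page begins with the previous page's last id
    prev_last = None
    start = 0
    num_found = None
    while num_found is None or start < num_found:
        num_found, ids = fetch_page(start, rows)
        if not ids:
            break
        if prev_last is not None and ids[0] == prev_last:
            boundary_dupes.append((start, ids[0]))
        seen.update(ids)
        prev_last = ids[-1]
        start += rows
    return num_found, len(seen), boundary_dupes


# Simulated data only: 25 docs, and a fake page source that returns a window
# shifted by one position for the second page, mimicking the symptom.
docs = [f"doc{i}" for i in range(25)]


def stable(start, rows):
    return len(docs), docs[start:start + rows]


def shifty(start, rows):
    s = start - 1 if start == 10 else start
    return len(docs), docs[s:s + rows]
```

With the simulated `stable` source the unique count equals numFound; with `shifty` the unique count comes up one short and the repeated id is reported at the page boundary, which is the pattern I am seeing against the date-sorted query.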