: You probably have duplicates (docs on different shards with the same id).
: Deeper paging will detect more of them.
: It does raise the question of if we should be changing numFound, or
: indicating a separate duplicate count. Duplicates aren't eliminated
Random thought (from someone who's never really considered distributed searching in much depth): why do we bother detecting/removing the duplicates?

Strictly speaking, having docs with duplicate IDs on multiple shards is a "garbage in" situation. I can understand Solr taking a little extra effort to not fail hard if this situation is encountered, but why update numFound at all, or remove the duplicates from the list? Why not leave them in as is? (Then numFound would never change.)

-Hoss
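A toy model may help make the two options concrete. This is a hypothetical sketch, not Solr's actual merge code: it assumes each shard returns its total hit count plus its top `start + rows` doc ids in score order, and that the merger can only spot duplicates among the docs it actually receives. The names `merge`, `SHARDS`, and the `dedupe` flag are made up for illustration. With `dedupe=True`, numFound shrinks as deeper pages pull in more docs and expose more duplicates; with `dedupe=False`, numFound stays fixed and duplicates simply appear in the results.

```python
from typing import List, Tuple

# Hypothetical shard response: (num_found, [(doc_id, score), ...]) in score order.
Shard = Tuple[int, List[Tuple[str, float]]]


def merge(shards: List[Shard], start: int, rows: int, dedupe: bool) -> Tuple[int, List[str]]:
    """Merge shard responses for one page; return (num_found, page_ids)."""
    num_found = sum(n for n, _ in shards)
    window = start + rows
    # The merger only ever sees the top `window` docs from each shard.
    candidates: List[Tuple[str, float]] = []
    for _, docs in shards:
        candidates.extend(docs[:window])
    candidates.sort(key=lambda d: d[1], reverse=True)  # highest score first

    merged: List[str] = []
    seen = set()
    for doc_id, _ in candidates:
        if dedupe:
            if doc_id in seen:
                # A duplicate is only detected if both copies fall inside the window.
                num_found -= 1
                continue
            seen.add(doc_id)
        merged.append(doc_id)
    return num_found, merged[start:start + rows]


# Two shards that both (wrongly) contain a doc with id "x".
SHARDS: List[Shard] = [
    (3, [("a1", 0.9), ("a2", 0.8), ("x", 0.2)]),
    (3, [("b1", 0.85), ("b2", 0.75), ("x", 0.1)]),
]

print(merge(SHARDS, 0, 2, dedupe=True))   # page 1: duplicate not yet seen
print(merge(SHARDS, 2, 2, dedupe=True))   # page 2: duplicate seen, numFound drops
print(merge(SHARDS, 4, 2, dedupe=False))  # leave duplicates in: numFound constant
```

In this model, page 1 reports numFound 6 and page 2 reports 5, because the second copy of "x" only enters the merge window at the deeper page. The leave-them-in policy keeps numFound at 6 on every page, at the cost of returning "x" twice.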