Uhm... that sounds reasonable. My data model may allow duplicate keys, although it should be quite hard for that to happen. My key is a hash derived from a URL during the crawling process, and it's possible to re-crawl an existing URL. I think I need to find a new way to compose a unique key to avoid this kind of bad behavior. However, it would be very useful if Solr could warn about duplicate keys somehow. Maybe an extra field in the response, alongside numFound, docs, facets, etc., would be nice. Thank you very much!
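As an illustration only, here is a minimal Python sketch of a URL-derived key. The hash function (SHA-1), the whitespace stripping, and the names `url_key` and `shard_for` are all assumptions made for the example; none of them come from the actual crawler or from Solr. It shows that re-crawling a URL reproduces the same key, so a duplicate across shards can only appear if that key is sent to a different shard the second time. Choosing the shard from the key alone is one way to rule that out.

```python
import hashlib


def url_key(url: str) -> str:
    """Hypothetical uniqueKey: SHA-1 hex digest of the URL (assumed scheme)."""
    return hashlib.sha1(url.strip().encode("utf-8")).hexdigest()


def shard_for(key: str, num_shards: int) -> int:
    """Hypothetical routing: derive the shard index from the key only,
    so the same key is always indexed into the same shard."""
    return int(key, 16) % num_shards


first_crawl = url_key("http://example.com/page")
re_crawl = url_key("http://example.com/page")

# Same URL, same key, same shard: a re-crawl targets the existing document.
same_key = first_crawl == re_crawl
same_shard = shard_for(first_crawl, 3) == shard_for(re_crawl, 3)
```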
Best regards,

- Luis Cappa

2013/5/23 Shawn Heisey <s...@elyograg.org>

> On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
> > I've query each Solr shard server one by one and the total number of
> > documents is correct. However, when I change rows parameter from 10 to 100
> > the total numFound of documents change:
>
> I've seen this problem on the list before and the cause has been
> determined each time to be caused by documents with the same uniqueKey
> value appearing in more than one shard.
>
> What I think happens here:
>
> With rows=10, you get the top ten docs from each of the three shards,
> and each shard sends its numFound for that query to the core that's
> coordinating the search. The coordinator adds up numFound, looks
> through those thirty docs, and arranges them according to the requested
> sort order, returning only the top 10. In this case, there happen to be
> no duplicates.
>
> With rows=100, you get a total of 300 docs. This time, duplicates are
> found and removed by the coordinator. I think that the coordinator
> adjusts the total numFound by the number of duplicate documents it
> removed, in an attempt to be more accurate.
>
> I don't know if adjusting numFound when duplicates are found in a
> sharded query is the right thing to do, I'll leave that for smarter
> people. Perhaps Solr should return a message with the results saying
> that duplicates were found, and if a config option is not enabled, the
> server should throw an exception and return a 4xx HTTP error code. One
> idea for a config parameter name would be allowShardDuplicates, but
> something better can probably be found.
>
> Thanks,
> Shawn

--
- Luis Cappa
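
The behavior Shawn describes can be modeled with a small Python sketch. This is a toy simulation of his hypothesis, not Solr's actual merge code: the `coordinate` function, the tuple layout, and the sample shard contents are all invented for the example. Each shard reports its full hit count, but only its top `rows` documents reach the coordinator, so a duplicate key is noticed only once `rows` is large enough to pull in both copies.

```python
def coordinate(shards, rows):
    """Toy coordinator. `shards` is a list of per-shard result lists,
    each a list of (doc_id, score) tuples already sorted by score."""
    # Every shard reports its full hit count, regardless of `rows`.
    num_found = sum(len(shard) for shard in shards)

    seen = set()
    merged = []
    duplicates = 0
    for shard in shards:
        # Only the top `rows` docs of each shard are sent to the coordinator.
        for doc_id, score in shard[:rows]:
            if doc_id in seen:
                duplicates += 1  # same uniqueKey already seen from another shard
                continue
            seen.add(doc_id)
            merged.append((doc_id, score))

    merged.sort(key=lambda doc: doc[1], reverse=True)
    # Assumed behavior: numFound is reduced by the duplicates that were removed.
    return num_found - duplicates, merged[:rows]


# Made-up data: the key "dup" exists on two shards, ranked low on both.
shards = [
    [("a1", 9.0), ("a2", 8.0), ("dup", 1.0)],
    [("b1", 7.0), ("b2", 6.0), ("dup", 1.0)],
    [("c1", 5.0), ("c2", 4.0), ("c3", 3.0)],
]

found_small, docs_small = coordinate(shards, rows=2)  # duplicate never reaches the coordinator
found_large, docs_large = coordinate(shards, rows=3)  # duplicate is seen and removed
```

In this toy model `found_small` is 9 and `found_large` is 8, which mirrors the numFound change seen when rows goes from 10 to 100.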