If you don't want "downtime", you could add a <field name="indextime" type="tdate" default="NOW" /> field to your schema, reload, do a full re-index on top of your existing index, and then delete all documents that were not updated by the re-index, via a deleteByQuery, e.g.: indextime:[* TO NOW-1DAY]
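A minimal sketch of that last step, assuming the JSON form of the update handler and reusing the host, port and collection name from the query quoted below. The function name and the `commit=true` parameter are my own choices for illustration, and the cut-off should be adjusted so that it predates the start of your re-index:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_delete_by_query(base_url, collection, cutoff="NOW-1DAY", field="indextime"):
    """Build (but do not send) a delete-by-query request for stale documents.

    Assumes the schema has the `indextime` field described above, so every
    document touched by the full re-index carries a fresh timestamp.
    """
    # Range query matching everything last indexed before the cut-off.
    query = f"{field}:[* TO {cutoff}]"
    # commit=true makes the deletion visible right away (an assumption;
    # you may prefer to rely on your own commit settings).
    params = urlencode({"commit": "true"})
    url = f"{base_url.rstrip('/')}/{collection}/update?{params}"
    body = json.dumps({"delete": {"query": query}}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_delete_by_query("http://localhost:5011/solr", "mycollection")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually run the delete: urllib.request.urlopen(req)
```

It is probably worth running the same range query through /select with rows=0 first, to see how many documents would be removed.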

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 12 Oct 2014, at 21:59, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/12/2014 12:26 PM, vidit.asthana wrote:
>> I have a strange problem where select q=*:* is returning a different number of
>> documents. Sometimes it's returning numFound = 5866712 and sometimes it
>> returns numFound = 5852274. *numFound is always one of these 2 values.*
>>
>> Here is the query:
>>
>> *http://localhost:5011/solr/mycollection/select?q=*:*&rows=0*
>>
>>
>> I am running Solr in cloud mode and this problem is occurring with both
>> solr-4.5.1 and solr-4.10.0. I have exactly the same data indexed in both
>> versions. 4.5.1 is running on an 8-node cluster (4x2 shards) and solr-4.10.0
>> is running on a 4-node (2x2 shards) cluster.
>
> I really need to make a wiki page for this. It would save so much
> typing! I also need to boil it down to a small-scale real-world example
> and show how the numbers get calculated and what goes wrong, which means
> I need to have a complete understanding of the problem, and at this
> moment, I don't have that.
>
> This is a problem that's unique to distributed indexes. What causes it
> is having documents with the same value in the uniqueKey field indexed
> in more than one shard.
>
> It is not a bug, it's a result of the way that results from multiple
> shards are combined into one result. The only way to "fix" this problem
> would involve so much additional processing that it would make all
> queries extremely slow.
>
> If you're using automatic document routing, then your routing algorithm
> may have changed at some point, and you didn't re-index. If you're
> using manual document routing, then some documents were indexed on the
> wrong shard, and later indexed on another shard as well.
>
> Preventing the problem is easy -- always index documents onto the
> correct shard. Fixing the problem at this point might involve clearing
> your index and re-indexing from scratch, unless you can figure out which
> documents have been indexed on more than one shard and you can delete
> them from the incorrect shard(s).
>
> Thanks,
> Shawn
>
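
If you want to try the "figure out which documents have been indexed on more than one shard" route that Shawn mentions, the comparison itself could look roughly like the sketch below. It assumes you have already collected the uniqueKey values of each shard separately (for example by querying each shard's core directly with distrib=false and fl=id, which is my assumption about how you would gather them); the function name and the toy data are hypothetical:

```python
from collections import defaultdict


def find_cross_shard_duplicates(ids_by_shard):
    """Return {doc_id: [shards...]} for ids that occur on more than one shard.

    `ids_by_shard` maps a shard name to an iterable of uniqueKey values that
    were fetched from that shard alone (not through a distributed query).
    """
    shards_by_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_by_id[doc_id].add(shard)
    # Keep only the ids that were seen on two or more shards.
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_by_id.items()
        if len(shards) > 1
    }


if __name__ == "__main__":
    # Toy data standing in for the per-shard id lists.
    sample = {
        "shard1": ["doc-1", "doc-2", "doc-3"],
        "shard2": ["doc-3", "doc-4"],
    }
    for doc_id, shards in find_cross_shard_duplicates(sample).items():
        print(doc_id, "is on", ", ".join(shards))
```

This only tells you where the duplicates are. Deciding which shard is the "incorrect" one for each id still depends on your routing, and with millions of documents you would want to page through the ids rather than hold them all in memory.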