Theoretically this shouldn't happen, but is it possible that the two replicas for a given shard are not fully in sync?
Say shard1 replica1 is missing a document that is in shard1 replica2. If you run a query that matches that document and repeat it a number of times, sometimes replica 1 will handle the request and sometimes replica 2 will, so the number of results changes depending on which replica answers.

You could write a program that compares each replica's documents by querying them with distrib=false.

If a replica were out of sync, I would think it would detect that on a restart when comparing itself against the leader for that shard, but I'm not sure.

On Wed, Aug 27, 2014 at 11:37 AM, Joshi, Shital <shital.jo...@gs.com> wrote:

> Hi,
>
> We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. We have
> three collections. We recently upgraded from 4.4.0 to 4.8. We have ~850
> mil documents.
>
> We are facing an issue where refreshing a Solr query may give different
> results (number of documents returned). This issue is seen in all three
> collections.
>
> We found that Solr admin would report Solr instance states as not
> “current”. Is it indicative of the above issue?
>
> We checked logs and found various errors/warnings, but they don’t seem to
> be indicative of the above issue (or if they are – it’s not yet
> clear/obvious or maybe indirectly related). The error message is like this:
>
> 8/27/2014 2:01:24 AM ERROR SolrCmdDistributor
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
> opening new searcher. exceeded limit of maxWarmingSearchers=2, try again
> later.
>
> This is our autocommit setting.
>
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <maxDocs>100000</maxDocs>
>   <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>300000</maxTime>
> </autoSoftCommit>
>
> The searcher takes less than 1.5 minutes and the soft commit setting is
> set for every 5 minutes. So there is no way to end up with more than two
> searchers.
>
> The searcher registeredAttime and openedAttime are sometimes 12-13 hours
> old and we end up bouncing the cloud.
>
> Any help to solve this issue is appreciated.
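
For what it's worth, here is a rough sketch of the replica comparison I suggested above: it asks each core for its document count with distrib=false and flags shards whose replicas disagree. The host names and core names are made up, so substitute your own. Matching counts do not prove the replicas hold the same documents, and counts can differ briefly between soft commits, so treat this as a first check only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_url, query="*:*"):
    """Build a select URL that asks one core only (distrib=false) for a count."""
    params = urlencode({"q": query, "rows": 0, "wt": "json", "distrib": "false"})
    return f"{core_url.rstrip('/')}/select?{params}"


def fetch_num_found(core_url, query="*:*"):
    """Return numFound as reported by a single core."""
    with urlopen(count_url(core_url, query), timeout=30) as resp:
        return json.load(resp)["response"]["numFound"]


def out_of_sync(counts_by_shard):
    """Keep only the shards whose replicas report different counts."""
    return {
        shard: counts
        for shard, counts in counts_by_shard.items()
        if len(set(counts.values())) > 1
    }


if __name__ == "__main__":
    # Hypothetical core URLs; replace with the real hosts and core names.
    replicas = {
        "shard1": [
            "http://host1:8983/solr/collection1_shard1_replica1",
            "http://host2:8983/solr/collection1_shard1_replica2",
        ],
    }
    counts = {
        shard: {url: fetch_num_found(url) for url in urls}
        for shard, urls in replicas.items()
    }
    for shard, per_replica in out_of_sync(counts).items():
        print(shard, per_replica)
```

If a shard shows up in the output, running the same comparison with a narrower query (a date range or an id prefix, for example) should help narrow down which documents are missing.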