Very good point. I've seen this issue occur once before when I was playing with 4.3.1 and don't remember it happening since 4.5.0+, so that is good news - we are just behind.

For anyone who is curious about my earlier mention that Zookeeper/clusterstate.json was not taking updates: this was NOT correct. Zookeeper has no issues taking sets/creates to clusterstate.json (or any znode); just this one node seemed to stay stuck as "state: active" while it was very inconsistent, for reasons unknown, potentially just bugs.

The good news is this will be resolved today with a create/destroy of the bad replica.

Thanks all!

Tim

On 4 December 2013 16:50, Mark Miller <markrmil...@gmail.com> wrote:

> Keep in mind, there have been a *lot* of bug fixes since 4.3.1.
>
> - Mark
>
> On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt <t...@elementspace.com> wrote:
>
> > Hey all,
> >
> > Now that I am getting correct results with "distrib=false", I've
> > identified that 1 of my nodes has just 1/3rd of the total data set,
> > which totally explains the flapping in results. The fix for this is
> > obvious (rebuild replica) but the cause is less obvious.
> >
> > There is definitely more than one issue going on with this SolrCloud
> > (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
> > /clusterstate.json doesn't seem to get updated when nodes are brought
> > down/up is the reason why this replica remained in the distributed
> > request chain without recovering/re-replicating from leader.
> >
> > I imagine my Zookeeper ensemble is having some problems unrelated to
> > Solr that are the real root cause.
> >
> > Thanks!
> >
> > Tim
> >
> > On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
> >> Chris, this is extremely helpful and it's silly I didn't think of this
> >> sooner! Thanks a lot, this makes the situation make much more sense.
> >>
> >> I will gather some proper data with your suggestion and get back to the
> >> thread shortly.
> >>
> >> Thanks!!
> >>
> >> Tim
> >>
> >> On 04/12/13 02:57 PM, Chris Hostetter wrote:
> >>> :
> >>> : I may be incorrect here, but I assumed when querying a single core of a
> >>> : SolrCloud collection, the SolrCloud routing is bypassed and I am talking
> >>> : directly to a plain/non-SolrCloud core.
> >>>
> >>> No ... every query received from a client by solr is handled by a single
> >>> core -- if that core knows it's part of a SolrCloud collection then it
> >>> will do a distributed search across a random replica from each shard in
> >>> that collection.
> >>>
> >>> If you want to bypass the distributed search logic, you have to say so
> >>> explicitly...
> >>>
> >>> To ask an arbitrary replica to only search itself add "distrib=false" to
> >>> the request.
> >>>
> >>> Alternatively: you can ask that only certain shard names (or certain
> >>> explicit replicas) be included in a distributed request..
> >>>
> >>> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
> >>>
> >>> -Hoss
> >>> http://www.lucidworks.com/
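
For anyone finding this thread later, here is a minimal sketch of the per-replica check described above: query each core of one shard with "distrib=false" and compare the document counts. The host names, core names and helper function names are hypothetical placeholders, not taken from this cluster; only the "distrib=false" parameter comes from the thread. The comparison only makes sense between replicas of the same shard.

```python
from urllib.parse import urlencode


def replica_query_url(core_url, query="*:*"):
    """Build a /select URL that asks one core to search only itself."""
    # distrib=false bypasses the distributed search described in the thread.
    params = {"q": query, "rows": 0, "wt": "json", "distrib": "false"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def lagging_replicas(counts):
    """Return the replicas whose count is below the largest count seen.

    `counts` maps a replica name to the numFound value it reported.
    Assumes all entries are replicas of the SAME shard.
    """
    if not counts:
        return {}
    expected = max(counts.values())
    return {name: n for name, n in counts.items() if n < expected}


if __name__ == "__main__":
    # Hypothetical core URLs; substitute the real ones for your collection.
    cores = [
        "http://host1:8983/solr/collection1_shard1_replica1",
        "http://host2:8983/solr/collection1_shard1_replica2",
    ]
    for core in cores:
        print(replica_query_url(core))
```

Fetch each printed URL (curl or a browser is fine), read numFound from the response, and pass the values to lagging_replicas() to see which replica is short.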