Right. Consider your situation, where you have 12 shards on 12 machines. Each machine stores 1/12 of your index, so by definition, if one of them is down you cannot get that 1/12 of your data. With no replicas, shards.tolerant=true is the only option here. I'm surprised that shards.tolerant makes things slow, though; perhaps there's a timeout happening. I thought that we just didn't send requests to shards that were down, so it shouldn't take any extra time.
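To make the shards.tolerant point concrete, here is a minimal Python sketch of how a client might ask for tolerant results and then check whether the answer was incomplete. The host, port, and collection name are placeholders I made up, not taken from your setup, and the check assumes Solr sets the partialResults flag in the response header when a shard could not be reached, as the reference guide describes. Treat it as an illustration rather than a tested client.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, tolerant=True):
    """Build a /select URL for a Solr collection.

    base_url and collection are hypothetical placeholders; substitute
    the values for your own cluster.
    """
    params = {"q": query, "wt": "json"}
    if tolerant:
        # Ask Solr to return whatever the live shards can provide
        # instead of failing the whole request with a 503.
        params["shards.tolerant"] = "true"
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def is_partial(response_text):
    """Return True if the response header marks the results as partial.

    Assumes the response is JSON (wt=json) and that a missing flag
    means all shards answered.
    """
    header = json.loads(response_text).get("responseHeader", {})
    return bool(header.get("partialResults", False))


if __name__ == "__main__":
    # Hypothetical host and collection name, for illustration only.
    print(build_select_url("http://localhost:8983/solr", "collection1",
                           "+links:[* TO *]"))
```

Checking the flag on every response lets the application decide whether to show the incomplete results or warn the user, rather than silently treating 11/12 of the index as the whole thing.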
The default replicationFactor of 1 is just that way because it's simple. With a higher replication factor, and assuming that no two replicas for the same shard are on the same node, taking one of the nodes down doesn't affect search and you still get all your docs back. The SolrCloud tutorial works through this in some detail; it would be worth reviewing.

Best,
Erick

On Fri, Sep 12, 2014 at 10:50 AM, Amey - codeinventory
<ameyjad...@codeinventory.com> wrote:
> Hi thanks for feedback erik.
>
>
> What do you mean all the nodes in shards are down? i have 12 shards and i
> suppose i have 12 nodes right? correct me if i am wrong.
>
>
> Now whenever any one of them is down , say 1 down 11 active still i am
> getting same error... is there any way to get results ignoring it other than
> shard.tolerent=true?
>
> I have replicationFactor=1 ,seems default is 1 so i kept it as it is, if i
> change it in my clusterstate.json to 2 or 3 ,is it possible i will get all
> results though any node is down?
>
> Regards,
> Amey
>
> --- Original Message ---
>
> From: "Erick Erickson" <erickerick...@gmail.com>
> Sent: September 12, 2014 9:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to make solr fault tolerant for query?
>
> Hmmm, if all the nodes for a shard are down, shards.tolerant=true
> shouldn't be slow unless there's some kind of bug. Solr should be
> smart enough not to wait for a timeout. So I'm a bit surprised by that
> statement, how sure of it are you? Do you have a test case?
>
> bq: but this is slow and dont gives all results.
>
> Well, you can _never_ have all results if all the replicase for a
> shard are down, so the second part of that statement is just the way
> the system _has_ to work.
>
> SolrCloud is a wonderful system. The HA/DR handling is predicated upon
> at least one member of each shard being available. When you violate
> that expectation you have to pay the price of at least incomplete
> responses.
>
> Best,
> Erick
>
> On Fri, Sep 12, 2014 at 2:33 AM, Amey Jadiye
> <ameyjad...@codeinventory.com> wrote:
>> Just a dumb question but how can i make solr cloud fault tolerant for
>> queries ? why i am asking this question because, i have 12 different
>> physical server and i am running 12 solr shards on that, whenever any one
>> of them is going down because of any reason it gives me below error, i have
>> 3 zookeeper for 12 servers all are leader and no replica for this solr cloud.
>> I have option of using shards.tolerant=true but this is slow and dont gives
>> all results.
>> Best,Amey
>> {
>>   "responseHeader": {
>>     "status": 503,
>>     "QTime": 7,
>>     "params": {
>>       "sort": "last_modified asc",
>>       "indent": "true",
>>       "q": "+links:[* TO *]",
>>       "_": "1410512274068",
>>       "wt": "json"
>>     }
>>   },
>>   "error": {
>>     "msg": "no servers hosting shard: ",
>>     "code": 503
>>   }
>> }