On 1/11/2017 1:47 PM, Chetas Joshi wrote:
> I have deployed a SolrCloud (solr 5.5.0) on hdfs using cloudera 5.4.7. The
> cloud has 86 nodes.
>
> This is my config for the collection
>
> numShards=80
> ReplicationFactor=1
> maxShardsPerNode=1
> autoAddReplica=true
>
> I recently decommissioned a node to resolve some disk issues. The shard
> that was being hosted on that host is now being shown as "gone" on the solr
> admin UI.
>
> I got the cluster status using the collection API. It says
> shard: active, replica: down
>
> The overseer does not seem to be creating an extra core even though
> autoAddReplica=true (
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS).
>
> Is this happening because the overseer sees the shard as active as
> suggested by the cluster status?
> If yes, is "autoAddReplica" not reliable? should I add a replica for this
> shard when such cases arise?

Your replicationFactor is one. When there's one replica, you have no redundancy. If that replica goes down, the shard is completely gone.

As I understand it (I've got no experience with HDFS at all), autoAddReplicas is designed to automatically add replicas until replicationFactor is satisfied. As already mentioned, your replicationFactor is one. This means that it will always be satisfied.

If autoAddReplicas were to kick in any time a replica went down, then Solr would be busy adding replicas anytime you restarted a node ... which would be a very bad idea.

If your number of replicas is one, and that replica goes down, where would Solr go to get the data to create another replica? The single replica is down, so there's nothing to copy from. You might be thinking "from the leader" ... but a leader is nothing more than a replica that has been temporarily elected to have an extra job. A replicationFactor of two doesn't mean a leader and two copies ... it means there are a total of two replicas, one of which is elected leader.

If you want autoAddReplicas to work, you're going to need to have a replicationFactor of at least two, and you're probably going to have to delete the dead replica before another will be created.

Thanks,
Shawn
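
P.S. To make the manual cleanup above a bit more concrete, here is a minimal Python sketch. It finds shards that have no active replica on a live node in a CLUSTERSTATUS-style response, and builds Collections API URLs for DELETEREPLICA and ADDREPLICA. The collection name, host names, and base URL are hypothetical, the sample dictionary is hand-written to resemble the CLUSTERSTATUS JSON layout rather than taken from a real cluster, and nothing here sends a request. Whether deleting the dead replica first is actually required is a guess, as noted above.

```python
from urllib.parse import urlencode

# Hand-written sample shaped like a CLUSTERSTATUS response (hypothetical names).
SAMPLE_STATUS = {
    "cluster": {
        "live_nodes": ["host2:8983_solr"],
        "collections": {
            "mycoll": {
                "shards": {
                    "shard1": {
                        "state": "active",
                        "replicas": {
                            "core_node1": {"state": "down",
                                           "node_name": "host1:8983_solr"},
                        },
                    },
                    "shard2": {
                        "state": "active",
                        "replicas": {
                            "core_node2": {"state": "active",
                                           "node_name": "host2:8983_solr"},
                        },
                    },
                }
            }
        },
    }
}


def shards_without_active_replica(cluster_status, collection):
    """Map each shard lacking an active replica on a live node to its replica names."""
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    shards = cluster["collections"][collection]["shards"]
    dead = {}
    for shard_name, shard in shards.items():
        replicas = shard.get("replicas", {})
        # A shard can report state "active" while its only replica is down,
        # so look at the replicas rather than the shard state.
        active = [
            name for name, replica in replicas.items()
            if replica.get("state") == "active"
            and replica.get("node_name") in live_nodes
        ]
        if not active:
            dead[shard_name] = sorted(replicas)
    return dead


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL; base_url is assumed to end in /solr."""
    query = urlencode({"action": action, "wt": "json", **params})
    return f"{base_url}/admin/collections?{query}"


def cleanup_urls(base_url, cluster_status, collection):
    """Return DELETEREPLICA then ADDREPLICA URLs for every shard found dead."""
    urls = []
    for shard, replicas in shards_without_active_replica(
            cluster_status, collection).items():
        for replica in replicas:
            urls.append(collections_api_url(
                base_url, "DELETEREPLICA",
                collection=collection, shard=shard, replica=replica))
        urls.append(collections_api_url(
            base_url, "ADDREPLICA", collection=collection, shard=shard))
    return urls
```

Calling `cleanup_urls("http://host2:8983/solr", SAMPLE_STATUS, "mycoll")` yields one DELETEREPLICA URL for `core_node1` followed by one ADDREPLICA URL for `shard1`. Review the URLs before issuing any of them against a real cluster.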