On 1/11/2017 1:47 PM, Chetas Joshi wrote:
> I have deployed a SolrCloud (Solr 5.5.0) on HDFS using Cloudera 5.4.7. The
> cloud has 86 nodes.
>
> This is my config for the collection
>
> numShards=80
> replicationFactor=1
> maxShardsPerNode=1
> autoAddReplicas=true
>
> I recently decommissioned a node to resolve some disk issues. The shard
> that was hosted on that node is now shown as "gone" in the Solr
> admin UI.
>
> I got the cluster status using the Collections API. It says:
> shard: active, replica: down
>
> The overseer does not seem to be creating an extra core even though
> autoAddReplicas=true (
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS).
>
> Is this happening because the overseer sees the shard as active, as
> suggested by the cluster status?
> If yes, is "autoAddReplicas" not reliable? Should I add a replica for this
> shard myself when such cases arise?

Your replicationFactor is one.  When there's one replica, you have no
redundancy.  If that replica goes down, the shard is completely gone.
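
You can spot this situation in the CLUSTERSTATUS output by looking for shards that have no replica which is both "active" and hosted on a live node.  A minimal sketch, assuming the JSON body of a Collections API CLUSTERSTATUS response has already been fetched and parsed into a dict; the function name and the sample collection and node names are hypothetical:

```python
from typing import Dict, List


def find_shards_without_live_replica(status: dict, collection: str) -> List[str]:
    """Return names of shards in `collection` that have no usable replica.

    `status` is assumed to be the parsed CLUSTERSTATUS response, shaped like
    {"cluster": {"collections": {...}, "live_nodes": [...]}}.
    """
    cluster = status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    shards: Dict[str, dict] = cluster["collections"][collection]["shards"]
    dead = []
    for shard_name, shard in sorted(shards.items()):
        # Count a replica only if it reports "active" AND its node is live;
        # the recorded state of a replica on a vanished node can be stale.
        usable = [
            r for r in shard.get("replicas", {}).values()
            if r.get("state") == "active" and r.get("node_name") in live_nodes
        ]
        if not usable:
            dead.append(shard_name)
    return dead


# Hypothetical sample mirroring the report: shard "active", replica "down".
sample = {
    "cluster": {
        "live_nodes": ["host2:8983_solr"],
        "collections": {
            "mycollection": {
                "shards": {
                    "shard1": {
                        "state": "active",
                        "replicas": {
                            "core_node1": {"state": "down", "node_name": "host1:8983_solr"},
                        },
                    },
                    "shard2": {
                        "state": "active",
                        "replicas": {
                            "core_node2": {"state": "active", "node_name": "host2:8983_solr"},
                        },
                    },
                },
            },
        },
    },
}

print(find_shards_without_live_replica(sample, "mycollection"))  # ['shard1']
```

Note that the shard-level "state" stays "active" in this sample; it's the replica that tells you the data is unreachable.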

As I understand it (I've got no experience with HDFS at all),
autoAddReplicas is designed to automatically add replicas until
replicationFactor is satisfied.  As already mentioned, your
replicationFactor is one.  This means that it will always be satisfied.

If autoAddReplicas were to kick in any time a replica went down, then
Solr would be busy adding replicas any time you restarted a node ...
which would be a very bad idea.

If your number of replicas is one, and that replica goes down, where
would Solr go to get the data to create another replica?  The single
replica is down, so there's nothing to copy from.  You might be thinking
"from the leader" ... but a leader is nothing more than a replica that
has been temporarily elected to have an extra job.  A replicationFactor
of two doesn't mean a leader and two copies ... it means there are a
total of two replicas, one of which is elected leader.
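
Here is a toy model of that reasoning: the leader is counted among the replicas, and a new replica can only be built if some replica is live to copy from.  This illustrates the argument above and is not Solr's actual code; the Replica class and pick_copy_source function are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    live: bool
    leader: bool = False


def pick_copy_source(replicas: List[Replica]) -> Optional[Replica]:
    """Return a live replica to copy the index from, preferring the leader.

    Returns None when no replica is live: there is nothing to copy from.
    """
    live = [r for r in replicas if r.live]
    if not live:
        return None
    leaders = [r for r in live if r.leader]
    # In real SolrCloud a surviving replica would be elected leader anyway.
    return leaders[0] if leaders else live[0]


# replicationFactor=2: two replicas in total, one of which is the leader.
rf2 = [Replica("core_node1", live=True, leader=True), Replica("core_node2", live=False)]
assert len(rf2) == 2  # the leader is one of the two, not a third copy

# replicationFactor=1: the only replica (which was also the leader) is down.
rf1 = [Replica("core_node1", live=False, leader=True)]

print(pick_copy_source(rf2).name)  # core_node1
print(pick_copy_source(rf1))       # None
```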

If you want autoAddReplicas to work, you're going to need to have a
replicationFactor of at least two, and you're probably going to have to
delete the dead replica before another will be created.
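
If you end up doing it by hand, deleting the dead replica and adding a new one corresponds to the Collections API actions DELETEREPLICA and ADDREPLICA.  A minimal sketch that only builds the request URLs and doesn't send them; the base URL, collection, shard, replica (the core_node name CLUSTERSTATUS shows) and target node are all placeholders:

```python
from urllib.parse import urlencode

# Placeholder base URL; any node in the cluster can take Collections API calls.
SOLR = "http://localhost:8983/solr"


def collections_api_url(action: str, **params: str) -> str:
    """Build a Collections API URL of the form /admin/collections?action=..."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


# 1. Remove the dead replica from the cluster state (placeholder names).
delete_url = collections_api_url(
    "DELETEREPLICA", collection="mycollection", shard="shard1", replica="core_node1"
)

# 2. Add a replica for the same shard, optionally pinned to a specific node.
add_url = collections_api_url(
    "ADDREPLICA", collection="mycollection", shard="shard1", node="host2:8983_solr"
)

print(delete_url)
print(add_url)
```

The URLs can then be sent with curl or any HTTP client.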

Thanks,
Shawn
