Hi Shawn,

This is what I understand about how Solr works on HDFS. Please correct me if I am wrong.
Although the Solr shard replicationFactor = 1, the HDFS default replication = 3. When a node goes down, the Solr server running on that node goes down, and hence the instance (core) representing the replica goes down. The data, however, is on HDFS (distributed across all the datanodes of the Hadoop cluster with 3x replication). This is the reason I have kept replicationFactor=1.

As per the link https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS:

"One benefit to running Solr in HDFS is the ability to automatically add new replicas when the Overseer notices that a shard has gone down. Because the 'gone' index shards are stored in HDFS, a new core will be created and the new core will point to the existing indexes in HDFS."

This is the expected behavior of the Solr Overseer, which I am not able to see. After a couple of hours a node was assigned to host the shard, but the status of the shard is still "down" and the instance dir is missing on that node for that particular shard_replica.

Thanks!

On Wed, Jan 11, 2017 at 5:03 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/11/2017 1:47 PM, Chetas Joshi wrote:
> > I have deployed a SolrCloud (Solr 5.5.0) on HDFS using Cloudera 5.4.7.
> > The cloud has 86 nodes.
> >
> > This is my config for the collection:
> >
> > numShards=80
> > replicationFactor=1
> > maxShardsPerNode=1
> > autoAddReplicas=true
> >
> > I recently decommissioned a node to resolve some disk issues. The shard
> > that was being hosted on that host is now being shown as "gone" on the
> > Solr admin UI.
> >
> > I got the cluster status using the Collections API. It says:
> > shard: active, replica: down
> >
> > The Overseer does not seem to be creating an extra core even though
> > autoAddReplicas=true
> > (https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS).
> >
> > Is this happening because the Overseer sees the shard as active, as
> > suggested by the cluster status?
> > If yes, is "autoAddReplicas" not reliable?
> > Should I add a replica for this shard when such cases arise?
>
> Your replicationFactor is one. When there's one replica, you have no
> redundancy. If that replica goes down, the shard is completely gone.
>
> As I understand it (I've got no experience with HDFS at all),
> autoAddReplicas is designed to automatically add replicas until
> replicationFactor is satisfied. As already mentioned, your
> replicationFactor is one. This means that it will always be satisfied.
>
> If autoAddReplicas were to kick in any time a replica went down, then
> Solr would be busy adding replicas anytime you restarted a node ...
> which would be a very bad idea.
>
> If your number of replicas is one, and that replica goes down, where
> would Solr go to get the data to create another replica? The single
> replica is down, so there's nothing to copy from. You might be thinking
> "from the leader" ... but a leader is nothing more than a replica that
> has been temporarily elected to have an extra job. A replicationFactor
> of two doesn't mean a leader and two copies ... it means there are a
> total of two replicas, one of which is elected leader.
>
> If you want autoAddReplicas to work, you're going to need to have a
> replicationFactor of at least two, and you're probably going to have to
> delete the dead replica before another will be created.
>
> Thanks,
> Shawn
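For reference, the manual cleanup Shawn describes (inspect the cluster state, drop the dead replica, then add a fresh one) might look roughly like this with the Collections API. This is only a sketch: the host, collection, shard, and replica names below are placeholders I made up, not values from this thread, so substitute your own before running anything.

```shell
# All names here are hypothetical placeholders -- replace with your own.
SOLR_HOST="solr-node1.example.com"
COLLECTION="my_collection"
SHARD="shard42"
REPLICA="core_node7"   # the dead replica's core-node name, as shown by CLUSTERSTATUS

# 1. Inspect the cluster state the Overseer sees for this collection:
curl "http://${SOLR_HOST}:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=${COLLECTION}"

# 2. Delete the dead replica so it no longer occupies the slot:
curl "http://${SOLR_HOST}:8983/solr/admin/collections?action=DELETEREPLICA&collection=${COLLECTION}&shard=${SHARD}&replica=${REPLICA}"

# 3. Add a fresh replica for that shard:
curl "http://${SOLR_HOST}:8983/solr/admin/collections?action=ADDREPLICA&collection=${COLLECTION}&shard=${SHARD}"
```

As Shawn notes, for autoAddReplicas to do this automatically you would also need replicationFactor >= 2, so there is a surviving replica while the failed one is replaced.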