Thanks for the reply. It is working now, but I'm not sure which change fixed it ... It might have been a communication error with ZooKeeper, although I could not see anything like that in the logs. I did find that ZooKeeper was, for example, writing some trace files to a location that was running out of space. I am now also sure about which config file ZooKeeper is using; maybe I was missing something on that side and that caused the weird issue ...
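For reference, a minimal zoo.cfg sketch showing the settings that control where ZooKeeper writes its snapshots and transaction logs, and how old ones are purged; the paths below are only placeholders, not the actual setup from this thread:

# Minimal zoo.cfg sketch; paths are placeholders
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
# Snapshots are written under dataDir, transaction logs under dataLogDir;
# point these at a partition with enough free space.
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/datalog
# Keep only the three newest snapshots/logs and purge once an hour,
# so old files do not fill the disk.
autopurge.snapRetainCount=3
autopurge.purgeInterval=1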
On Friday, April 24, 2015 7:30 AM, Shawn Heisey <apa...@elyograg.org> wrote:

On 4/23/2015 5:59 AM, mihaela olteanu wrote:
> I have set up a SolrCloud 5.1 cluster consisting of 3 nodes (let's call
> them host1, host2 and host3). I started each node with the property
> -DnumShards=1. Afterwards I created a collection with the following
> parameters and imported some data using DIH:
> http://host1:port/solr/admin/collections?action=CREATE&name=collection1&numShards=1&autoAddReplicas=true&maxShardsPerNode=1&collection.configName=myConfig
>
> I would have expected (given that somehow the cluster size was set to 1)
> that the other nodes, host2 and host3, would automatically create a
> replica of my shard, but they didn't.
> The only way I could create a replica on the remaining nodes was by
> specifying the replicationFactor when creating the collection:
> http://host1:port/solr/admin/collections?action=CREATE&name=collection1&numShards=1&autoAddReplicas=true&maxShardsPerNode=1&collection.configName=myConfig&replicationFactor=3&maxShardsPerNode=1

The issue that adds the autoAddReplicas feature says that it was
originally designed for shared filesystems, like HDFS. Later in the
ticket, some mention is made of making it work on other filesystems.

https://issues.apache.org/jira/browse/SOLR-5656

Even if it works on filesystems other than HDFS, one note about the
feature stands out as relevant:

"The Overseer class gets a new thread that periodically evaluates live
nodes and cluster state and fires off SolrCore create commands to add
replicas when there are not enough replicas up to meet a collections
replicationFactor."

If you don't specify replicationFactor, I believe it defaults to 1 ...
which means that you have met your replicationFactor, so additional
replicas will not be automatically created.

> but in this case I bumped into another problem: after I had imported
> the data, all replicas were active. If I shut down one node, let's say
> host2, when I restart it, it stays in the recovering state. I took a
> look at the log files and could see no errors. I know for sure that I
> haven't added any new data since I shut down host2, so the replicas
> should still be in sync. I don't understand why it doesn't recover,
> though ... I have even waited a day, but with no change.
> What am I missing? Could someone explain to me how the replication
> mechanism works?

Recovery/replication should be fully automatic, if all the
communication works right. Are there any error messages in your Solr
log on any of the three nodes?

Thanks,
Shawn
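The replica states discussed above can also be checked with the CLUSTERSTATUS action of the Collections API; a minimal sketch using the collection name from this thread (host names and ports are placeholders):

http://host1:port/solr/admin/collections?action=CLUSTERSTATUS&collection=collection1

If a node still has no replica after the collection was created with the default replicationFactor of 1, one can be added explicitly with ADDREPLICA. The node name below is a placeholder and must match an entry from the live_nodes list returned by CLUSTERSTATUS, and shard1 assumes the default shard name for a single-shard collection:

http://host1:port/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=host2:port_solr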