Hi,

Is your Solr4 4.0 or 4.1? If it's the former, try the latter first.

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Jan 23, 2013 2:51 PM, "John Skopis (lists)" <jli...@skopis.com> wrote:

> Hello,
>
> We have recently put solr4 into production.
>
> We have a 3 node cluster with a single shard. Each solr node is also a
> zookeeper node, but zookeeper is running in cluster mode. We are using the
> cloudera zookeeper package.
>
> There is no communication problems between nodes. They are in two
> different racks directly connected over a 2Gb uplink. The nodes each have a
> 1Gb uplink.
>
> I was thinking ideally mmsolr01 would be the leader, the application sends
> all index requests directly to the leader node. A load balancer splits read
> requests over the remaining two nodes.
>
> We autocommit every 300s or 10k documents with a softcommit every 5s. The
> index is roughly 200mm documents.
>
> I have configured a cron to run every hour (on every node):
> 0 * * * * /usr/bin/curl -s 'http://localhost:8983/solr/collection1/replication?command=backup&numberToKeep=3' > /dev/null
>
> Using a snapshot seems to be the easiest way to reproduce, but it's also
> possible to reproduce under very heavy indexing load.
>
> When the snapshot is running, occasionally we get a zk timeout, causing
> the leader to drop out of the cluster. We have also seen a few zk timeouts
> when index load is very high.
>
> After the failure it can take the now inconsistent node a few hours to
> recover. After numerous failed recovery attempts the failed node seems to
> sync up.
>
> I have attached a log file demonstrating this.
>
> We see lots of timeout requests, seemingly when the failed node tries to
> sync up with the current leader by doing a full sync. This seems wrong,
> there should be no reason for a timeout to happen here?
>
> I am able to manually copy the index using tar + netcat in a few minutes.
> The replication handler takes
>
> INFO: Total time taken for download : 3549 secs
>
> Why does it take so long to recover?
>
> Are we better off manually replicating the index?
>
> Much appreciated,
> Thanks,
> John