SolrCloud unstable

Martin de Vries Tue, 12 Nov 2013 00:47:04 -0800

Hi,

We have:


Solr 4.5.1 - 5 servers

36 cores, 2 shards each, 2 servers per shard (every core is on 4 servers)

about 4.5 GB total data on disk per server
4GB JVM-Memory per server, 3GB average in use
Zookeeper 3.3.5 - 3 servers (one shared with Solr)
haproxy load balancing

Our SolrCloud is very unstable. About once a week some cores go into recovery state or down state. Many timeouts occur and we have to restart servers to get them back to work. The failover doesn't work in many cases, because one server has the core in down state and the other has it in recovering state. Other cores work fine.

When the cloud is stable I sometimes see log messages like:

- shard update error StdNode: http://033.downnotifier.com:8983/solr/dntest_shard2_replica1/:org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://033.downnotifier.com:8983/solr/dntest_shard2_replica1

- forwarding update to http://033.downnotifier.com:8983/solr/dn_shard2_replica2/ failed - retrying ...

- null:ClientAbortException: java.io.IOException: Broken pipe

Before the cloud problems start there are many large QTimes in the log (sometimes over 50 seconds), but there are no other errors until the recovery problems start.



Any clue about what can be wrong?


Kind regards,

Martin
