Solr version: 4.7

Architecture:
* 2 Solr nodes (1 shard, leader + replica)
* 3 ZooKeeper nodes

Servers:
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper

Solr data:
* 21 collections
* Many fields, small docs, docs count per collection from 1k to 500k

About a week ago Solr started crashing. It crashes every day, 3-4 times a day, usually at night. I can't tell what it could be related to, because we hadn't made any configuration changes around that time. The load hasn't changed either.

Everything starts with "Stopping recovery for .." warnings (every warning is repeated several times):

    WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=******************
    WARN org.apache.solr.cloud.ElectionContext; cancelElection did not find election node to remove
    WARN org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
    WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
    WARN - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
    WARN - 2014-03-23 04:00:54.126; org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0000000000000003272 refcount=2} active=true starting pos=356216606

Then the "Stopping recovery for .." warnings appear again, followed by errors:

    WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=******************
    ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: collection1 slice: shard1
    ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: I was asked to wait on state down for IP:PORT_solr but I still do not see the requested state. I see state: active live:false

After this the servers mostly didn't recover.
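
To check whether the failures really cluster at night, the timestamped log lines could be bucketed by hour. Below is a minimal sketch in Python, assuming the `LEVEL - date time; logger; message` layout of the timestamped lines above. The function name `count_by_hour` and the regular expression are illustrative only and are not part of Solr; lines without a timestamp are simply skipped.

```python
import re
from collections import Counter

# Assumed layout of a timestamped line (taken from the excerpts in this post):
#   WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; message
LOG_LINE = re.compile(
    r"^(?P<level>WARN|ERROR)\s+-\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d{3};\s+"
    r"(?P<logger>[\w.$]+);"
)


def count_by_hour(lines):
    """Count WARN/ERROR lines per (date, hour, level); skip unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            key = (match["date"], int(match["hour"]), match["level"])
            counts[key] += 1
    return counts


# Sample input: a few lines copied from the log excerpts above.
sample = [
    "WARN org.apache.solr.cloud.ElectionContext; cancelElection did not find election node to remove",
    "WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates",
    "WARN - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879",
    "ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: collection1 slice: shard1",
    "ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: I was asked to wait on state down for IP:PORT_solr but I still do not see the requested state. I see state: active live:false",
]

for (date, hour, level), n in sorted(count_by_hour(sample).items()):
    print(f"{date} {hour:02d}:00 {level} x{n}")
```

Run over the full Solr log from both nodes, the same counts could be compared against scheduled jobs (backups, bulk indexing, cron tasks) that run at those hours.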