With 6 ZooKeeper instances you need at least 4 of them running at the same time to keep a quorum. How could you stop 4 instances and leave only 2 running? ZooKeeper cannot work under those conditions.
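
To make the arithmetic explicit: an ensemble of N members needs a strict
majority, floor(N/2) + 1, to stay alive. A minimal sketch (the helper name is
only illustrative):

    # Minimum number of live ZooKeeper nodes needed to keep a quorum.
    def quorum_size(ensemble_size):
        return ensemble_size // 2 + 1

    print(quorum_size(6))  # 4: losing 4 of 6 nodes leaves 2, below quorum
    print(quorum_size(5))  # 3: a 5-node ensemble tolerates 2 failures, same as 6

This is also why even-sized ensembles are usually avoided: 6 nodes tolerate
no more failures than 5.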
Dominique

On 25 Jul 2013, at 00:16, "Joshi, Shital" <shital.jo...@gs.com> wrote:

> We have a SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute
> boxes (cloud), where the 5 leaders are in datacenter1 and the replicas are
> in datacenter2. We have 6 ZooKeeper instances - 4 in datacenter1 and 2 in
> datacenter2. The ZooKeeper instances are on the same hosts as the Solr
> nodes. We're using local disk (/local/data) to store the Solr index files.
>
> The infrastructure team wanted to rebuild the dynamic compute boxes in
> datacenter1, so we handed over all the leader hosts to them. By doing so,
> we lost 4 ZooKeeper instances. We were expecting to see all replicas acting
> as leaders. To confirm that, I went to the admin console -> cloud page, but
> the page never returned (it kept hanging). I checked the log and saw
> constant ZooKeeper host connection exceptions (the zkHost system property
> had all 6 ZooKeeper instances). I restarted the cloud on all replicas but
> got the same error again. I think this exception is due to the ZooKeeper
> bug https://issues.apache.org/jira/browse/SOLR-4899. I guess ZooKeeper
> never registered the replicas as leaders.
>
> After the dynamic compute machines were rebuilt (all local data was lost),
> I restarted the entire cloud (6 ZooKeeper instances and 10 nodes). The
> original leaders were still the leaders (I think the ZooKeeper config never
> got updated with the replicas becoming leaders, even though 2 ZooKeeper
> instances were still up). Since every leader's /local/data/solr_data was
> empty, the empty index got replicated to all replicas and we lost all the
> data on the replicas, 26 million documents in total. This was very awful.
>
> In our startup script (which brings up Solr on all nodes one by one), the
> leaders are listed first.
>
> Is there any solution to this until the Solr 4.4 release?
>
> Many Thanks!
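
As an aside on the zkHost setting mentioned above: when the admin page hangs
with constant connection exceptions, it can help to check how many of the
listed ZooKeeper hosts are actually reachable, for example with a sketch like
this (the hostnames below are placeholders, not the actual cluster's zkHost
value):

    import socket

    # Placeholder host:port list standing in for the 6 instances in zkHost.
    ZK_HOST = "zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181,zk6:2181"

    def reachable(hostport, timeout=2.0):
        """Return True if a TCP connection to host:port succeeds."""
        host, port = hostport.split(":")
        try:
            with socket.create_connection((host, int(port)), timeout=timeout):
                return True
        except OSError:
            return False

    hosts = ZK_HOST.split(",")
    up = [hp for hp in hosts if reachable(hp)]
    needed = len(hosts) // 2 + 1
    print("reachable: %d of %d (quorum needs %d)" % (len(up), len(hosts), needed))

If fewer hosts than the quorum size are reachable, Solr cannot talk to
ZooKeeper no matter how the nodes are restarted.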