Hi, I have a problem with SolrCloud in a specific test case, and I wanted to know whether this is the way it should work or whether there is any way to avoid it.
I have the following scenario:

- Three machines
- Each one runs one ZooKeeper and one Solr 4.1.0
- Each Solr stores 7 million documents, and the index is 2 GB

The test consists of sending queries to Solr (100 concurrent queries continuously) and then forcing a leader failure by shutting down both ZooKeeper and Solr on that machine.

When we shut down any Solr that is not the leader, there are no problems: the other two keep responding to queries. However, if we shut down the leader, the following happens:

- Both remaining Solrs continue responding to queries until the leader election starts.
- One of them is elected leader and the other one stops responding to queries (I've read that it goes into recovery mode until its index is synchronized with the leader's).
- Then, even though both indexes are the same (they were synchronized before the leader failure), the whole index is replicated.
- While the 2 GB are replicated from the leader to the remaining server, the recovering server does not respond to queries, so the leader has to handle all of the queries and finally crashes from having too many queries to answer (on top of replicating its index).

My question is: is it normal that the whole index is replicated on a leader change, even though the leader's index and the other Solr's index should be the same? Is there any way to avoid it? Maybe I have something misconfigured? Would upgrading Solr to 4.5.X avoid this behavior?

Aside from this problem everything seems to work fine, but that point of failure is too risky for us.

Thanks in advance

--
Alejandro Marqués Rodríguez
Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42