Check your Solr transaction log size. It's possible that the Solr you killed is replaying transaction logs, or syncing from the current leader (perhaps by replicating the entire shard index).
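If it helps, here is a rough Python sketch for checking tlog sizes on disk. It assumes the transaction logs live in directories named `tlog` under each core's data directory (the usual default layout); the function name `tlog_sizes` and the root path you pass in are just placeholders for illustration, so adjust them to your install.

```python
import os
import sys


def tlog_sizes(root):
    """Return {tlog_dir: total_bytes} for every directory named 'tlog' under root.

    Assumption: transaction logs sit in '<core data dir>/tlog', which is the
    usual default layout. Adjust if your updateLog dir is configured elsewhere.
    """
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) == "tlog":
            sizes[dirpath] = sum(
                os.path.getsize(os.path.join(dirpath, name)) for name in filenames
            )
    return sizes


if __name__ == "__main__":
    # Hypothetical usage: python tlog_sizes.py /path/to/solr/home
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in sorted(tlog_sizes(root).items()):
        print(f"{size / 2**20:10.1f} MiB  {path}")
```

Run it on each node before restarting the killed one. Large tlogs would be consistent with a long replay on startup, though that alone doesn't prove replay is the cause.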
This usually happens when you're receiving updates while killing the leader.

Here's a writeup on tlogs etc. and how to control this:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Tue, Nov 12, 2013 at 8:44 AM, Alejandro Marqués Rodríguez <amarq...@paradigmatecnologico.com> wrote:

> Hi,
>
> We've been experiencing some problems during search stress tests and we
> don't have a clue why this is happening.
>
> We have the following:
> - 3 servers
> - WebSphere 7
> - Zookeeper 3.4.5 on each server
> - Solr 4.5.0 on each server
> - 1 shard (so it is one leader and 2 replicas)
> - The index contains 7M documents (about 2GB)
>
> We've run several stress tests with JMeter with 100-500 concurrent threads.
> Depending on how many threads, we get different scenarios, but apart from
> the timings or whether the system fully recovers or not, we see the
> following steps:
>
> 1. The Solrs begin responding to queries, with a stable number of threads
>    for each Solr (less than 10).
> 2. Once the test has been running for several minutes we kill one of the
>    Solrs (most of the time the one that is the leader).
> 3. The remaining Solrs respond to the queries, slightly increasing the
>    number of threads used.
> 4. After a few minutes we restart the killed Solr (and here is where our
>    problem starts).
> 5. Once it starts, its number of threads used begins to increase (up to
>    100 or above), and the worst thing is that even the other two Solrs
>    start responding slowly (or not responding at all). Then, depending on
>    the number of concurrent queries: if there are few, everything goes
>    back to normal in more or less 3 minutes (though almost no queries are
>    served during that period); if there are more than 200 concurrent
>    queries, the restarted server's thread count grows so much that it
>    crashes.
>
> During the minutes that the three Solrs are not responding there are no
> logs, and after taking a thread dump we've seen a lot of stalled threads
> with sun.misc.Unsafe.park traces.
>
> I don't understand this behaviour at all: not only does it work better
> with two Solrs than after restarting the third, but this restart also
> affects the behaviour of the two remaining Solrs...
>
> Does anybody have any clue about this?
>
> Thanks in advance
>
> --
> Alejandro Marqués Rodríguez
>
> Paradigma Tecnológico
> http://www.paradigmatecnologico.com
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón
> Tel.: 91 352 59 42