Yes, Emir. If I repeated the query, I would expect it to spread to other nodes, but that's not the case here. This is my test environment, and I am deliberately executing a query with a very high offset and a wildcard to cause an OOM, but I execute it only once.
So it shouldn't spread to the other replica sets. At the end of my test, the first 6 shards/replica sets that get hit should go down while the other 6 survive, but that's not what I see.

Setup: 400+ million docs, JVM is 12GB. Yes, only one collection. 12 machines in total, with 6 shards and 6 replicas (replicationFactor = 2).

On Mon, Dec 18, 2017 at 9:22 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi Susheel,
> The fact that only the node that received the query hit OOM tells us that it is about
> merging results from all shards and providing the final result. It is expected
> that repeating the same query on some other node will result in similar
> behaviour - it just means that Solr does not have enough memory to execute
> this heavy query.
> Can you share more details on your test: size of collection, type of
> query, expected number of results, JVM settings, whether that is the only
> collection on the cluster, etc.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 18 Dec 2017, at 15:07, Susheel Kumar <susheel2...@gmail.com> wrote:
> >
> > Hello,
> >
> > I was testing Solr to see whether a query that causes an OOM would limit
> > the OOM issue to only the replica sets that get hit first.
> >
> > But what I see is that after the whole first set of replicas went down due
> > to OOM (gone from the Cloud view), the other replicas also started going down.
> > I have 6 shards in total, each shard with 2 replicas on separate machines.
> >
> > The expected behavior is that the replicas of each shard that get hit first
> > should go down due to OOM, and the other replicas should survive and
> > provide high availability.
> >
> > The setup I am testing with is Solr 6.0, and I am wondering whether this would
> > remain the same with 6.6, or whether there have been known improvements made to
> > avoid spreading the OOM to the second/third set of replicas and bringing the whole
> > cluster down.
> >
> > Any info on this is appreciated.
> >
> > Thanks,
> > Susheel
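Emir's explanation is that only the node receiving the request runs out of memory because it merges results from all shards. A simplified model helps show why this is so. The sketch below assumes that, for a request with an offset `start` and page size `rows`, each shard returns its own top `start + rows` hits and the coordinating node merges them before discarding the first `start`. The function names (`fake_shard`, `coordinator_merge`, `entries_at_coordinator`) and the fake shard data are hypothetical. They only show how the coordinator's work grows with both the offset and the number of shards, and they are not Solr's actual code.

```python
import heapq
from itertools import islice


def fake_shard(shard_id, start, rows):
    # Hypothetical shard response: its own top (start + rows) hits as
    # (score, doc_id) pairs, already sorted by descending score.
    n = start + rows
    return [(float(n - i) + shard_id / 10.0, f"s{shard_id}-d{i}") for i in range(n)]


def coordinator_merge(shard_results, start, rows):
    # Simplified coordinator: it receives every shard's top (start + rows)
    # list, merges them by descending score, and keeps only one page.
    entries_received = sum(len(hits) for hits in shard_results)
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    page = list(islice(merged, start, start + rows))
    return page, entries_received


def entries_at_coordinator(num_shards, start, rows):
    # Under this model, the number of (score, id) entries the coordinator
    # must handle to return a single page of `rows` results.
    return num_shards * (start + rows)


if __name__ == "__main__":
    shards = [fake_shard(s, 100, 10) for s in range(6)]
    page, received = coordinator_merge(shards, start=100, rows=10)
    print(len(page), received)  # 10 660
    print(entries_at_coordinator(6, 1_000_000, 10))  # 6000060
```

Under this model, with 6 shards, an offset of 1,000,000 and 10 rows, the coordinator handles about 6,000,060 entries to return 10 documents. The same load lands on whichever node happens to receive the request.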