Re: OOM spreads to other replica's/HA when OOM

Shawn Heisey Mon, 18 Dec 2017 06:47:39 -0800

On 12/18/2017 7:36 AM, Susheel Kumar wrote:

Yes, Emir.  If I repeated the query, it would spread to other nodes, but
that's not the case here.  This is my test env and I am deliberately
executing a query with a very high offset and a wildcard to cause OOM, but
executing it only one time.


So it shouldn't spread to other replica sets, and at the end of my test,
the first 6 shard/replica sets which get hit should go down while the other
6 should survive, but that's not what I see at the end.

Setup:  400+ million docs, JVM is 12GB.  Yes, only one collection. Total
12 machines with 6 shards and 6 replicas (replicationFactor = 2)

Do you know exactly which OOME you are encountering? Is it "Java heap space" or something else?

While ordinarily I would expect multiple replicas in SolrCloud to ensure high availability, OutOfMemoryError is a special class of problem. When you encounter OOME on one server, it is likely that other similarly equipped/configured servers in the cloud will *also* encounter the error, and if that happens, you're not going to have high availability.

To eliminate the problem, you're going to have to figure out which resource is being depleted and either increase that resource or change things so that Solr doesn't need as much of that resource.
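
As a rough illustration of the "which resource" question, here is a minimal Python sketch that reads a log line and maps the OutOfMemoryError message to the resource that is probably exhausted. The message fragments are ones commonly emitted by HotSpot JVMs; the function name classify_oome, the mapping table, and the remediation hints are hypothetical and not part of Solr or the JVM, so treat this as a starting point, not a diagnosis.

```python
import re

# Assumed mapping (not exhaustive) from fragments of common HotSpot
# OutOfMemoryError messages to the resource that is likely depleted.
OOME_RESOURCES = [
    ("java heap space", "JVM heap: raise -Xmx or reduce heap demand"),
    ("gc overhead limit exceeded", "JVM heap: nearly all time is spent in GC"),
    ("native thread", "OS thread/process limit or native memory"),
    ("metaspace", "class metadata space"),
    ("direct buffer memory", "off-heap direct buffers"),
    ("requested array size exceeds vm limit", "single oversized array allocation"),
]

# Matches the exception name followed by its detail message.
OOME_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError:\s*(.+)")


def classify_oome(log_line):
    """Return a hint about the depleted resource, or None if the
    line does not contain an OutOfMemoryError message."""
    match = OOME_PATTERN.search(log_line)
    if match is None:
        return None
    message = match.group(1).strip().lower()
    for fragment, resource in OOME_RESOURCES:
        if fragment in message:
            return resource
    # Unrecognized message: report it so it can be looked up by hand.
    return "unknown resource (message: %s)" % message


if __name__ == "__main__":
    sample = "Caused by: java.lang.OutOfMemoryError: Java heap space"
    print(classify_oome(sample))
```

Running something like this over the Solr log on each node that went down would show whether every node hit the same kind of OOME or whether different resources ran out on different machines.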


Thanks,
Shawn
