On 12/18/2017 7:36 AM, Susheel Kumar wrote:
Yes, Emir.  If I repeated the query, it would spread to other nodes, but
that's not the case here.  This is my test env, and I am deliberately
executing a query with a very high offset and a wildcard to cause an OOM,
but executing it only once.

So it shouldn't spread to the other replica sets, and at the end of my test
the first 6 shards/replicas that get hit should go down while the other 6
should survive, but that's not what I see.

Setup:  400+ million docs, JVM is 12GB.  Yes, only one collection.  Total
12 machines with 6 shards and 6 replicas (replicationFactor = 2)

Do you know exactly which OOME you are encountering? Is it "Java heap space" or something else?

While ordinarily I would expect multiple replicas in SolrCloud to ensure high availability, OutOfMemoryError is a special class of problem.  When you encounter OOME on one server, it is likely that other similarly equipped/configured servers in the cloud will *also* encounter the error, and if that happens, you're not going to have high availability.

To eliminate the problem, you're going to have to figure out which resource is being depleted and either increase that resource or change things so that Solr doesn't need as much of that resource.
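
As a rough illustration of that first step, here is a minimal Python sketch that scans log text for OutOfMemoryError lines and guesses which resource ran out. The function name classify_oome and the mapping table are made up for this example and are not part of Solr. The message fragments are ones the HotSpot JVM commonly emits, but the resource hints are only a starting point, not a diagnosis.

```python
import re

# Message fragments commonly attached to OutOfMemoryError by the HotSpot JVM,
# mapped to the resource that is usually exhausted (a heuristic, not a rule).
OOME_KINDS = {
    "java heap space": "heap (-Xmx)",
    "gc overhead limit exceeded": "heap (GC is reclaiming almost nothing)",
    "native thread": "OS thread/process limits or native memory",
    "metaspace": "class metadata (-XX:MaxMetaspaceSize)",
    "direct buffer memory": "direct off-heap buffers (-XX:MaxDirectMemorySize)",
}

OOME_RE = re.compile(r"java\.lang\.OutOfMemoryError:\s*(.+)")


def classify_oome(log_text):
    """Return (message, likely_resource) for every OOME line in log_text."""
    found = []
    for line in log_text.splitlines():
        match = OOME_RE.search(line)
        if not match:
            continue
        message = match.group(1).strip()
        resource = "unknown"
        # Case-insensitive substring match against the known fragments.
        for fragment, hint in OOME_KINDS.items():
            if fragment in message.lower():
                resource = hint
                break
        found.append((message, resource))
    return found
```

To use it, read the Solr log from wherever your install writes it and pass the contents to classify_oome(). If every node reports the same message, that points at the one resource to increase or to use less of.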

Thanks,
Shawn
