On 10/15/2019 2:49 AM, Vassil Velichkov (Sensika) wrote:
> I've reduced the JVM heap on one of the shards to 20GB and then simulated some
> heavy load to reproduce the issue more quickly.
> The solr.log ROOT logger was set to TRACE level, but I can't really see anything
> meaningful; the solr.log ends at 07:31:40.352 GMT, while the GC log shows later
> entries and "Pause Full (Allocation Failure)".
> BTW, I've never seen any automatic Full GC attempts in the GC logs. I can't
> see any OOME messages in any of the logs, only in the separate solr_oom_killer
> log, but that is the log written by the killer script.
>
> Also, to answer your previous questions:
>         1. We run completely stock Solr: no custom code, no plugins.
> Regardless, we never had such OOMs with Solr 4.x or Solr 6.x.
>         2. It seems that a Full GC is never triggered. In some cases in the past
> I've seen log entries for Full GC attempts, but the JVM crashes with an OOM long
> before the Full GC could do anything.
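
Before concluding that full GCs never run, it is worth grepping the GC log directly for full pauses and for pauses caused by humongous allocations. A sketch only; the log location is the default for a package install and may need adjusting to your SOLR_LOGS_DIR:

    # look for full collections and humongous-allocation GC causes
    # (path is illustrative; adjust to your SOLR_LOGS_DIR)
    grep -E "Pause Full|Humongous" /var/solr/logs/solr_gc.log*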

The goal of good GC tuning is to avoid full GCs ever being needed. They cannot be prevented entirely, especially when humongous allocations are involved ... but a well-tuned GC should not need them very often.
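
For what it's worth, G1 treats any single allocation larger than half a region as humongous, so if those show up often, the usual lever is raising -XX:G1HeapRegionSize (32m is the maximum) via the GC_TUNE variable in solr.in.sh. The values below are only a sketch to show the shape of it, not a recommendation for your index:

    # in solr.in.sh (/etc/default/solr.in.sh on a package install); illustrative values only
    GC_TUNE="-XX:+UseG1GC \
      -XX:G1HeapRegionSize=32m \
      -XX:MaxGCPauseMillis=250 \
      -XX:+ParallelRefProcEnabled"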

You have only included snippets from your logs. We would need the full logs for any of that information to be useful. Attachments to the list rarely make it through, so you will need to use some kind of file-sharing site. I find Dropbox useful for this, but if you prefer something else that works well, feel free to use it.

If the OutOfMemoryError exception is logged at all, it will be in solr.log, but it is not always logged. I will ask the Java folks whether there is a way we can have the killer script report the reason for the OOME.
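
In the meantime, a quick grep across the logs directory will tell you whether the exception text made it anywhere. The stack trace sometimes lands on stderr rather than in solr.log, so the console log is worth including; paths and port here are illustrative for a default install:

    # search solr.log and the console log for the actual exception
    grep -i "OutOfMemoryError" /var/solr/logs/solr.log* /var/solr/logs/solr-8983-console.log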

It should be unnecessary to increase Solr's log level beyond INFO, though DEBUG might provide some useful info. TRACE output will be insanely large, and I would not recommend it.
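
If you do try DEBUG, you should not need a restart. The admin UI Logging page changes levels at runtime, and if I remember right you can hit the same endpoint it uses directly (host and port illustrative; the change reverts on restart):

    curl "http://localhost:8983/solr/admin/info/logging?set=root:DEBUG"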

Thanks,
Shawn
