On 10/15/2019 2:49 AM, Vassil Velichkov (Sensika) wrote:
I've reduced the JVM heap on one of the shards to 20GB and then simulated some
heavy load to reproduce the issue more quickly.
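For reference, the per-node heap is normally set through solr.in.sh; a minimal
sketch of such a change (the file location varies by install, /etc/default/solr.in.sh
is just one common path):

    # solr.in.sh -- cap the JVM heap for this node at 20GB; this is
    # equivalent to passing -Xms20g -Xmx20g on the Solr start command.
    SOLR_HEAP="20g"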
The solr.log ROOT logger was set to TRACE level, but I can't really see anything
meaningful: solr.log ends at 07:31:40.352 GMT, while the GC log shows later entries,
including "Pause Full (Allocation Failure)".
BTW, I've never seen any automatic Full GC attempts in the GC logs. I can't
see any OOME messages in any of the logs except the separate solr_oom_killer
log, but that is just the log of the killer script.
Also, to answer your previous questions:
1. We run completely stock Solr - no custom code, no plugins.
Regardless, we never had such OOMs with Solr 4.x or Solr 6.x.
2. It seems that Full GC is never triggered. In some cases in the past
I've seen log entries for Full GC attempts, but the JVM crashes with an OOM long
before the Full GC can do anything.
The goal of good GC tuning is to avoid full GCs ever being needed. They
cannot be prevented entirely, especially when humongous allocations are
involved ... but a well-tuned GC should not do them very often.
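For illustration only (the values below are a sketch, not a recommendation for
your setup): with G1, any single allocation larger than half of G1HeapRegionSize
is humongous, so one common lever is raising the region size via GC_TUNE in
solr.in.sh so that fewer allocations qualify:

    # Sketch of G1 options in solr.in.sh; with a 32m region size, only
    # allocations over 16MB are treated as humongous.
    GC_TUNE="-XX:+UseG1GC -XX:G1HeapRegionSize=32m -XX:MaxGCPauseMillis=250 -XX:+ParallelRefProcEnabled"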
You have only included snippets from your logs. We would need the full logs
for any of that information to be useful. Attachments to the list
rarely work, so you will need to use some kind of file-sharing site. I
find Dropbox useful for this, but if you prefer something else
that works well, feel free to use it.
If the OutOfMemoryError exception is logged, it will be in solr.log.
It is not always logged. I will ask the Java folks if there is a way we
can have the killer script provide the reason for the OOME.
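For context, the killer log comes from the OnOutOfMemoryError hook that bin/solr
passes to the JVM, roughly like the line below (the script path, port, and log
directory here are placeholders and vary by version and install); the hook just
runs a command and is not passed the exception details, which is why the script
itself cannot report the reason:

    # Roughly how bin/solr wires up the killer script; the JVM runs this
    # command when an OutOfMemoryError is thrown, and the script simply
    # kills the Solr process and writes its own log.
    -XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs"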
It should be unnecessary to increase Solr's log level beyond INFO, but
DEBUG might provide some useful info. TRACE output will be insanely large,
and I would not recommend it.
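If you do try DEBUG, the root level can be changed at runtime through the
logging admin endpoint and reverted once the problem is captured, so it does
not have to stay in log4j2.xml; something like this (host and port are
placeholders for your node):

    # Raise the root logger to DEBUG at runtime (not persisted across restart).
    curl "http://localhost:8983/solr/admin/info/logging?set=root:DEBUG"
    # Put it back to the default once the failure has been reproduced.
    curl "http://localhost:8983/solr/admin/info/logging?set=root:INFO"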
Thanks,
Shawn