Hi,

We are using SolrCloud 7.6.0, and we have containerized Solr. We have around 30 collections and 7 Solr nodes in the cluster. Although Solr is containerized, each host runs just one ZooKeeper container and one Solr container. The heap is 24GB and the container has 49GB of memory in total, which leaves 25GB for off-heap use. We have set the following limits:
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
max memory size (kbytes, -m) unlimited

Solr hits an OOM about every 5 days. When we examined the heap dumps, the heap itself was only around 700MB, but off-heap memory was around 29GB. The major consumer is java.nio.DirectByteBufferR.

Major reference chains:

8,820,117Kb (1462.3%): java.nio.DirectByteBufferR: 64 objects
  ↖ sun.misc.Cleaner.referent
  ↖ sun.misc.Cleaner.{prev}
  ↖ java.nio.DirectByteBuffer.cleaner
  ↖ java.nio.ByteBuffer[]
  ↖ sun.nio.ch.Util$BufferCache.buffers
  ↖ j.l.ThreadLocal$ThreadLocalMap$Entry.value
  ↖ j.l.ThreadLocal$ThreadLocalMap$Entry[]
  ↖ j.l.ThreadLocal$ThreadLocalMap.table
  ↖ j.l.Thread.threadLocals
  ↖ j.l.Thread[]
  ↖ j.l.ThreadGroup.threads
  ↖ j.l.ThreadGroup[]
  ↖ j.l.ThreadGroup.groups
  ↖ Java Static sun.rmi.runtime.NewThreadAction.systemThreadGroup

3,534,863Kb (586.0%): java.nio.DirectByteBufferR: 22 objects
  ↖ sun.misc.Cleaner.referent
  ↖ sun.misc.Cleaner.{next}
  ↖ sun.nio.fs.NativeBuffer.cleaner
  ↖ sun.nio.fs.NativeBuffer[]
  ↖ j.l.ThreadLocal$ThreadLocalMap$Entry.value
  ↖ j.l.ThreadLocal$ThreadLocalMap$Entry[]
  ↖ j.l.ThreadLocal$ThreadLocalMap.table
  ↖ j.l.Thread.threadLocals
  ↖ j.l.Thread[]
  ↖ j.l.ThreadGroup.threads
  ↖ j.l.ThreadGroup[]
  ↖ j.l.ThreadGroup.groups
  ↖ Java Static sun.rmi.runtime.NewThreadAction.systemThreadGroup

3,145,728Kb (521.5%): java.nio.DirectByteBufferR: 3 objects
  ↖ java.nio.ByteBuffer[]
  ↖ org.apache.lucene.store.ByteBufferIndexInput$MultiBufferImpl.buffers
  ↖ org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.fieldsStream
  ↖ org.apache.lucene.index.SegmentCoreReaders.fieldsReaderOrig
  ↖ org.apache.lucene.index.SegmentReader.core
  ↖ org.apache.lucene.index.SegmentReader[]
  ↖ org.apache.lucene.index.StandardDirectoryReader.subReaders
  ↖ org.apache.solr.search.SolrIndexSearcher.rawReader
  ↖ {j.u.concurrent.ConcurrentHashMap}.values
  ↖ org.apache.solr.core.SolrCore.infoRegistry
  ↖ {j.u.LinkedHashMap}.values
  ↖ org.apache.solr.core.SolrCores.cores
  ↖ org.apache.solr.core.CoreContainer.solrCores
  ↖ org.apache.solr.cloud.RecoveringCoreTermWatcher.coreContainer
  ↖ {j.u.HashSet}
  ↖ org.apache.solr.cloud.ZkShardTerms.listeners
  ↖ {j.u.concurrent.ConcurrentHashMap}.keys
  ↖ Java Static org.apache.solr.common.util.ObjectReleaseTracker.OBJECTS

2,605,258Kb (431.9%): java.nio.DirectByteBufferR: 184 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf
  ↖ org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.fieldsStream
  ↖ org.apache.lucene.index.SegmentCoreReaders.fieldsReaderOrig
  ↖ org.apache.lucene.index.SegmentReader.core
  ↖ org.apache.lucene.index.SegmentReader[]
  ↖ org.apache.lucene.index.StandardDirectoryReader.subReaders
  ↖ org.apache.solr.search.SolrIndexSearcher.rawReader
  ↖ {j.u.concurrent.ConcurrentHashMap}.values
  ↖ org.apache.solr.core.SolrCore.infoRegistry
  ↖ {j.u.LinkedHashMap}.values
  ↖ org.apache.solr.core.SolrCores.cores
  ↖ org.apache.solr.core.CoreContainer.solrCores
  ↖ org.apache.solr.cloud.RecoveringCoreTermWatcher.coreContainer
  ↖ {j.u.HashSet}
  ↖ org.apache.solr.cloud.ZkShardTerms.listeners
  ↖ {j.u.concurrent.ConcurrentHashMap}.keys
  ↖ Java Static org.apache.solr.common.util.ObjectReleaseTracker.OBJECTS

1,790,441Kb (296.8%): java.nio.DirectByteBufferR: 70 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf
  ↖ org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.handle
  ↖ org.apache.lucene.index.SegmentCoreReaders.cfsReader
  ↖ org.apache.lucene.index.SegmentReader.core
  ↖ org.apache.lucene.index.SegmentReader[]
  ↖ org.apache.lucene.index.StandardDirectoryReader.subReaders
  ↖ org.apache.solr.search.SolrIndexSearcher.rawReader
  ↖ {j.u.concurrent.ConcurrentHashMap}.values
  ↖ org.apache.solr.core.SolrCore.infoRegistry
  ↖ {j.u.LinkedHashMap}.values
  ↖ org.apache.solr.core.SolrCores.cores
  ↖ org.apache.solr.core.CoreContainer.solrCores
  ↖ org.apache.solr.cloud.RecoveringCoreTermWatcher.coreContainer
  ↖ {j.u.HashSet}
  ↖ org.apache.solr.cloud.ZkShardTerms.listeners
  ↖ {j.u.concurrent.ConcurrentHashMap}.keys
  ↖ Java Static org.apache.solr.common.util.ObjectReleaseTracker.OBJECTS

1,385,471Kb (229.7%): java.nio.DirectByteBufferR: 85 objects
  ↖ sun.misc.Cleaner.referent
  ↖ sun.misc.Cleaner.{next}
  ↖ java.nio.DirectByteBuffer.cleaner
  ↖ java.nio.ByteBuffer[]
  ↖ sun.nio.ch.Util$BufferCache.buffers
  ↖ j.l.ThreadLocal$ThreadLocalMap$Entry.value
  ↖ j.l.ThreadLocal$ThreadLocalMap$Entry[]
  ↖ j.l.ThreadLocal$ThreadLocalMap.table
  ↖ j.l.Thread.threadLocals
  ↖ j.l.Thread[]
  ↖ j.l.ThreadGroup.threads
  ↖ j.l.ThreadGroup[]
  ↖ j.l.ThreadGroup.groups
  ↖ Java Static sun.rmi.runtime.NewThreadAction.systemThreadGroup

1,358,286Kb (225.2%): java.nio.DirectByteBufferR: 3 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$MultiBufferImpl.curBuf

1,184,137Kb (196.3%): java.nio.DirectByteBufferR: 95 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

773,799Kb (128.3%): java.nio.DirectByteBufferR: 92 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

744,089Kb (123.4%): java.nio.DirectByteBufferR: 11 objects
  ↖ sun.misc.Cleaner.referent

659,739Kb (109.4%): java.nio.DirectByteBufferR: 66 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

588,605Kb (97.6%): java.nio.DirectByteBufferR: 9 objects
  ↖ sun.misc.Cleaner.referent

485,104Kb (80.4%): java.nio.DirectByteBufferR: 59 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

395,376Kb (65.5%): java.nio.DirectByteBufferR: 46 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

317,410Kb (52.6%): java.nio.DirectByteBufferR: 60 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

314,946Kb (52.2%): java.nio.DirectByteBufferR: 56 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

211,577Kb (35.1%): java.nio.DirectByteBufferR: 44 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

195,447Kb (32.4%): java.nio.DirectByteBufferR: 57 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

121,962Kb (20.2%): java.nio.DirectByteBufferR: 4 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

100,200Kb (16.6%): java.nio.DirectByteBufferR: 185 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

51,464Kb (8.5%): java.nio.DirectByteBufferR: 64 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

49,435Kb (8.2%): java.nio.DirectByteBufferR: 7 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

32,748Kb (5.4%): java.nio.DirectByteBufferR: 3 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

30,610Kb (5.1%): java.nio.DirectByteBufferR: 4 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

27,766Kb (4.6%): java.nio.DirectByteBufferR: 46 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

26,620Kb (4.4%): java.nio.DirectByteBufferR: 4 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

25,570Kb (4.2%): java.nio.DirectByteBufferR: 3 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

23,507Kb (3.9%): java.nio.DirectByteBufferR: 4 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

23,095Kb (3.8%): java.nio.DirectByteBufferR: 3 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

3,396Kb (0.6%): java.nio.DirectByteBufferR: 5 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

1,745Kb (0.3%): java.nio.DirectByteBufferR: 3 objects
  ↖ sun.misc.Cleaner.referent

1,457Kb (0.2%): java.nio.DirectByteBuffer: 55 objects
  ↖ java.util.concurrent.atomic.AtomicReference.value

1,450Kb (0.2%): java.nio.DirectByteBufferR: 7 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

1,309Kb (0.2%): java.nio.DirectByteBufferR: 2 objects
  ↖ org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.curBuf

715Kb (0.1%):
  java.nio.DirectByteBufferR: 11 objects (78% of all objects referenced here) 610Kb (0.1%),
  java.nio.DirectByteBuffer: 2 objects (14% of all objects referenced here) 104Kb (< 0.1%)
  ↖ sun.misc.Cleaner.referent

We do a hard commit with openSearcher=false every 5 seconds and a soft commit every 2 minutes.

Has anyone else encountered an off-heap OOM? We are thinking of reducing the heap further and increasing the hard commit interval. Any other suggestions? Please share your thoughts.

Thanks,
Raji
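
P.S. For anyone who wants to cross-check off-heap numbers like the ones above from inside a JVM, here is a minimal sketch using the standard java.lang.management.BufferPoolMXBean. The class name BufferPoolReport and the method snapshot() are hypothetical names chosen for illustration, and this has not been run against the cluster described above. As written it reports on the JVM it runs in; for a Solr node the same MXBeans would have to be read remotely, for example over JMX (assuming JMX is enabled there).

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: lists the NIO buffer pools the JVM exposes.
// On HotSpot these normally include "direct" (ByteBuffer.allocateDirect)
// and "mapped" (memory-mapped files).
final class BufferPoolReport {

    /** Returns pool name -> bytes currently used, in the order the JVM reports them. */
    public static Map<String, Long> snapshot() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            used.put(pool.getName(), pool.getMemoryUsed());
        }
        return used;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            // getCount() is the number of buffers, getTotalCapacity() their summed capacity.
            System.out.printf("%-8s count=%d used=%d MB capacity=%d MB%n",
                    pool.getName(),
                    pool.getCount(),
                    pool.getMemoryUsed() / mb,
                    pool.getTotalCapacity() / mb);
        }
    }
}
```

Comparing the two pools over time would show whether growth comes from allocated direct buffers or from memory-mapped files.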