Istvan Farkas created SOLR-14373:
------------------------------------

             Summary: HDFS block cache allows overallocation
                 Key: SOLR-14373
                 URL: https://issues.apache.org/jira/browse/SOLR-14373
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: hdfs
    Affects Versions: 4.10
            Reporter: Istvan Farkas
For the HDFS block cache, when we allocate more slabs than the available direct memory can hold, the underlying error message is hidden. In such cases the HdfsDirectoryFactory throws an OutOfMemoryError, which is caught in the HdfsDirectoryFactory itself and rethrown as a RuntimeException:

{code}
try {
  blockCache = new BlockCache(metrics, directAllocation, totalMemory, slabSize, blockSize);
} catch (OutOfMemoryError e) {
  throw new RuntimeException(
      "The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=<size>g -XX:+UseLargePages to your containers startup args)"
          + " or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap,"
          + " your java heap size might not be large enough."
          + " Failed allocating ~" + totalMemory / 1000000.0 + " MB.", e);
}
{code}

This then manifests as a NullPointerException during core load:

{code}
2020-02-24 06:50:23,492 ERROR (coreLoadExecutor-5-thread-8)-c: collection1-s:shard2-r:core_node2-x: collection1_shard2_replica1-o.a.s.c.SolrCore: Error while closing
java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.close(SolrCore.java:1352)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
{code}

When directAllocation is true, the directoryFactory logs an approximation of the memory to be allocated.
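The overallocation is visible in the sizing arithmetic alone. The following is a minimal sketch (not actual Solr code; the class and method names are illustrative) using the slab size and slab count from the log output, compared against a hypothetical `-XX:MaxDirectMemorySize=2g` limit:

```java
// Hypothetical reproduction of the block cache sizing arithmetic.
// The slab size (128 MB) and slab count (16384) are taken from the
// HdfsDirectoryFactory log lines in this report; the 2 GB direct
// memory limit is an assumed example value.
public class BlockCacheSizing {

    /** Total direct memory the cache will try to allocate, in bytes. */
    static long requestedBytes(long slabSize, int numberOfSlabs) {
        return slabSize * (long) numberOfSlabs;
    }

    public static void main(String[] args) {
        long slabSize = 134217728L;               // 128 MB per slab, as logged
        int numberOfSlabs = 16384;                // as logged
        long maxDirect = 2L * 1024 * 1024 * 1024; // assumed -XX:MaxDirectMemorySize=2g

        long requested = requestedBytes(slabSize, numberOfSlabs);
        System.out.println("requested bytes: " + requested);        // ~2 TB
        System.out.println("exceeds limit: " + (requested > maxDirect));
    }
}
```

With these values the cache requests 2199023255552 bytes (about 2 TB) of direct memory, which is why the allocation fails with an OutOfMemoryError well before the JVM limit is reached.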
{code}
2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Number of slabs of block cache [16384] with direct memory allocation set to [true]
2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Block cache target memory usage, slab size of [134217728] will allocate [16384] slabs and use ~[2199023255552] bytes
{code}

This was detected on Solr 4.10, but it seems to affect current versions as well; I will double-check.

Plan to resolve:
- Correct the logging and throwable instance checking so the failure does not manifest as a NullPointerException during core load.
- Add a detection step that checks whether the memory to be allocated exceeds the available direct memory. If it does, fall back to a smaller slab count and log a warning message.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
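The second bullet of the plan could look roughly like the sketch below. This is a hedged illustration only: `clampSlabCount` is a hypothetical helper, not existing Solr code, and the caller would be expected to log the warning when the count is reduced.

```java
// Sketch of the proposed fallback: if the requested cache size exceeds
// the direct memory actually available, shrink the slab count to what
// fits instead of letting BlockCache fail with an OutOfMemoryError.
// clampSlabCount is a hypothetical helper, not part of Solr.
public class SlabCountClamp {

    static int clampSlabCount(long slabSize, int requestedSlabs, long availableDirectMemory) {
        long requested = slabSize * (long) requestedSlabs;
        if (requested <= availableDirectMemory) {
            return requestedSlabs; // configured count fits, keep it
        }
        // Fall back to as many whole slabs as fit; the caller should
        // log a warning that the configured count was reduced.
        int fitting = (int) (availableDirectMemory / slabSize);
        return Math.max(fitting, 1);
    }

    public static void main(String[] args) {
        long slabSize = 134217728L;                 // 128 MB slabs, as logged
        long maxDirect = 4L * 1024 * 1024 * 1024;   // assumed 4 GB direct memory
        // 16384 configured slabs would need ~2 TB; only 32 fit in 4 GB.
        System.out.println(clampSlabCount(slabSize, 16384, maxDirect));
    }
}
```

Note that when even a single slab does not fit, this sketch still returns 1; a real fix would probably refuse to start the cache (or fall back to heap allocation) in that case rather than overallocate by one slab.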