Istvan Farkas created SOLR-14373:
------------------------------------
Summary: HDFS block cache allows overallocation
Key: SOLR-14373
URL: https://issues.apache.org/jira/browse/SOLR-14373
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: hdfs
Affects Versions: 4.10
Reporter: Istvan Farkas
For the HDFS block cache, when we allocate more slabs than the available
direct memory can hold, the error message is effectively hidden.
In such cases, the BlockCache constructor throws an OutOfMemoryError, which
is caught in HdfsDirectoryFactory itself and rethrown as a RuntimeException:
{code}
try {
  blockCache = new BlockCache(metrics, directAllocation, totalMemory, slabSize, blockSize);
} catch (OutOfMemoryError e) {
  throw new RuntimeException(
      "The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=<size>g -XX:+UseLargePages to your containers startup args)"
          + " or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap,"
          + " your java heap size might not be large enough."
          + " Failed allocating ~" + totalMemory / 1000000.0 + " MB.",
      e);
}
{code}
This then manifests as a NullPointerException during core load:
{code}
2020-02-24 06:50:23,492 ERROR (coreLoadExecutor-5-thread-8)-c:
collection1-s:shard2-r:core_node2-x:
collection1_shard2_replica1-o.a.s.c.SolrCore: Error while closing
java.lang.NullPointerException
at org.apache.solr.core.SolrCore.close(SolrCore.java:1352)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
{code}
When directAllocation is true, the directory factory logs an approximation of
the memory to be allocated:
{code}
2020-02-24 06:49:53,153 INFO
(coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory:
Number of slabs of block cache [16384] with direct memory allocation set to
[true]
2020-02-24 06:49:53,153 INFO
(coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory:
Block cache target memory usage, slab size of [134217728] will allocate
[16384] slabs and use ~[2199023255552] bytes
{code}
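For reference, the overallocation in the log above is easy to verify: 16384 slabs of 134217728 bytes (128 MB) each come to roughly 2 TB of direct memory, far beyond any realistic -XX:MaxDirectMemorySize setting. A minimal sketch of the arithmetic (not Solr code; class name is hypothetical):

{code:java}
public class BlockCacheMath {
    public static void main(String[] args) {
        long slabSize = 134217728L;  // 128 MB per slab, from the log above
        long slabCount = 16384L;     // slab count, from the log above
        long total = slabSize * slabCount;
        System.out.println(total);                         // 2199023255552 bytes
        System.out.println(total / (1024L * 1024 * 1024)); // 2048 GB (~2 TB)
    }
}
{code}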
This was detected on Solr 4.10, but it seems to affect current versions as
well; I will double-check.
Plan to resolve:
- correct the logging and throwable instance checking so the failure does not
manifest as a NullPointerException during core load
- add a check that detects when the memory to be allocated exceeds the
available direct memory. If it does, fall back to a smaller slab count and log
a warning message.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]