Istvan Farkas created SOLR-14373:
------------------------------------

             Summary: HDFS block cache allows overallocation
                 Key: SOLR-14373
                 URL: https://issues.apache.org/jira/browse/SOLR-14373
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: hdfs
    Affects Versions: 4.10
            Reporter: Istvan Farkas


For the HDFS block cache, when we allocate more slabs than the available 
direct memory can hold, the error message explaining the failure is hidden.

In such cases the block cache allocation throws an OutOfMemoryError, which is 
caught in HdfsDirectoryFactory itself and rethrown as a RuntimeException:

{code}
try {
  blockCache = new BlockCache(metrics, directAllocation, totalMemory, slabSize, blockSize);
} catch (OutOfMemoryError e) {
  throw new RuntimeException(
      "The max direct memory is likely too low.  Either increase it (by adding -XX:MaxDirectMemorySize=<size>g -XX:+UseLargePages to your containers startup args)"
          + " or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap,"
          + " your java heap size might not be large enough."
          + " Failed allocating ~" + totalMemory / 1000000.0 + " MB.",
      e);
}
{code}

This then manifests as a NullPointerException during core load:

{code}
2020-02-24 06:50:23,492 ERROR (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.SolrCore: Error while closing
java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.close(SolrCore.java:1352)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
{code}
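
The stack trace shows SolrCore.close() being invoked from SolrCore.<init>, so 
the NullPointerException appears to come from the cleanup path running against 
a partially constructed core. A minimal sketch of that failure pattern (the 
class and field names below are illustrative, not the actual SolrCore 
internals):

{code}
// Illustrative sketch only; names do not match the real SolrCore fields.
class CoreLoadNpeSketch {
  interface Directory { void release(); }
  interface DirectoryFactory { Directory create(); }

  private Directory directory; // stays null when construction fails early

  CoreLoadNpeSketch(DirectoryFactory factory) {
    try {
      directory = factory.create(); // the RuntimeException above surfaces here
    } catch (RuntimeException e) {
      close(); // throws NullPointerException from the half-initialized core,
               // hiding the "max direct memory is likely too low" message
      throw e; // never reached
    }
  }

  void close() {
    directory.release(); // NPE: 'directory' was never assigned
  }
}
{code}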

When directAllocation is true, HdfsDirectoryFactory logs an approximation of 
the memory about to be allocated:

{code}
2020-02-24 06:49:53,153 INFO 
(coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory:
 Number of slabs of block cache [16384] with direct memory allocation set to 
[true]
2020-02-24 06:49:53,153 INFO 
(coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory:
 Block cache target memory usage, slab size of [134217728] will allocate 
[16384] slabs and use ~[2199023255552] bytes
{code}
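
As a sanity check on the logged figures: 16384 slabs of 134217728 bytes 
(128 MB) each comes out to 2199023255552 bytes, roughly 2 TB, which vastly 
exceeds any realistic -XX:MaxDirectMemorySize setting:

{code}
// Sanity check of the figures in the log above.
public class SlabMath {
  public static void main(String[] args) {
    long slabSize  = 134_217_728L;         // 128 MB, from the log
    long slabCount = 16_384L;              // from the log
    long total     = slabCount * slabSize; // 2,199,023,255,552 bytes
    System.out.printf("~%.0f GB%n", total / (1024.0 * 1024 * 1024)); // ~2048 GB
  }
}
{code}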

This was detected on Solr 4.10, but it seems to also affect current versions; 
I will double check.

Plan to resolve:
- correct the logging and throwable instance checking so the failure does not 
manifest as a NullPointerException during core load
- add a detection which checks whether the memory to be allocated exceeds the 
available direct memory; if so, fall back to a smaller slab count and log a 
warning message (see the sketch below)
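
One possible shape of that guard, as a sketch only. The real change would live 
in HdfsDirectoryFactory and use its logger; the class and method names here 
are hypothetical. Reading the limit via sun.misc.VM.maxDirectMemory() is an 
assumption that fits the JDK 7/8 releases Solr 4.10 runs on (the class moved 
to jdk.internal.misc in JDK 9).

{code}
// Hypothetical sketch of the proposed fallback, not the actual patch.
public class SlabGuardSketch {

  /** Returns a slab count whose total fits within the direct memory limit. */
  static int capSlabCount(int requestedSlabs, long slabSize) {
    long maxDirect = sun.misc.VM.maxDirectMemory(); // -XX:MaxDirectMemorySize
    long requested = (long) requestedSlabs * slabSize;
    if (requested <= maxDirect) {
      return requestedSlabs; // fits as configured
    }
    int fallback = (int) (maxDirect / slabSize);
    // The real patch would log this through the Solr logger as a WARN.
    System.err.println("Block cache needs " + requested + " bytes but only "
        + maxDirect + " bytes of direct memory are available; falling back to "
        + fallback + " slabs");
    return fallback;
  }
}
{code}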
