On 10/16/2017 5:38 PM, Randy Fradin wrote:
> Each shard has around 4.2 million documents which are around 40GB on disk.
> Two nodes have 3 shard replicas each and the third has 2 shard replicas.
>
> The text of the exception is: java.lang.OutOfMemoryError: Java heap space
> And the heap dump is a full 24GB indicating the full heap space was being
> used.
>
> Here is the solrconfig as output by the config request handler:

I was hoping for the actual XML, but I don't see any red flags in the
output you've provided.  It does look like you've probably got a very
minimal configuration.  Some things that I expected to see (and do see
on my own systems) aren't in the handler output at all.

With only 12 million docs on the machine, I would not expect any need
for 24GB of heap unless you're handling a large number of particularly
RAM-hungry complex queries.  The ratio of index size to document count
says that the documents are bigger than what I think of as typical, but
not what I would call enormous.  If there's any way you can adjust your
schema to remove unused parts and reduce the index size, that would be a
good idea, but I don't consider that to be an immediate action item. 
Your index size is well within what Solr should be able to handle easily
-- if there are sufficient system resources, memory in particular.
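For reference, 40GB divided by 4.2 million docs works out to roughly
10KB per document on disk, and three 40GB shard replicas puts around
120GB of index data on a single node.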

The 6.5.1 version of Solr that you're running should have most known
memory leak issues fixed -- and there are not many of those.  I'm not
aware of any leak problems that would affect Lucene's DocumentsWriter
class, where you said most of the heap was being consumed.  That doesn't
necessarily mean there isn't a leak bug that applies, just that I am not
aware of any.

You have indicated that you're doing a very large number of concurrent
update requests -- up to 240 at the same time.  I cannot imagine a
situation where Lucene would require a separate buffer (100 MB in your
config) for every indexing thread.  If it did, most Lucene and Solr
installations would have major memory problems.
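
For reference, the buffer I'm talking about is the ramBufferSizeMB
setting in the indexConfig section of solrconfig.xml.  A minimal sketch,
assuming defaults for everything else -- your actual file will have more
in it:

  <indexConfig>
    <!-- total RAM Lucene may use to buffer documents before flushing -->
    <ramBufferSizeMB>100</ramBufferSizeMB>
  </indexConfig>

As I understand it, that limit is meant to apply to the indexing buffer
as a whole, not to be multiplied by every indexing thread.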

Your description of what you have in your heap sounds a little
different from a buffer per indexing thread.  It sounds like your
indexing has resulted in a LOT of flushes, which is probably normal,
except that the flush queue doesn't appear to be getting emptied.  If
I'm right, either your indexing is happening faster than Lucene can
flush the segments that get built, or there is something preventing
Lucene from actually doing the flush.  I do not see any indication in
the code that Lucene ever imposes a limit on the number of queued
flushes, but in a system that's working correctly, it probably doesn't
have to.  My theories here should be validated by somebody who has much
better insight into Lucene than I do.

I'm interested in seeing some details about the system and the processes
running.  What OS is this running on?  If it's something other than
Windows, you probably have the "top" utility installed.  The GNU version
of top has a keyboard shortcut (shift-M) to sort by memory usage.  If
it's available, run top (not htop or any other variant), press the key
to sort by memory, and grab a screenshot.

On recent versions of Windows, there's a program called Resource
Monitor.  If you're on Windows, run that program, click on the memory
tab, sort by Private, make sure that the memory graph and MB counts
below the process list are fully visible, and grab a screenshot.

It is unlikely that you'll be able to send a screenshot image to the
list, so you'll probably need a file sharing website.

Thanks,
Shawn
