On 10/16/2017 5:38 PM, Randy Fradin wrote:
> Each shard has around 4.2 million documents which are around 40GB on disk.
> Two nodes have 3 shard replicas each and the third has 2 shard replicas.
>
> The text of the exception is: java.lang.OutOfMemoryError: Java heap space
> And the heap dump is a full 24GB indicating the full heap space was being
> used.
>
> Here is the solrconfig as output by the config request handler:
I was hoping for the actual XML, but I don't see any red flags in the output you've provided. It does look like you've probably got a very minimal configuration. Some things that I expected to see (and do see on my own systems) aren't in the handler output at all.

With only 12 million docs on the machine, I would not expect any need for 24GB of heap except in the case of a large number of particularly RAM-hungry complex queries. The ratio of index size to document count says that the documents are bigger than what I think of as typical, but not what I would call enormous. If there's any way you can adjust your schema to remove unused parts and reduce the index size, that would be a good idea, but I don't consider that an immediate action item. Your index size is well within what Solr should be able to handle easily -- if there are sufficient system resources, memory in particular.

The 6.5.1 version of Solr that you're running should have most known memory leak issues fixed -- and there are not many of those. I'm not aware of any leak problems that would affect Lucene's DocumentsWriter class, where you said most of the heap was being consumed. That doesn't necessarily mean there isn't a leak bug that applies, just that I am not aware of any.

You have indicated that you're doing a very large number of concurrent update requests, up to 240 at the same time. I cannot imagine a situation where Lucene would require a buffer (100 MB in your config) for every indexing thread. If it did, a lot of Lucene and Solr installations would be running into major memory problems.

Your description of what you have in your heap sounds a little different from a buffer per indexing thread. It sounds like your indexing has resulted in a LOT of flushes, which is probably normal, except that the flush queue doesn't appear to be getting emptied. If I'm right, either your indexing is happening faster than Lucene can flush the segments that get built, or something is preventing Lucene from actually doing the flush. I do not see any indication in the code that Lucene ever imposes a limit on the number of queued flushes, but in a system that's working correctly, it probably doesn't have to. My theories here should be validated by somebody who has much better insight into Lucene than I do. (The P.S. below shows where that buffer size lives in solrconfig.xml.)

I'm interested in seeing some details about the system and the processes running. What OS is this running on? If it's something other than Windows, you probably have the "top" utility installed. The GNU version of top has a keyboard shortcut (shift-M) to sort by memory usage. If it's available, run top (not htop or any other variant), press the key to sort by memory, and grab a screenshot. (See the P.P.S. below for a non-interactive way to capture the same information.)

On recent versions of Windows, there's a program called Resource Monitor. If you're on Windows, run that program, click on the Memory tab, sort by Private, make sure that the memory graph and MB counts below the process list are fully visible, and grab a screenshot.

It is unlikely that you'll be able to send a screenshot image to the list, so you'll probably need a file sharing website.

Thanks,
Shawn
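
P.S. If the flush backlog theory holds up, the setting that controls how much gets buffered before a flush is ramBufferSizeMB in the indexConfig section of solrconfig.xml. Here's a minimal sketch of that section, assuming your config really does use the 100 MB value you reported -- the maxBufferedDocs element and the exact numbers are only illustrative, not taken from your setup:

    <indexConfig>
      <!-- RAM the IndexWriter may buffer before a segment flush is triggered -->
      <ramBufferSizeMB>100</ramBufferSizeMB>
      <!-- optional: also flush after this many buffered documents, whichever limit is hit first -->
      <maxBufferedDocs>100000</maxBufferedDocs>
    </indexConfig>

Lowering ramBufferSizeMB would make each individual flush smaller, but if the flush queue itself isn't draining, that only delays the problem rather than fixing it.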
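
P.P.S. For the top output, if capturing an interactive screenshot turns out to be awkward, a plain-text snapshot sorted by memory is just as useful. This assumes a reasonably recent procps-ng top (the -o option isn't available on very old versions), and the output file name is just an example:

    # one non-interactive snapshot, sorted by memory usage (%MEM)
    top -b -n 1 -o %MEM | head -n 40 > top-by-memory.txt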