On 1/15/2014 3:10 PM, cwhi wrote:
> Thanks for the quick reply.  I did notice the exception you pointed out and
> had some thoughts about it maybe being the client library I'm using to
> connect to Solr (C# SolrNet) disconnecting too early, but that doesn't
> explain it eventually running out of memory altogether.  A large index
> shouldn't cause Solr to run out of memory, since it would just go to disk on
> queries to process requests instead of holding the entire index in memory.

If you're seeing OutOfMemoryError problems, that has nothing at all to do with total memory on the system or the OS disk cache. The OS disk cache is what holds all or part of the actual on-disk index data in memory. You're right that it would just go to the disk in order to process requests - but disks are *REALLY* slow compared to RAM, so whenever you have to actually hit the disk, performance drops drastically.

OutOfMemoryErrors have to do with the Java heap. Solr (Lucene, really) doesn't hold the actual index in memory, but certain query patterns do cause a lot of heap memory to be consumed and not released, in the interest of performance. One of those things is sorting; another is facets. I've heard that filters and field collapsing will do much the same thing. Heavy indexing or frequent index commits can require a lot of heap as well.
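
To make that concrete, here's a rough SolrJ 4.x sketch of the kind of query I mean. The URL, core name, and the price/category fields are made up for illustration, not taken from your setup:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class HeapHungryQuery {
      public static void main(String[] args) throws Exception {
          // Placeholder URL and core name.
          HttpSolrServer server =
              new HttpSolrServer("http://localhost:8983/solr/collection1");
          SolrQuery query = new SolrQuery("*:*");
          // Sorting pulls the entire sort field -- every document in
          // the index, not just the rows returned -- into the
          // FieldCache on the heap, and it is not released afterward.
          query.setSort("price", SolrQuery.ORDER.desc);
          // Faceting on a field does much the same thing.
          query.addFacetField("category");
          QueryResponse rsp = server.query(query);
          System.out.println("matches: " + rsp.getResults().getNumFound());
      }
  }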

> I'm also not sure that the index size is the case, because I have another
> SolrCloud instance running where I saw this behaviour at ~20 million, rather
> than 2 million documents (same type of documents, so much larger on disk).
> The machines these are running on are identical Amazon EC2 instances as
> well, so that rules out the larger index succeeding for longer due to better
> hardware.

When you use memory-hungry features, the amount of heap that's required will typically go up with the total number of documents. The discrepancy between your two instances is probably due to differences in how Solr is being used and how everything is configured.
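
For a rough sense of the scaling (back-of-envelope, assuming the classic FieldCache rather than docValues): sorting on a single long field keeps about 8 bytes per document on the heap, for every document in the index, so:

   2,000,000 docs x 8 bytes =  ~16 MB per sorted numeric field
  20,000,000 docs x 8 bytes = ~160 MB per sorted numeric field

String sorting and faceting cost more than that, because the cache has to hold actual term data as well as per-document ordinals.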

You said that approximately 1.5GB is allocated to Solr. Is this an actual Java heap setting, or are you seeing that number on a graph somewhere? If you look at the JVM-Memory graph on the Solr 4.x dashboard, you'll see three numbers: the heap memory currently in use, the amount of memory that Java has currently allocated from the operating system, and the maximum amount that it CAN allocate. The middle number (*NOT* the first number) is the amount of system memory the Java instance is using right now (not counting some additional megabytes of overhead for Java itself), and you can assume that the third number is the amount that will be used in the long term.
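
If it's not an explicit setting, it's worth making it one so the maximum is predictable. With the example Jetty that ships with Solr 4.x, that would be something like this on the startup commandline (the 1536m figure here is just your current number, not a recommendation):

  java -Xms1536m -Xmx1536m -jar start.jar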

I've condensed a bunch of memory- and performance-related info into this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
