On 2/6/07, Graham Stead <[EMAIL PROTECTED]> wrote:
Hi everyone,
My Solr JVM runs out of heap space quite frequently. I'm trying to
understand Solr/Lucene's memory usage so I can address the problem
correctly. Otherwise, I feel I'm taking random shots in the dark.
<snip>
4) In solrconfig.xml, I set filterCache, queryResultCache, and documentCache
to 0.
With this change, your memory consumption should be almost entirely on
the Lucene end of things, so it depends on the types of queries, the
nature/distribution of your fields, etc. I'd recommend not lowering
the size of the documentCache below the size required to collect docs
for a single query, for performance reasons (especially since you are
highlighting).
Now for my index details:
- To facilitate highlighting, I currently store doc contents in the index,
so the index consumes 24GB on disk.
- numDocs : 4,953,736
maxDoc : 4,953,736 (just optimized)
- Term files:
logs # du -ksh ../solr/data/index/*.t??
5.9M ../solr/data/index/_1kjb.tii
429M ../solr/data/index/_1kjb.tis
- I have 22 fields and yes, they currently have norms.
FWIW, I have several indices of approximately that size, also with the
contents stored for highlighting (using compressThreshold=200). I
have 50-odd fields with norms. Memory consumption is rather small
(~1G, though the heap size is larger).
My machine runs Gentoo Linux with 4GB RAM. 'top' indicates the JVM reaches
2.9g RAM (3472m virtual memory) after 10-20 searches and ~20 mins of use. It
seems just a matter of time before more searches or a snapinstaller 'commit'
will make it run out of heap space again.
I have flexibility in the changes we can make. I.e., I can omit norms for
most fields, or I can stop storing the doc contents in the index. But before
embarking on a new strategy, I need some assurance that the strategy will
work (crazy, I know). For example, it doesn't seem that removing norms would
save a great deal (I calculate that saving 1 byte per document per field on
21 fields is ~99MB).
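
The norms figure above can be checked with a few lines of arithmetic. This is only a sketch, assuming one byte per document per field with norms and taking the document count from the maxDoc value reported earlier; the function name norms_bytes is made up for illustration and is not part of Solr or Lucene.

```python
def norms_bytes(max_doc: int, fields_with_norms: int, bytes_per_norm: int = 1) -> int:
    """Estimate heap held by norms: one entry per document per field with norms."""
    return max_doc * fields_with_norms * bytes_per_norm


MAX_DOC = 4_953_736  # maxDoc reported above
FIELDS = 21          # fields whose norms would be omitted

saved = norms_bytes(MAX_DOC, FIELDS)
print(f"{saved} bytes = {saved / 2**20:.1f} MiB")  # prints: 104028456 bytes = 99.2 MiB
```

So dropping norms on 21 fields would free roughly 99MB, which is small next to a heap of nearly 3GB.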
So...how do I deduce what's taking up so much memory? Any suggestions would
be very helpful to me (and hopefully to others, too).
I think the memory is going to your queries. Are you sorting on many
fields? What is a typical query? I'm not a Lucene expert, but there are
Lucene experts on this list.
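
To give a feel for why sorting matters, here is a rough sketch. It assumes that sorting on a field makes Lucene cache one value per document for that field, and it assumes 4 bytes per cached entry (an int-sized value); string sorts also keep the distinct terms in memory, which this ignores. The function name sort_cache_bytes and the field counts are hypothetical.

```python
def sort_cache_bytes(max_doc: int, sort_fields: int, bytes_per_entry: int = 4) -> int:
    """Rough lower bound on heap used by per-document sort caches.

    Assumes one cached entry per document for every field that is sorted on.
    """
    return max_doc * sort_fields * bytes_per_entry


MAX_DOC = 4_953_736  # maxDoc from the index details above

# Hypothetical numbers of distinct sort fields, for comparison only.
for n in (1, 5, 22):
    mib = sort_cache_bytes(MAX_DOC, n) / 2**20
    print(f"{n:2d} sort field(s): ~{mib:.0f} MiB")
```

Under these assumptions each sort field costs on the order of tens of MB for an index this size, and the total may be roughly doubled for a short time around a commit if the old and new searchers are both live.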
-Mike