On 2/6/07, Graham Stead <[EMAIL PROTECTED]> wrote:
Hi everyone,

My Solr JVM runs out of heap space quite frequently. I'm trying to
understand Solr/Lucene's memory usage so I can address the problem
correctly. Otherwise, I feel I'm taking random shots in the dark.

<>

4) In solrconfig.xml, I set filterCache, queryResultCache, and documentCache
to 0.

With this change, your memory consumption should be almost entirely on
the Lucene end of things, so it will depend on the types of queries, the
nature/distribution of your fields, etc.  I'd recommend not lowering
the size of the documentCache below the size required to collect docs
for a single query, for performance reasons (especially since you are
highlighting).

Now for my index details:
- To facilitate highlighting, I currently store doc contents in the index,
so the index consumes 24GB on disk.
- numDocs : 4,953,736
  maxDoc : 4,953,736 (just optimized)
- Term files:
   logs # du -ksh ../solr/data/index/*.t??
   5.9M    ../solr/data/index/_1kjb.tii
   429M    ../solr/data/index/_1kjb.tis
- I have 22 fields and yes, they currently have norms.

FWIW, I have several indices of approximately that size, also with the
contents stored for highlighting (using compressThreshold=200).  I
have 50-odd fields with norms.  Memory consumption is rather small
(~1G, though the heap size is larger).

My machine runs Gentoo Linux and has 4GB RAM. 'top' indicates the JVM reaches
2.9g RAM (3472m virtual memory) after 10-20 searches and ~20 mins of use. It
seems just a matter of time before more searches or a snapinstaller 'commit'
will make it run out of heap space again.

I have flexibility in the changes we can make. I.e., I can omit norms for
most fields, or I can stop storing the doc contents in the index. But before
embarking on a new strategy, I need some assurance that the strategy will
work (crazy, I know). For example, it doesn't seem that removing norms would
save a great deal (norms take 1 byte per document per field, so I calculate
that omitting them on 21 fields saves only ~99MB).
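
The norms estimate can be checked with a few lines of arithmetic. This is
only a sketch: the document count and field count come from the index
details above, the function name is made up for illustration, and the
one-byte-per-document-per-field cost is the assumption already stated in
the estimate.

```python
# Figures taken from the index details in this message.
MAX_DOC = 4_953_736        # maxDoc reported for the optimized index
FIELDS_WITH_NORMS = 21     # fields whose norms could be omitted
BYTES_PER_NORM = 1         # assumed: one byte per document per field


def norms_bytes(max_doc, fields, bytes_per_norm=BYTES_PER_NORM):
    """Heap used by norms: one entry per document for every field with norms."""
    return max_doc * fields * bytes_per_norm


saved = norms_bytes(MAX_DOC, FIELDS_WITH_NORMS)
print(f"{saved} bytes = {saved / 2**20:.1f} MiB")
```

So the saving is real but small next to a heap of nearly 3GB.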

So...how do I deduce what's taking up so much memory? Any suggestions would
be very helpful to me (and hopefully to others, too).

I think it is in your queries.  Are you sorting on many fields?  What
is a typical query?  I'm not a Lucene expert, but there are Lucene
experts on this list.
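
To show why I ask about sorting, here is a rough sketch. It assumes that
sorting on a field makes Lucene cache one 4-byte entry per document for
that field, and it ignores the cost of the cached term values themselves,
so treat the result as a lower bound. The function name and the count of
five sort fields are hypothetical; only maxDoc comes from your message.

```python
MAX_DOC = 4_953_736          # maxDoc from the index details
BYTES_PER_ENTRY = 4          # assumed: one 4-byte int per document per sort field


def sort_cache_bytes(max_doc, sort_fields, bytes_per_entry=BYTES_PER_ENTRY):
    """Lower bound on heap held by per-document sort arrays."""
    return max_doc * sort_fields * bytes_per_entry


per_field = sort_cache_bytes(MAX_DOC, 1)
five_fields = sort_cache_bytes(MAX_DOC, 5)   # hypothetical number of sort fields
print(f"one sort field:   {per_field / 2**20:.1f} MiB")
print(f"five sort fields: {five_fields / 2**20:.1f} MiB")
```

String sort fields with many unique values would cost more than this,
since the values are cached as well.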

-Mike
