-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 09, 2007 10:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space
> Use the filter cache method on things like media type and
> location; this will occupy ~2.3MB of memory _per unique value_
Mike, how did you calculate that value? I'm trying to tune my
caches, and any equations that could be used to determine
some balanced settings would be extremely helpful. I'm in a
memory limited environment, so I can't afford to throw a ton
of cache at the problem.
(I don't want to thread-jack, but I'm also wondering whether
anyone has any notes on how to tune cache sizes for the
filterCache, queryResultCache and documentCache).
Thanks,
Stu
-----Original Message-----
From: Mike Klaas <[EMAIL PROTECTED]>
Sent: Tuesday, October 9, 2007 9:30pm
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space
On 9-Oct-07, at 12:36 PM, David Whalen wrote:
(snip)
> I'm sure we could stop storing many of these columns, especially if
> someone told me that would make a big difference.
I don't think that it would make a difference in memory
consumption, but storage is certainly not necessary for
faceting. Extra stored fields can slow down search if they
are large (in terms of bytes), but don't really occupy extra
memory, unless they are polluting the doc cache. Does 'text'
need to be stored?
>> What does the LukeRequestHandler tell you about the # of distinct
>> terms in each field that you facet on?
> Where would I find that? I could probably estimate that myself on a
> per-column basis. It ranges from 4 distinct values for media_type to
> 30-ish for location to 200-ish for country_code to almost 10,000 for
> site_id to almost 100,000 for journalist_id.
Use the filter cache method on things like media type and
location; this will occupy ~2.3MB of memory _per unique
value_, so it should be a net win for those (although quite
close in space requirements for a 30-ary field on your index size).
-Mike
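For anyone wondering where a figure like ~2.3MB per unique value might
come from, here is a rough sketch. It assumes each cached filter is held
as a bitset with one bit per document, and that the alternative
field-cache method costs about one 4-byte int per document for a
single-valued field. MAX_DOC is a hypothetical index size chosen so the
numbers land near 2.3MB; the thread does not state the real document
count, and the function names are illustrative only.

```python
# Rough estimate only. Assumptions (not confirmed in the thread):
#   - one cached filter = a bitset with one bit per document in the index
#   - field-cache faceting = one 4-byte int per document (single-valued field)

def filter_bytes(max_doc):
    """Bytes for one cached filter stored as a bitset of max_doc bits."""
    return (max_doc + 7) // 8

def filter_cache_facet_bytes(max_doc, unique_values):
    """Memory to cache one filter per unique value of a facet field."""
    return unique_values * filter_bytes(max_doc)

def field_cache_facet_bytes(max_doc):
    """Memory for a per-document int array (term strings not counted)."""
    return 4 * max_doc

MAX_DOC = 19_000_000  # hypothetical; picked so one filter is roughly 2.3MB

if __name__ == "__main__":
    print("one filter:", filter_bytes(MAX_DOC), "bytes")
    print("field cache:", field_cache_facet_bytes(MAX_DOC), "bytes")
    # Field names and cardinalities are the ones quoted in the thread.
    for field, n in [("media_type", 4), ("location", 30), ("country_code", 200)]:
        print(field, filter_cache_facet_bytes(MAX_DOC, n), "bytes")
```

Under these assumptions the break-even sits at around 32 unique values,
which would be consistent with the remark that a 30-ary field is quite
close. Small filters may be stored more compactly, so treat the
filter-cache numbers as an upper bound.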