On 10-Oct-07, at 12:19 PM, David Whalen wrote:

It looks now like I can't use facets the way I was hoping
to because the memory requirements are impractical.

I can't remember if this has been mentioned, but upping the HashDocSet size is one way to reduce memory consumption. Whether this will work well depends greatly on the cardinality of your facet sets. Setting facet.enum.cache.minDf to a high value is another option (no filter bitset will be cached for any value whose document frequency is below that threshold).

Both options have performance implications.
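
(For reference, both knobs look roughly like the following; the values
are purely illustrative, not recommendations for your index.)

In solrconfig.xml, facet sets smaller than maxSize are kept as compact
hash sets instead of maxDoc-sized bitsets:

    <HashDocSet maxSize="10000" loadFactor="0.75"/>

On the request, terms whose document frequency falls below the
threshold are counted without caching a filter for them:

    &facet=true&facet.field=site_id&facet.enum.cache.minDf=1000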

So, as an alternative I was thinking I could get counts
by doing rows=0 and using filter queries.

Is there a reason to think that this might perform better?
Or, am I simply moving the problem to another step in the
process?

Running one query per unique facet value seems impractical, if that is what you are suggesting. Setting minDf to a very high value should always outperform such an approach.
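
(If the idea was one filter query per value, each count would come from
numFound on a zero-row request, something like the sketch below; the
field and value are made up for illustration.)

    q=*:*&rows=0&fq=media_type:video

numFound in the response is the count for that one value, but you need
one such request (and one cached filter) for every distinct value,
which is essentially the same work faceting is already doing.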

-Mike

DW



-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 09, 2007 10:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

Using the filter cache method on things like media type and
location will occupy ~2.3MB of memory _per unique value_

Mike, how did you calculate that value? I'm trying to tune my
caches, and any equations that could be used to determine
some balanced settings would be extremely helpful. I'm in a
memory-limited environment, so I can't afford to throw a ton
of cache at the problem.
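
(For what it's worth, the 2.3MB figure presumably comes from the size
of a cached bitset, which is roughly maxDoc/8 bytes per entry. The
arithmetic below is a rough back-of-the-envelope, not a number from
the thread:

    2.3MB  ~=  2,400,000 bytes  ~=  19,300,000 bits

so ~2.3MB per cached filter corresponds to an index on the order of 19
million documents, regardless of how many documents actually match
each value.)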

(I don't want to thread-jack, but I'm also wondering whether
anyone has any notes on how to tune cache sizes for the
filterCache, queryResultCache and documentCache).
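
(All three are configured in solrconfig.xml; the sizes below are
placeholders that just show the shape of the settings, not tuning
advice.)

    <filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
    <documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

filterCache entries are the expensive ones for faceting (one DocSet per
cached filter); documentCache entries cost roughly the size of a
document's stored fields, and that cache cannot be autowarmed because
internal document ids change between searchers.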

Thanks,
Stu


-----Original Message-----
From: Mike Klaas <[EMAIL PROTECTED]>
Sent: Tuesday, October 9, 2007 9:30pm
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

On 9-Oct-07, at 12:36 PM, David Whalen wrote:

(snip)
I'm sure we could stop storing many of these columns, especially
if someone told me that would make a big difference.

I don't think it would make a difference in memory
consumption, but storing fields is certainly not necessary for
faceting.  Extra stored fields can slow down search if they
are large (in terms of bytes), but they don't really occupy extra
memory unless they are polluting the document cache.  Does 'text'
need to be stored?
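
(Concretely, a facet-only field can be indexed without being stored;
the definitions below are a made-up schema.xml example, not David's
actual schema.)

    <field name="media_type" type="string" indexed="true" stored="false"/>
    <field name="text"       type="text"   indexed="true" stored="false"/>

Faceting only needs the indexed terms; stored="false" just means the
raw value can no longer be returned in search results.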

What does the LukeRequest handler tell you about the number of
distinct terms in each field that you facet on?

Where would I find that?  I could probably estimate that myself
on a per-column basis.  It ranges from 4 distinct values for
media_type to 30-ish for location, 200-ish for country_code,
almost 10,000 for site_id, and almost 100,000 for journalist_id.
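
(Regarding the Luke question above: the handler is normally exposed at
/admin/luke, and a request along these lines reports per-field term
statistics, including a distinct-term count; the host, port, and field
name here are placeholders.)

    http://localhost:8983/solr/admin/luke?fl=journalist_id&numTerms=10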

Using the filter cache method on things like media type and
location will occupy ~2.3MB of memory _per unique value_, so it
should be a net win for those (although quite close in space
requirements for a 30-value field at your index size).

-Mike


