On 10-Oct-07, at 12:19 PM, David Whalen wrote:

It looks now like I can't use facets the way I was hoping
to because the memory requirements are impractical.

I can't remember if this has been mentioned, but upping the HashDocSet size is one way to reduce memory consumption. Whether this will work well depends greatly on the cardinality of your facet sets. Setting facet.enum.cache.minDf to a high value is another option (no filter bitset will be cached for any value whose document frequency is below that threshold).

Both options have performance implications.
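
(For reference, both knobs look roughly like the following; the values
are purely illustrative, not recommendations for your index.)

In solrconfig.xml, facet sets smaller than maxSize are kept as compact
hash sets instead of maxDoc-sized bitsets:

    <HashDocSet maxSize="10000" loadFactor="0.75"/>

On the request, terms whose document frequency falls below the
threshold are counted without caching a filter for them:

    &facet=true&facet.field=site_id&facet.enum.cache.minDf=1000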

So, as an alternative I was thinking I could get counts
by doing rows=0 and using filter queries.

Is there a reason to think that this might perform better?
Or, am I simply moving the problem to another step in the
process?

Running one query per unique facet value seems impractical, if that is what you are suggesting. Setting minDf to a very high value should always outperform such an approach.
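
(If the idea was one filter query per value, each count would come from
numFound on a zero-row request, something like the sketch below; the
field and value are made up for illustration.)

    q=*:*&rows=0&fq=media_type:video

numFound in the response is the count for that one value, but you need
one such request (and one cached filter) for every distinct value,
which is essentially the same work faceting is already doing.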

-Mike

DW



-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 09, 2007 10:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

Using the filter cache method on things like media type and
location will occupy ~2.3MB of memory _per unique value_

Mike, how did you calculate that value? I'm trying to tune my
caches, and any equations that could be used to determine
some balanced settings would be extremely helpful. I'm in a
memory-limited environment, so I can't afford to throw a ton
of cache at the problem.
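
(For what it's worth, the 2.3MB figure presumably comes from the size
of a cached bitset, which is roughly maxDoc/8 bytes per entry. The
arithmetic below is a rough back-of-the-envelope, not a number from
the thread:

    2.3MB  ~=  2,400,000 bytes  ~=  19,300,000 bits

so ~2.3MB per cached filter corresponds to an index on the order of 19
million documents, regardless of how many documents actually match
each value.)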

(I don't want to thread-jack, but I'm also wondering whether
anyone has any notes on how to tune cache sizes for the
filterCache, queryResultCache and documentCache).
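
(All three are configured in solrconfig.xml; the sizes below are
placeholders that just show the shape of the settings, not tuning
advice.)

    <filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
    <documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

filterCache entries are the expensive ones for faceting (one DocSet per
cached filter); documentCache entries cost roughly the size of a
document's stored fields, and that cache cannot be autowarmed because
internal document ids change between searchers.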

Thanks,
Stu


-----Original Message-----
From: Mike Klaas <[EMAIL PROTECTED]>
Sent: Tuesday, October 9, 2007 9:30pm
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

On 9-Oct-07, at 12:36 PM, David Whalen wrote:

(snip)
I'm sure we could stop storing many of these columns, especially
if someone told me that would make a big difference.

I don't think it would make a difference in memory
consumption, but storing fields is certainly not necessary for
faceting.  Extra stored fields can slow down search if they
are large (in terms of bytes), but they don't really occupy extra
memory unless they are polluting the document cache.  Does 'text'
need to be stored?
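
(Concretely, a facet-only field can be indexed without being stored;
the definitions below are a made-up schema.xml example, not David's
actual schema.)

    <field name="media_type" type="string" indexed="true" stored="false"/>
    <field name="text"       type="text"   indexed="true" stored="false"/>

Faceting only needs the indexed terms; stored="false" just means the
raw value can no longer be returned in search results.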

What does the LukeRequest handler tell you about the number of
distinct terms in each field that you facet on?

Where would I find that?  I could probably estimate that myself
on a per-column basis.  It ranges from 4 distinct values for
media_type to 30-ish for location, 200-ish for country_code,
almost 10,000 for site_id, and almost 100,000 for journalist_id.
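
(Regarding the Luke question above: the handler is normally exposed at
/admin/luke, and a request along these lines reports per-field term
statistics, including a distinct-term count; the host, port, and field
name here are placeholders.)

    http://localhost:8983/solr/admin/luke?fl=journalist_id&numTerms=10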

Using the filter cache method on things like media type and
location will occupy ~2.3MB of memory _per unique value_, so it
should be a net win for those (although quite close in space
requirements for a 30-value field at your index size).

-Mike


