On 10-Oct-07, at 2:40 PM, David Whalen wrote:
According to Yonik, I can't use minDf because I'm faceting
on a string field. I'm thinking of changing it to a tokenized
type so that I can use this setting, but then I'd have to
rebuild my entire index.
Unless there's some way around that?
For the fields that matter (many unique values), this is likely
to result in a performance regression.
It might be better to try faceting on less-unique data. For
instance, faceting on the blog_url or create_date fields in your
schema would cause problems (they probably have millions of
unique values).
It would be helpful to know which field is causing the problem. One
way would be to do a sorted query on a quiescent index for each
field, and see if there are any suspiciously large jumps in memory
usage.
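For example, something like this (host, port and q are
placeholders; the field names are just the ones you've
mentioned):

  http://localhost:8983/solr/select?q=*:*&rows=0&sort=site_id+asc
  http://localhost:8983/solr/select?q=*:*&rows=0&sort=journalist_id+asc

Issuing one such query per candidate field against an
otherwise-idle server should pull that field into the
FieldCache, and the size of the heap jump after each one points
at the expensive fields.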
-Mike
-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 10, 2007 4:56 PM
To: solr-user@lucene.apache.org
Cc: stuhood
Subject: Re: Facets and running out of Heap Space
On 10-Oct-07, at 12:19 PM, David Whalen wrote:
It looks now like I can't use facets the way I was hoping to,
because the memory requirements are impractical.
I can't remember if this has been mentioned, but upping the
HashDocSet size is one way to reduce memory consumption.
Whether this will work well depends greatly on the cardinality
of your facet sets. Setting facet.enum.cache.minDf high is
another option (it will not cache a bitset for any value whose
facet set is smaller than that value).
Both options have performance implications.
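For concreteness (the numbers below are placeholders, not
recommendations): the HashDocSet ceiling lives in
solrconfig.xml, and minDf is just a request parameter:

  <!-- doc sets up to this many ids are stored as hashes
       instead of bitsets -->
  <HashDocSet maxSize="10000" loadFactor="0.75"/>

  ...&facet=true&facet.field=site_id&facet.enum.cache.minDf=100

Raising maxSize trades some CPU for memory; raising minDf skips
caching bitsets for the rarer values.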
So, as an alternative I was thinking I could get counts by doing
rows=0 and using filter queries.
Is there a reason to think that this might perform better?
Or, am I simply moving the problem to another step in the process?
Running one query per unique facet value seems impractical,
if that is what you are suggesting. Setting minDf to a very
high value should always outperform such an approach.
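To put numbers on it (the field and the values below are just
an illustration): the filter-query approach means one request
per value, reading numFound from each, e.g.

  .../select?q=<your query>&fq=media_type:video&rows=0
  .../select?q=<your query>&fq=media_type:audio&rows=0

That's fine for a 4-value field like media_type but hopeless
for something like journalist_id, whereas a single faceted
request returns all the counts at once.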
-Mike
DW
-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 09, 2007 10:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space
Using the filter cache method on things like media type and
location will occupy ~2.3MB of memory _per unique value_
Mike, how did you calculate that value? I'm trying to tune my
caches, and any equations that could be used to determine some
balanced settings would be extremely helpful. I'm in a
memory-limited environment, so I can't afford to throw a ton of
cache at the problem.
(I don't want to thread-jack, but I'm also wondering whether
anyone has any notes on how to tune cache sizes for the
filterCache, queryResultCache and documentCache).
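The entries I mean are the ones in solrconfig.xml that look
roughly like this (sizes here are placeholders, not my actual
settings):

  <filterCache class="solr.LRUCache" size="512"
                initialSize="512" autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache" size="512"
                initialSize="512" autowarmCount="256"/>
  <documentCache class="solr.LRUCache" size="512"
                initialSize="512" autowarmCount="0"/>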
Thanks,
Stu
-----Original Message-----
From: Mike Klaas <[EMAIL PROTECTED]>
Sent: Tuesday, October 9, 2007 9:30pm
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space
On 9-Oct-07, at 12:36 PM, David Whalen wrote:
(snip)
I'm sure we could stop storing many of these columns,
especially if someone told me that would make a big difference.
I don't think that it would make a difference in memory
consumption, but storage is certainly not necessary for
faceting. Extra stored fields can slow down search if they are
large (in terms of bytes), but don't really occupy extra
memory, unless they are polluting the doc cache. Does 'text'
need to be stored?
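For instance (guessing at your schema; the type name here is an
assumption), a definition like

  <field name="text" type="text" indexed="true" stored="false"/>

keeps the field searchable but means it is never returned in
responses or pulled into the document cache.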
what does the LukeRequestHandler tell you about the # of
distinct terms in each field that you facet on?
Where would I find that? I could probably estimate that myself
on a per-column basis. It ranges from 4 distinct values for
media_type, to 30-ish for location, to 200-ish for
country_code, to almost 10,000 for site_id, to almost 100,000
for journalist_id.
Using the filter cache method on things like media type and
location will occupy ~2.3MB of memory _per unique value_, so it
should be a net win for those (although quite close in space
requirements for a 30-ary field on your index size).
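(Where that figure comes from: a cached filter is, in the worst
case, a bitset with one bit per document in the index, so

  memory per cached value ~= maxDoc / 8 bytes
  e.g. 20,000,000 docs / 8 ~= 2.4 MB

Values whose sets stay under the HashDocSet maxSize are stored
as hashes of doc ids and cost far less.)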
-Mike