It looks now like I can't use facets the way I was hoping
to because the memory requirements are impractical.

So, as an alternative I was thinking I could get counts
by doing rows=0 and using filter queries.  

Is there a reason to think that this might perform better?
Or, am I simply moving the problem to another step in the
process?
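For concreteness, this is the kind of request I have in mind -- one query per candidate value, reading only numFound. (A sketch: the host, core path, and field values below are placeholders, not our real setup.)

```python
# Build a Solr count-only query: rows=0 returns no documents,
# just the numFound total, and fq puts the filter in the filterCache.
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/select"  # placeholder endpoint

def count_query(field, value):
    """Build a Solr URL that returns only the match count for field:value."""
    params = {
        "q": "*:*",                 # match everything...
        "fq": f"{field}:{value}",   # ...then filter; result is cached
        "rows": 0,                  # fetch no docs, only numFound
    }
    return SOLR + "?" + urlencode(params)

print(count_query("media_type", "video"))
```

One such request per distinct value gets the same numbers a facet would, at the cost of N queries instead of one.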

DW

  

> -----Original Message-----
> From: Stu Hood [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, October 09, 2007 10:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Facets and running out of Heap Space
> 
> > Using the filter cache method on things like media type and 
> > location will occupy ~2.3MB of memory _per unique value_
> 
> Mike, how did you calculate that value? I'm trying to tune my 
> caches, and any equations that could be used to determine 
> some balanced settings would be extremely helpful. I'm in a 
> memory-limited environment, so I can't afford to throw a ton 
> of cache at the problem.
> 
> (I don't want to thread-jack, but I'm also wondering whether 
> anyone has any notes on how to tune cache sizes for the 
> filterCache, queryResultCache and documentCache).
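For reference, all three caches are configured in solrconfig.xml. A starting point might look like the fragment below -- the sizes are illustrative placeholders to tune against your memory budget, not recommendations:

```xml
<!-- solrconfig.xml: cache sizes here are placeholders, not tuned values -->
<filterCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="128"/>

<queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="32"/>

<documentCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>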
> 
> Thanks,
> Stu
> 
> 
> -----Original Message-----
> From: Mike Klaas <[EMAIL PROTECTED]>
> Sent: Tuesday, October 9, 2007 9:30pm
> To: solr-user@lucene.apache.org
> Subject: Re: Facets and running out of Heap Space
> 
> On 9-Oct-07, at 12:36 PM, David Whalen wrote:
> 
> >(snip)
> > I'm sure we could stop storing many of these columns, especially 
> > if someone told me that would make a big difference.
> 
> I don't think that it would make a difference in memory 
> consumption, but storage is certainly not necessary for 
> faceting.  Extra stored fields can slow down search if they 
> are large (in terms of bytes), but don't really occupy extra 
> memory, unless they are polluting the doc cache.  Does 'text' 
> need to be stored?
> >
> >> what does the LukeRequestHandler tell you about the # of distinct 
> >> terms in each field that you facet on?
> >
> > Where would I find that?  I could probably estimate that myself 
> > on a per-column basis.  it ranges from 4 distinct values for 
> > media_type to 30-ish for location to 200-ish for country_code to 
> > almost 10,000 for site_id to almost 100,000 for journalist_id.
> 
> Using the filter cache method on things like media type and 
> location will occupy ~2.3MB of memory _per unique value_, so it 
> should be a net win for those (although quite close in space 
> requirements for a 30-ary field on your index size).
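The ~2.3MB figure is consistent with a bitset-per-entry model: for large filters, Solr caches a set holding one bit per document in the index, so each filterCache entry costs roughly maxDoc/8 bytes. A quick back-of-envelope check (the 19M document count is an assumption inferred from the 2.3MB figure, not a number stated in this thread):

```python
def filter_entry_mb(max_doc):
    # Worst case per filterCache entry: a bitset with one bit
    # per document in the index, i.e. max_doc / 8 bytes.
    return max_doc / 8 / (1024 * 1024)

# ~2.3MB per entry corresponds to an index of roughly 19M documents.
print(f"{filter_entry_mb(19_000_000):.2f} MB per cached filter")
```

So Stu's tuning equation is roughly: filterCache memory ≈ (number of cached entries) × maxDoc/8 bytes.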
> 
> -Mike
> 
> 