It now looks like I can't use facets the way I was hoping to, because the memory requirements are impractical.
So, as an alternative, I was thinking I could get counts by doing rows=0 and using filter queries (rough sketch at the bottom of this mail). Is there a reason to think that this might perform better? Or am I simply moving the problem to another step in the process?

DW

> -----Original Message-----
> From: Stu Hood [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 09, 2007 10:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Facets and running out of Heap Space
>
> > Using the filter cache method on things like media type and
> > location will occupy ~2.3MB of memory _per unique value_
>
> Mike, how did you calculate that value? I'm trying to tune my
> caches, and any equations that could be used to determine
> some balanced settings would be extremely helpful. I'm in a
> memory-limited environment, so I can't afford to throw a ton
> of cache at the problem.
>
> (I don't want to thread-jack, but I'm also wondering whether
> anyone has any notes on how to tune cache sizes for the
> filterCache, queryResultCache and documentCache.)
>
> Thanks,
> Stu
>
>
> -----Original Message-----
> From: Mike Klaas <[EMAIL PROTECTED]>
> Sent: Tuesday, October 9, 2007 9:30pm
> To: solr-user@lucene.apache.org
> Subject: Re: Facets and running out of Heap Space
>
> On 9-Oct-07, at 12:36 PM, David Whalen wrote:
>
> > (snip)
> > I'm sure we could stop storing many of these columns,
> > especially if someone told me that would make a big difference.
>
> I don't think that it would make a difference in memory
> consumption, but storage is certainly not necessary for
> faceting. Extra stored fields can slow down search if they
> are large (in terms of bytes), but they don't really occupy
> extra memory unless they are polluting the doc cache. Does
> 'text' need to be stored?
>
> > > What does the LukeRequest Handler tell you about the # of
> > > distinct terms in each field that you facet on?
> >
> > Where would I find that? I could probably estimate that
> > myself on a per-column basis. It ranges from 4 distinct
> > values for media_type, to 30-ish for location, to 200-ish
> > for country_code, to almost 10,000 for site_id, to almost
> > 100,000 for journalist_id.
>
> Using the filter cache method on things like media type and
> location will occupy ~2.3MB of memory _per unique value_, so
> it should be a net win for those (although quite close in
> space requirements for a 30-ary field on your index size).
>
> -Mike
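P.S. Here is a rough sketch of the rows=0 + filter-query counting I have in mind, just to make the idea concrete. It's only an illustration: the Solr URL, the field name, and the example media_type values are placeholders of my own, and I'm assuming the default XML response writer, so the only thing the client reads back is numFound.

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    SOLR_SELECT = "http://localhost:8983/solr/select"   # placeholder URL for my instance

    def count_matching(q, fq):
        """Return only the hit count: rows=0 means no stored fields are fetched."""
        params = urllib.parse.urlencode({"q": q, "fq": fq, "rows": 0})
        with urllib.request.urlopen(SOLR_SELECT + "?" + params) as resp:
            tree = ET.parse(resp)
        # In the default XML response, <result name="response" numFound="..."> carries the count.
        result = tree.find(".//result[@name='response']")
        return int(result.get("numFound"))

    # One request per value I would otherwise have faceted on
    # (these media_type values are made up for the example):
    for value in ("audio", "video", "image", "text"):
        print(value, count_matching("*:*", "media_type:" + value))

The part I'm unsure about is that, as far as I understand it, each distinct fq still gets cached in the filterCache, so unless I bound that cache I may just be rebuilding the same per-value sets the facet code would have built, which is really my question above.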
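P.P.S. Stu: this is only my guess, not something Mike confirmed, but the ~2.3MB figure looks like the cost of one cached filter stored as a bitset with one bit per document in the index (large filters are kept as full bitsets rather than small hash sets of doc ids), i.e. roughly maxDoc / 8 bytes per filterCache entry. Read backwards, 2.3MB x 8 bits/byte works out to roughly 19 million documents, which I take to be the index size Mike had in mind. If that model is right, the filterCache cost is about (docs in index / 8 bytes) x (number of cached unique values), which would explain why it's a clear win for a 4-value field like media_type and only marginal for a 30-value field like location.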