I could certainly be wrong. If you have a facet with a LOT fewer unique values than documents in the query, I'd be curious what happens if you try facet.method=enum.
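For concreteness, here's a hedged sketch of what that experiment could look like against one of your low-cardinality fields, using Solr's standard per-field override syntax (host, port, and handler path are just placeholders for whatever your setup uses):

    http://localhost:8983/solr/select?q=*:*&rows=0
        &facet=true
        &facet.field=specialassignyn_facet
        &f.specialassignyn_facet.facet.method=enum

The f.<fieldName>.facet.method form lets you flip just that one field to enum while leaving your high-cardinality facets on fc, so you can compare the two side by side.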
facet.enum.cache.minDf's documentation suggests it can affect memory usage with enum too, but it seems more focused on when you have LOTS of unique facet values. Otherwise, it seems to me that memory usage should definitely be proportional to unique values with facet.method=enum. Not sure about fc.

The wiki documentation suggests that facet.method defaults to 'fc' in 1.4. I feel like I heard somewhere that 1.4 will automatically choose either fc or enum based on the characteristics of the field (like the number of unique values?), but that's not what the documentation suggests. I have not tried to look at the code. I'm definitely not an expert, just trying to help figure it out based on what I do know.

Why would 'the computer time' be hidden from users? Ah, because the uninverted field is created once (until the next commit), not per query, I guess? But that makes me think -- if those log lines you pasted are showing the memory size of the Lucene UnInvertedField itself, then I have absolutely no ideas about any way to affect that. I guess what I was thinking about was more filterCache memory usage. The Lucene uninverted field is going to be created for other functions even if you don't facet on the field, I think? But I'm not entirely sure exactly what that log line is reporting memory usage for.

________________________________________
From: Ron Mayer [r...@0ape.com]
Sent: Monday, September 06, 2010 8:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Many sparse facets?

Jonathan Rochkind wrote:
> What matters isn't how many documents have a value, so much
> as how many unique values there are in the field total. If
> there aren't that many, faceting can be done fairly quickly and fairly
> efficiently.

Really? Don't these 2 log file lines:

INFO: UnInverted multi-valued field {field=vehicle_vin_facet,memSize=39513151,tindexSize=208256,time=138382,phase1=138356,nTerms=638642,bigTerms=0,termInstances=739169,uses=0}
INFO: UnInverted multi-valued field {field=specialassignyn_facet,memSize=36336696,tindexSize=44,time=1458,phase1=1438,nTerms=5,bigTerms=0,termInstances=138046,uses=0}

suggest that whether I have a facet with a half million unique values or a half dozen, they use roughly the same amount of memory? At first glance they both seem similarly efficient to filter on. Certainly the one with many unique values takes longer to uninvert -- but that's just computer time that's hidden from users, no?

> ... 50 megs times 80 is still only around 4 gigs, not entirely out of the
> question to simply supply enough RAM for all those caches.

Yup - that's what I'm doing for now (just moved to a 24 gig RAM machine); but I expect we'll have 10X as many documents, and maybe 2X as many facets by spring. Still not undoable, but I may need to start forecasting RAM budgets.
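P.S. A back-of-the-envelope reading of those two memSize values, under my (unverified) assumption that the UnInvertedField footprint scales mostly with document count rather than term count:

    vehicle_vin_facet:     memSize = 39,513,151 bytes ~ 37.7 MB (nTerms = 638,642)
    specialassignyn_facet: memSize = 36,336,696 bytes ~ 34.7 MB (nTerms = 5)

    delta ~ 3 MB over ~638,000 extra unique terms, i.e. only a few bytes
    per unique term; almost all of the footprint looks document-proportional.

    naive forecast: ~4 GB today * 10 (documents) * 2 (facets) ~ 80 GB

If that scaling assumption holds, your spring numbers are dominated by document count, which is exactly why forecasting RAM budgets up front sounds prudent.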