I could certainly be wrong. If you have a facet with a LOT fewer unique values than documents in the query, I'd be curious what happens if you try facet.method=enum.
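For concreteness, here's a hedged sketch of what that experiment could look like against one of your low-cardinality fields, using Solr's standard per-field override syntax (host, port, and handler path are just placeholders for whatever your setup uses):

    http://localhost:8983/solr/select?q=*:*&rows=0
        &facet=true
        &facet.field=specialassignyn_facet
        &f.specialassignyn_facet.facet.method=enum

The f.<fieldName>.facet.method form lets you flip just that one field to enum while leaving your high-cardinality facets on fc, so you can compare the two side by side.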
facet.enum.cache.minDf's documentation suggests it can affect memory usage with enum too, but it seems more focused on when you have LOTS of unique facet values. Otherwise, it seems to me that memory usage should definitely be proportional to unique values with facet.method=enum. Not sure about fc.

The wiki documentation suggests that facet.method defaults to 'fc' in 1.4. I feel like I heard somewhere that 1.4 will automatically choose either fc or enum based on the characteristics of the field (like the number of unique values?), but that's not what the documentation suggests. I have not tried to look at the code. I'm definitely not an expert, just trying to help figure it out based on what I do know.

Why would 'the computer time' be hidden from users? Ah, because the uninverted field is created once (until the next commit), not per query, I guess? But that makes me think -- if those log lines you pasted are showing the memory size of the Lucene UnInvertedField itself, then I have absolutely no ideas about any way to affect that. I guess what I was thinking about was more filterCache memory usage. The Lucene uninverted field is going to be created for other functions even if you don't facet on the field, I think? But I'm not entirely sure exactly what that log line is reporting memory usage for.

________________________________________
From: Ron Mayer [r...@0ape.com]
Sent: Monday, September 06, 2010 8:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Many sparse facets?

Jonathan Rochkind wrote:
> What matters isn't how many documents have a value, so much
> as how many unique values there are in the field total. If
> there aren't that many, faceting can be done fairly quickly and fairly
> efficiently.

Really? Don't these 2 log file lines:

INFO: UnInverted multi-valued field {field=vehicle_vin_facet,memSize=39513151,tindexSize=208256,time=138382,phase1=138356,nTerms=638642,bigTerms=0,termInstances=739169,uses=0}
INFO: UnInverted multi-valued field {field=specialassignyn_facet,memSize=36336696,tindexSize=44,time=1458,phase1=1438,nTerms=5,bigTerms=0,termInstances=138046,uses=0}

suggest that whether I have a facet with a half million unique values or a half dozen, they use roughly the same amount of memory? At first glance they both seem similarly efficient to filter on. Certainly the one with many unique values takes longer to uninvert -- but that's just computer time that's hidden from users, no?

> ... 50 megs times 80 is still only around 4 gigs, not entirely out of the
> question to simply supply enough RAM for all those caches.

Yup - that's what I'm doing for now (just moved to a 24 gig RAM machine); but I expect we'll have 10X as many documents, and maybe 2X as many facets by spring. Still not undoable, but I may need to start forecasting RAM budgets.
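P.S. A back-of-the-envelope reading of those two memSize values, under my (unverified) assumption that the UnInvertedField footprint scales mostly with document count rather than term count:

    vehicle_vin_facet:     memSize = 39,513,151 bytes ~ 37.7 MB (nTerms = 638,642)
    specialassignyn_facet: memSize = 36,336,696 bytes ~ 34.7 MB (nTerms = 5)

    delta ~ 3 MB over ~638,000 extra unique terms, i.e. only a few bytes
    per unique term; almost all of the footprint looks document-proportional.

    naive forecast: ~4 GB today * 10 (documents) * 2 (facets) ~ 80 GB

If that scaling assumption holds, your spring numbers are dominated by document count, which is exactly why forecasting RAM budgets up front sounds prudent.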