What matters isn't so much how many documents have a value as how many unique
values the field has in total. If there aren't that many, faceting can be
done fairly quickly and efficiently.

Otherwise, the only thing I can think of is experimenting with the two
different facet methods available, to see if either one performs better for
your environment. You can possibly trade speed for lower memory use, but I'm
not sure why you'd want to do that; generally one would rather spend a
reasonable amount of memory to gain speed. 50 megs times 80 is still only
around 4 gigs, so it's not entirely out of the question to simply supply
enough RAM for all those caches.
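
To put actual numbers on that estimate, here is a rough Python sketch that
pulls memSize out of "UnInverted" log lines like the ones quoted below and
projects the total across all facet fields. The sample lines are copied from
the quoted message; the function names are my own, and assuming every field
costs about as much as cvroffsn_facet is only a guess for illustration.

```python
import re

# Sample stats copied from the log excerpts in the quoted message.
LOG_LINES = [
    "{field=cvroffsn_facet,memSize=55532332,tindexSize=132,time=2296,"
    "phase1=2257,nTerms=729,bigTerms=0,termInstances=5422,uses=0}",
    "{field=property_manufacturer_facet,memSize=4224,tindexSize=32,time=66,"
    "phase1=66,nTerms=0,bigTerms=0,termInstances=0,uses=0}",
]


def parse_uninverted(line):
    """Return the key=value pairs inside the braces as a dict, or None."""
    match = re.search(r"\{(.*)\}", line)
    if match is None:
        return None
    stats = {}
    for pair in match.group(1).split(","):
        key, _, value = pair.partition("=")
        # Numeric stats become ints; the field name stays a string.
        stats[key] = int(value) if value.isdigit() else value
    return stats


def total_mem_bytes(lines):
    """Sum memSize over every line that parses."""
    parsed = (parse_uninverted(line) for line in lines)
    return sum(stats["memSize"] for stats in parsed if stats)


def projected_gib(bytes_per_field, n_fields):
    """Project memory use if every field cost bytes_per_field (an assumption)."""
    return bytes_per_field * n_fields / 2**30


if __name__ == "__main__":
    for line in LOG_LINES:
        stats = parse_uninverted(line)
        print(stats["field"], stats["memSize"], "bytes")
    print("total for sample lines:", total_mem_bytes(LOG_LINES), "bytes")
    print("80 fields at the larger size: %.2f GiB" % projected_gib(55532332, 80))
```

Running this over the real log would show which fields actually dominate,
which is probably more useful than the worst-case multiplication.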

http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
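
For the experiment itself, the method can be set per field with the
f.<fieldname>.facet.method parameter described on that wiki page. A minimal
Python sketch for building such a request follows; the host, port and handler
path are assumptions about your setup, and facet_url is just a helper name I
made up. Whether enum actually beats fc for your sparse fields is something
you'd have to measure.

```python
from urllib.parse import urlencode


def facet_url(base_url, fields, method="enum"):
    """Build a facet-only query that sets facet.method for each listed field."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    for field in fields:
        params.append(("facet.field", field))
        # Per-field override; "enum" and "fc" are the two documented values.
        params.append(("f.%s.facet.method" % field, method))
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed URL; adjust to wherever your Solr instance is listening.
    base = "http://localhost:8983/solr/select"
    print(facet_url(base, ["cvroffsn_facet"], method="enum"))
    print(facet_url(base, ["cvroffsn_facet"], method="fc"))
```

Timing both variants against the same index, and watching the heap while you
do it, should tell you which method suits the rare facets better.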
________________________________________
From: Ron Mayer [r...@0ape.com]
Sent: Monday, September 06, 2010 5:02 PM
To: solr-user@lucene.apache.org
Subject: Many sparse facets?

Is there a good way of handling a large number of facets that are quite
sparse (most documents not having any value for most facets)?

In my system I have quite a few documents (few million, will soon
grow to mid tens of millions), and our users are requesting an
ever-increasing number of facets (currently 80, and growing).
Many of the facets are not present in a vast majority of the
documents (often a facet's only present in under 100K or so docs).


Am I right in understanding that lines in the log file like this:

INFO: UnInverted multi-valued field 
{field=cvroffsn_facet,memSize=55532332,tindexSize=132,time=2296,phase1=2257,nTerms=729,bigTerms=0,termInstances=5422,uses=0}

suggest that even when a facet only appears in a few thousand
docs, it still takes considerable memory?


Is there anything clever I can do to tell it to handle such sparsely
used facets in a more memory friendly way?

Perhaps I should be setting up a bunch of shards?  Perhaps small
ones dedicated to holding documents with the rare facets, and
large ones with the documents without the rare facets?


Lines like
INFO: UnInverted multi-valued field 
{field=property_manufacturer_facet,memSize=4224,tindexSize=32,time=66,phase1=66,nTerms=0,bigTerms=0,termInstances=0,uses=0}
suggest to me that in the special case of "termInstances=0", unused facets 
don't take up much memory.
Would that suggest that I might be able to write a different uninverter that
has a more compact representation even for facets that show up a few times?
Where might I look to do so?
