What matters isn't so much how many documents have a value as how many unique values the field has in total. If there aren't that many, faceting can be done fairly quickly and efficiently.
Otherwise, the only thing I can think of is experimenting with the two different facet methods available, to see if either one performs better in your environment. You can possibly trade lower memory use for slower faceting, but I'm not sure why you'd want to; generally one would rather spend a reasonable amount of memory to gain speed. 50 megs times 80 is still only around 4 gigs, so it's not out of the question to simply supply enough RAM for all those caches.

http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

________________________________________
From: Ron Mayer [r...@0ape.com]
Sent: Monday, September 06, 2010 5:02 PM
To: solr-user@lucene.apache.org
Subject: Many sparse facets?

Is there a good way of handling a large number of facets that are quite sparse (most documents not having any value for most facets)?

In my system I have quite a few documents (a few million, soon to grow to the mid tens of millions), and our users are requesting an ever-increasing number of facets (currently 80, and growing). Many of the facets are not present in the vast majority of the documents (often a facet is only present in under 100K or so docs).

Am I right in understanding that lines in the log file like this:

INFO: UnInverted multi-valued field {field=cvroffsn_facet,memSize=55532332,tindexSize=132,time=2296,phase1=2257,nTerms=729,bigTerms=0,termInstances=5422,uses=0}

suggest that even when a facet only appears in a few thousand docs, it still takes considerable memory?

Is there anything clever I can do to tell it to handle such sparsely used facets in a more memory-friendly way? Perhaps I should be setting up a bunch of shards? Perhaps small ones dedicated to holding documents with the rare facets, and large ones with the documents without the rare facets?
Lines like

INFO: UnInverted multi-valued field {field=property_manufacturer_facet,memSize=4224,tindexSize=32,time=66,phase1=66,nTerms=0,bigTerms=0,termInstances=0,uses=0}

suggest to me that in the special case of "termInstances=0", unused facets don't take up much memory. Would that suggest that I might be able to write a different uninverter that has a more compact representation even for facets that show up only a few times? Where might I look to do so?
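
To put numbers on the two log lines quoted in this thread, here is a minimal Python sketch that pulls the key=value pairs out of an "UnInverted multi-valued field" line and scales one field's memSize up to 80 fields. It is not part of Solr; the helper names `parse_uninverted` and `estimate_total_bytes` are made up for illustration, and the estimate assumes, pessimistically, that every facet field costs as much as the one measured.

```python
import re

# Matches the "{key=value,...}" payload of an "UnInverted multi-valued field" line.
_PAYLOAD = re.compile(r"UnInverted multi-valued field \{([^}]*)\}")


def parse_uninverted(line):
    """Return the key=value pairs from one UnInverted log line as a dict."""
    match = _PAYLOAD.search(line)
    if match is None:
        raise ValueError("not an UnInverted log line")
    stats = {}
    for pair in match.group(1).split(","):
        key, _, value = pair.partition("=")
        # In the quoted lines every value except the field name is an integer.
        stats[key] = value if key == "field" else int(value)
    return stats


def estimate_total_bytes(mem_size, num_fields):
    """Rough upper bound: assume each facet field costs mem_size bytes."""
    return mem_size * num_fields


# The two log lines from the original message.
sparse = parse_uninverted(
    "INFO: UnInverted multi-valued field {field=cvroffsn_facet,"
    "memSize=55532332,tindexSize=132,time=2296,phase1=2257,nTerms=729,"
    "bigTerms=0,termInstances=5422,uses=0}"
)
empty = parse_uninverted(
    "INFO: UnInverted multi-valued field {field=property_manufacturer_facet,"
    "memSize=4224,tindexSize=32,time=66,phase1=66,nTerms=0,bigTerms=0,"
    "termInstances=0,uses=0}"
)

for stats in (sparse, empty):
    print(stats["field"], stats["memSize"], stats["nTerms"], stats["termInstances"])

total = estimate_total_bytes(sparse["memSize"], 80)
print("worst-case bytes for 80 such fields:", total)
```

With the memSize from the first line, the worst case for 80 fields comes to 4,442,586,560 bytes, which is the "around 4 gigs" figure in the reply. Running the same parser over a real log would show how many fields actually sit near the 55 MB mark and how many are closer to the 4 KB of the empty field.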