Is there a good way of handling a large number of facets that are quite sparse (most documents not having any value for most facets)?
In my system I have quite a few documents (a few million, soon growing to the mid tens of millions), and our users are requesting an ever-increasing number of facets (currently 80, and growing). Many of the facets are absent from the vast majority of the documents (often a facet is present in fewer than 100K or so docs).

Am I right in understanding that lines in the log file like this:

INFO: UnInverted multi-valued field {field=cvroffsn_facet,memSize=55532332,tindexSize=132,time=2296,phase1=2257,nTerms=729,bigTerms=0,termInstances=5422,uses=0}

suggest that even when a facet only appears in a few thousand docs, it still takes considerable memory?

Is there anything clever I can do to tell it to handle such sparsely used facets in a more memory-friendly way? Perhaps I should be setting up a bunch of shards: small ones dedicated to holding the documents with the rare facets, and large ones for the documents without them?

Lines like

INFO: UnInverted multi-valued field {field=property_manufacturer_facet,memSize=4224,tindexSize=32,time=66,phase1=66,nTerms=0,bigTerms=0,termInstances=0,uses=0}

suggest to me that in the special case of "termInstances=0", unused facets don't take up much memory. Would that suggest that I might be able to write a different uninverter that has a more compact representation even for facets that show up only a few times? Where might I look to do so?
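To put rough numbers on the two log lines quoted above, here is a minimal Python sketch that parses them and divides memSize by termInstances. The helper names (parse_uninverted, bytes_per_instance) are my own and not part of Solr, and I am assuming memSize is reported in bytes.

```python
import re

# The two log lines quoted above, copied verbatim.
LOG_LINES = [
    "INFO: UnInverted multi-valued field {field=cvroffsn_facet,memSize=55532332,"
    "tindexSize=132,time=2296,phase1=2257,nTerms=729,bigTerms=0,"
    "termInstances=5422,uses=0}",
    "INFO: UnInverted multi-valued field {field=property_manufacturer_facet,"
    "memSize=4224,tindexSize=32,time=66,phase1=66,nTerms=0,bigTerms=0,"
    "termInstances=0,uses=0}",
]


def parse_uninverted(line):
    """Return the key=value pairs inside the braces as a dict, or None."""
    match = re.search(r"UnInverted multi-valued field \{(.*)\}", line)
    if match is None:
        return None
    stats = {}
    for pair in match.group(1).split(","):
        key, _, value = pair.partition("=")
        # Numeric fields become ints; the field name stays a string.
        stats[key] = int(value) if value.isdigit() else value
    return stats


def bytes_per_instance(stats):
    """memSize divided by termInstances (assumes memSize is in bytes)."""
    if stats["termInstances"] == 0:
        return None  # nothing to divide by for an unused facet
    return stats["memSize"] / stats["termInstances"]


for line in LOG_LINES:
    stats = parse_uninverted(line)
    ratio = bytes_per_instance(stats)
    shown = "n/a" if ratio is None else f"{ratio:.0f}"
    print(f"{stats['field']}: memSize={stats['memSize']}, "
          f"termInstances={stats['termInstances']}, bytes/instance={shown}")
```

For cvroffsn_facet this works out to roughly 10 KB per term instance, which is why I suspect the memory cost follows the total number of documents in the index and not the number of documents that actually carry the facet.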