Is there a good way to handle a large number of facets that are quite
sparse (most documents having no value for most facets)?

In my system I have quite a few documents (a few million, soon to
grow to the mid tens of millions), and our users are requesting an
ever-increasing number of facets (currently 80, and growing).
Many of the facets are absent from the vast majority of the
documents (often a facet is present in fewer than 100K or so docs).


Am I right in understanding that lines in the log file like this:

INFO: UnInverted multi-valued field {field=cvroffsn_facet,memSize=55532332,tindexSize=132,time=2296,phase1=2257,nTerms=729,bigTerms=0,termInstances=5422,uses=0}

suggest that even when a facet only appears in a few thousand
docs, it still takes considerable memory?
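
To put a number on that, here is a quick sketch that parses the line
above and divides memSize by termInstances. I am assuming memSize is
reported in bytes; the function names are my own, not anything from Solr.

```python
import re

# The {key=value,...} part of the log line quoted above.
LINE = ("{field=cvroffsn_facet,memSize=55532332,tindexSize=132,time=2296,"
        "phase1=2257,nTerms=729,bigTerms=0,termInstances=5422,uses=0}")


def parse_uninverted(line):
    """Parse the {key=value,...} part of an UnInverted log line into a dict."""
    body = re.search(r"\{(.*)\}", line).group(1)
    stats = {}
    for pair in body.split(","):
        key, _, value = pair.partition("=")
        # Numeric values become ints; the field name stays a string.
        stats[key] = int(value) if value.isdigit() else value
    return stats


def bytes_per_instance(stats):
    """memSize / termInstances, or None when the field has no instances."""
    if stats["termInstances"] == 0:
        return None
    return stats["memSize"] / stats["termInstances"]


stats = parse_uninverted(LINE)
print(stats["field"], round(bytes_per_instance(stats)))
```

If memSize really is bytes, that works out to roughly 10 KB per term
instance for this field, which is what makes me think the cost is tied
to the size of the index rather than to how often the facet is used.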


Is there anything clever I can do to tell it to handle such sparsely
used facets in a more memory friendly way?

Perhaps I should be setting up a bunch of shards?  Perhaps small
ones dedicated to holding documents with the rare facets, and
large ones with the documents without the rare facets?


Lines like
INFO: UnInverted multi-valued field {field=property_manufacturer_facet,memSize=4224,tindexSize=32,time=66,phase1=66,nTerms=0,bigTerms=0,termInstances=0,uses=0}
suggest to me that in the special case of "termInstances=0", unused facets 
don't take up much memory.
Would that suggest that I might be able to write a different uninverter that
has a more compact representation even for facets that show up only a few
times? Where might I look to do so?
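
To make the idea concrete, here is a rough sketch of the kind of layout
I have in mind. It is not how Solr does it: SparseFacetColumn and
dense_bytes are hypothetical names, the field is treated as
single-valued for simplicity, and the dense cost of one fixed-width slot
per document is my guess rather than something I have confirmed in the
source.

```python
from array import array
from bisect import bisect_left


class SparseFacetColumn:
    """Hypothetical sparse layout: entries only for docs that have a value."""

    def __init__(self, doc_to_ord):
        # doc_to_ord maps doc id -> term ordinal (single-valued assumption).
        docs = sorted(doc_to_ord)
        self.doc_ids = array("i", docs)
        self.ords = array("i", (doc_to_ord[d] for d in docs))

    def get(self, doc_id):
        """Return the term ordinal for doc_id, or -1 if it has no value."""
        i = bisect_left(self.doc_ids, doc_id)
        if i < len(self.doc_ids) and self.doc_ids[i] == doc_id:
            return self.ords[i]
        return -1

    def mem_bytes(self):
        """Bytes held by the two parallel arrays (payload only)."""
        return (len(self.doc_ids) * self.doc_ids.itemsize
                + len(self.ords) * self.ords.itemsize)


def dense_bytes(max_doc, itemsize=4):
    """Assumed dense cost: one fixed-width slot per doc, used or not."""
    return max_doc * itemsize


col = SparseFacetColumn({5: 0, 1000: 2})
print(col.get(1000), col.get(6), col.mem_bytes(), dense_bytes(10_000_000))
```

The trade-off is a binary search per lookup instead of a direct array
index, in exchange for memory that grows with the number of docs that
actually carry the facet rather than with the total number of docs.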
