: > which stores order as int[][] rather than int[]; a variant of
: > SimpleFacets.getFieldCacheCounts would simply need an inner loop to
: > tally each of the Document's Term indexes for that field.
:
: I think something like that is the right approach, the only problem
: being the size in memory this would take up.  It may need some clever
: encoding to keep it reasonable.

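For illustration, a minimal sketch of the inner-loop tally described in the quoted text. It is plain Java, not the actual SimpleFacets code: the class name, the method name, and the int[][] layout (order[doc] holding that document's term indexes) are all assumptions made for the example.

```java
import java.util.Arrays;

public class MultiValuedFacetTally {

    /**
     * Counts how often each term index occurs across the matching documents.
     * Hypothetical layout: order[doc] lists the term indexes of that document.
     */
    public static int[] tally(int[][] order, int[] docIds, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : docIds) {
            // the inner loop: one increment per term the document holds
            for (int termIdx : order[doc]) {
                counts[termIdx]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // four documents; document 3 has no value for the field
        int[][] order = { {0, 2}, {1}, {0, 1, 2}, {} };
        int[] matching = {0, 2, 3};
        System.out.println(Arrays.toString(tally(order, matching, 3)));
    }
}
```

The memory concern raised in the quote applies directly: every document carries its own array here, which is why a more compact encoding may be needed.
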
yeah ... at some point you may want to consider the possibility of
"sampling" the values for the first N documents in your DocList and then
getting the counts for those values ... it's not a perfect solution, but
it should work efficiently no matter how many docs you have, or how many
unique field values there are in your index -- i don't know of any other
approach that can function as well for any data set.

: > how best to do that?  For example, is there a way to quickly
: > determine the number of distinct Term values for a field without
: > enumerating to the end, so the ratio of Terms to Documents can be
: > used?
:
: I'd suggest a Solr fieldInfo cache that stored info about a field:
: a) the number of unique terms in the field
: b) (optionally) a sorted list by docfreq of the top terms in the field

yeah ... this is what i'd originally envisioned when i first wrote the
TermEnum based code in SimpleFacets before Yonik pointed out how useful
the FieldCache could be ... keeping a list like yonik described in (b)
where the size of the list is sufficiently bigger than the typical "limit"
you put on your facet fields should provide a lot of wins -- if the way i
picture it in my head works out in reality, sizing that list with your
<HashDocSet maxSize="X"/> in mind might help you ensure that even if you
do wind up iterating over the full TermEnum, those terms all result in
HashDocSets which will be relatively small, so your filterCache can be
big.

computing the (a) value yonik mentioned is trivial to do while building up
(b), and (a) can be used to determine whether or not you really want to
try and build the MultiFieldCache or just walk the TermEnums.


-Hoss
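
A rough sketch of computing (a) and (b) in a single pass, as discussed above. It is plain Java and deliberately avoids any Lucene or Solr calls: the Map of term to docFreq stands in for walking a TermEnum, and FieldTermStats, build and listSize are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class FieldTermStats {
    public final int uniqueTerms;        // (a) number of unique terms
    public final List<String> topTerms;  // (b) top terms, highest docFreq first

    private FieldTermStats(int uniqueTerms, List<String> topTerms) {
        this.uniqueTerms = uniqueTerms;
        this.topTerms = topTerms;
    }

    /** docFreqs is a stand-in for enumerating every term of the field. */
    public static FieldTermStats build(Map<String, Integer> docFreqs, int listSize) {
        // min-heap on docFreq, so the weakest candidate is evicted first
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<Map.Entry<String, Integer>>(
                (x, y) -> Integer.compare(x.getValue(), y.getValue()));
        int unique = 0;
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            unique++;                  // (a) falls out of the same pass
            heap.add(e);
            if (heap.size() > listSize) {
                heap.poll();           // keep only the listSize best terms
            }
        }
        List<String> top = new ArrayList<>();
        while (!heap.isEmpty()) {
            top.add(heap.poll().getKey());
        }
        Collections.reverse(top);      // heap drains lowest first
        return new FieldTermStats(unique, top);
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("red", 50);
        docFreqs.put("blue", 30);
        docFreqs.put("green", 10);
        docFreqs.put("teal", 2);
        FieldTermStats stats = build(docFreqs, 2);
        System.out.println(stats.uniqueTerms + " " + stats.topTerms);
    }
}
```

In practice listSize would be chosen comfortably larger than the typical facet "limit"; the cutoff on uniqueTerms for picking one strategy over the other is left out because it depends on the index.
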