On 12/6/06, J.J. Larrea <[EMAIL PROTECTED]> wrote:
My thought was that the simplest approach would be to subclass FieldCacheImpl to introduce a getMultiStringIndex method derived from getStringIndex, defining and returning a MultiStringIndex class which stores order as int[][] rather than int[]. A variant of SimpleFacets.getFieldCacheCounts would then simply need an inner loop to tally each of the document's term indexes for that field.
I think something like that is the right approach, the only problem being the size in memory this would take up. It may need some clever encoding to keep it reasonable.
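A rough sketch of the int[][] idea, to make the inner loop concrete. It does not touch Lucene at all: the MultiStringIndex class and its countFacets method here are hypothetical stand-ins, and the lookup/order field names are only borrowed from the existing StringIndex.

```java
/** Hypothetical multi-valued analogue of a string index; not an existing Lucene class. */
final class MultiStringIndex {
    /** Sorted distinct term values for the field. */
    final String[] lookup;
    /** order[doc] holds the indexes into lookup of every term the doc has for the field. */
    final int[][] order;

    MultiStringIndex(String[] lookup, int[][] order) {
        this.lookup = lookup;
        this.order = order;
    }

    /** Tallies term occurrences over the matching docs; counts[i] belongs to lookup[i]. */
    int[] countFacets(int[] matchingDocs) {
        int[] counts = new int[lookup.length];
        for (int doc : matchingDocs) {
            // The inner loop is the only change from the single-valued case.
            for (int termIndex : order[doc]) {
                counts[termIndex]++;
            }
        }
        return counts;
    }
}
```

On the memory side, every document gets its own int[] object in this layout. One possible encoding (an assumption, not something that exists today) is a single flat int[] of term indexes plus a per-document offsets array, which avoids the per-array overhead.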
With multi-valuedness no longer being a useful criterion for automatically choosing between the filter-based and modified FieldCache-based mechanisms, there then would need to be an alternate criterion, either implicit or explicit. Does anyone have any ideas how best to do that? For example, is there a way to quickly determine the number of distinct Term values for a field without enumerating to the end, so the ratio of Terms to Documents can be used?
I'd suggest a Solr fieldInfo cache that stored info about a field:
a) the number of unique terms in the field
b) (optionally) a sorted list by docfreq of the top terms in the field
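Something along these lines is what such a cache entry might hold. This is only a sketch: FieldFacetInfo, build, and termsPerDoc are made-up names, and the term-to-docfreq map stands in for one full enumeration of the field's terms done when the entry is first built.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical per-field cache entry; not an existing Solr class. */
final class FieldFacetInfo {
    /** a) number of unique terms in the field. */
    final int uniqueTermCount;
    /** b) the top terms, highest docfreq first. */
    final List<String> topTermsByDocFreq;

    private FieldFacetInfo(int uniqueTermCount, List<String> topTermsByDocFreq) {
        this.uniqueTermCount = uniqueTermCount;
        this.topTermsByDocFreq = topTermsByDocFreq;
    }

    /** Builds the entry from a term -> docfreq map (a stand-in for enumerating the field's terms). */
    static FieldFacetInfo build(Map<String, Integer> docFreqByTerm, int topN) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreqByTerm.entrySet());
        // Highest docfreq first; ties broken by term text so the order is stable.
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return new FieldFacetInfo(docFreqByTerm.size(), Collections.unmodifiableList(top));
    }

    /** Ratio of distinct terms to documents, usable as a criterion for picking a faceting method. */
    double termsPerDoc(int maxDoc) {
        return maxDoc == 0 ? 0.0 : (double) uniqueTermCount / maxDoc;
    }
}
```

With the count cached per field, the terms-to-documents ratio becomes a cheap lookup after the first build, and any threshold on it would still have to be chosen by experiment.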
An entirely different approach (briefly suggested in a comment in SimpleFacets) for fields indexed with term vectors would be to simply call getTermFreqVector for each hit and store either (term text, tally) in a Hashtable, or (term text, index) in a Hashtable that could be cached, with tallies generated per query in an array as they are now. The latter case would build a field cache dynamically from actual query results. Does anyone have any insight on how efficient that may or may not be?
For queries that don't have many hits, term vectors would be fine. I don't think they would perform well with a lot of hits, though. There could be a different type of faceting that just uses the top "n" results.
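As a sketch of what top-"n" faceting could look like: the TopNTermVectorFacets class below is hypothetical, and termsByDoc is a stand-in for whatever terms the stored term vector of each document would yield for the field (null where a document has none). No Lucene calls are made here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical top-n faceting over per-document term lists; not an existing Solr class. */
final class TopNTermVectorFacets {
    /**
     * Tallies terms for the first n entries of rankedHits only.
     * termsByDoc[doc] stands in for the terms of that doc's term vector for the field.
     */
    static Map<String, Integer> count(int[] rankedHits, String[][] termsByDoc, int n) {
        Map<String, Integer> tallies = new HashMap<>();
        int limit = Math.min(n, rankedHits.length);
        for (int i = 0; i < limit; i++) {
            String[] terms = termsByDoc[rankedHits[i]];
            if (terms == null) {
                continue; // no term vector stored for this doc and field
            }
            for (String term : terms) {
                tallies.merge(term, 1, Integer::sum);
            }
        }
        return tallies;
    }
}
```

The cost is bounded by n term-vector reads per query rather than by the total hit count, at the price of counts that only reflect the top of the result list.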
And if I have gotten something dreadfully wrong in my understanding of the current implementation or the proposed enhancement, I would appreciate getting straightened out.
Sounds like you have a pretty good handle on it!

-Yonik