On 12/6/06, J.J. Larrea <[EMAIL PROTECTED]> wrote:
My thought was that the simplest approach would be to subclass
FieldCacheImpl to introduce a getMultiStringIndex method derived from
getStringIndex, defining and returning a MultiStringIndex class
which stores order as int[][] rather than int[]; a variant of
SimpleFacets.getFieldCacheCounts would simply need an inner loop to
tally each of the Document's Term indexes for that field.

I think something like that is the right approach, the only problem
being the size in memory this would take up.  It may need some clever
encoding to keep it reasonable.
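
To make the int[][] idea concrete, here is a minimal sketch in plain Java.
The class and method names (MultiStringIndex, countFacets) are hypothetical
and this is not the actual FieldCacheImpl or SimpleFacets code; it only shows
the shape of the inner loop over each document's term indexes.

```java
// Hypothetical sketch, not Lucene's FieldCache API.
final class MultiStringIndex {
    final String[] lookup; // distinct term values for the field, in sorted order
    final int[][] order;   // order[doc] = indexes into lookup, one per term in that doc

    MultiStringIndex(String[] lookup, int[][] order) {
        this.lookup = lookup;
        this.order = order;
    }

    /** Tallies how many of the given matching docs contain each term. */
    int[] countFacets(int[] matchingDocs) {
        int[] counts = new int[lookup.length];
        for (int doc : matchingDocs) {
            // The extra inner loop: a doc may carry several terms for the field.
            for (int termIndex : order[doc]) {
                counts[termIndex]++;
            }
        }
        return counts;
    }
}
```

Note that a separate int[] per document adds per-array object overhead on top
of the term indexes themselves, which is one reason the memory footprint is a
concern and a more compact encoding may be needed.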

With multi-valuedness no longer being a useful criterion for
automatically choosing between the filter-based and modified
FieldCache-based mechanisms, there then would need to be an alternate
criterion, either implicit or explicit. Does anyone have any ideas
how best to do that?  For example, is there a way to quickly
determine the number of distinct Term values for a field without
enumerating to the end, so the ratio of Terms to Documents can be
used?

I'd suggest a Solr fieldInfo cache that stored info about a field:
a) the number of unique terms in the field
b) (optionally) a sorted list by docfreq of the top terms in the field
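
A rough sketch of what such a per-field info entry might hold, in plain Java.
FieldStats, fromDocFreqs and termToDocRatio are hypothetical names, and the
term-to-docfreq map is assumed to have been gathered elsewhere (for example
while enumerating the field's terms once); none of this is an existing Solr
API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a cached per-field summary.
final class FieldStats {
    final int uniqueTerms;       // (a) number of unique terms in the field
    final List<String> topTerms; // (b) top terms, highest docfreq first

    FieldStats(int uniqueTerms, List<String> topTerms) {
        this.uniqueTerms = uniqueTerms;
        this.topTerms = topTerms;
    }

    /** Builds the summary from a term -> docfreq map, keeping the top n terms. */
    static FieldStats fromDocFreqs(Map<String, Integer> docFreqs, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreqs.entrySet());
        // Sort by descending docfreq, breaking ties by term text for stable output.
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return new FieldStats(docFreqs.size(), top);
    }

    /** Ratio of distinct terms to documents; any threshold is left to the caller. */
    double termToDocRatio(int numDocs) {
        return numDocs == 0 ? 0.0 : (double) uniqueTerms / numDocs;
    }
}
```

The ratio could then feed whatever criterion picks between the filter-based
and FieldCache-based mechanisms; the cutoff itself would need tuning.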

An entirely alternate approach (briefly suggested in a comment in
SimpleFacets) for fields indexed with term vectors would be to simply
call getTermFreqVector for each hit and store (term text, tally) in
a HashTable, or (term text, index) in a HT which could be cached with
tallies generated per-query in an array as they are now, in the
latter case building a field-cache dynamically based on actual query
results.  Does anyone have any insight on how efficient that may or
may not be?

For queries that don't have many hits, termvectors would be fine.
I don't think they would perform well with a lot of hits though.
There could be a different type of faceting that just uses the top "n"
results though.
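
As an illustration of the "top n results" variant, here is a small sketch in
plain Java. TermVectorFacets and the TermVectorSource interface are
hypothetical; the interface merely stands in for however a document's terms
for the field would be read from its term vector, and is not a Lucene type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tally facet terms from term vectors of the top n hits only.
final class TermVectorFacets {
    /** Stand-in for reading one document's terms for the faceted field. */
    interface TermVectorSource {
        String[] termsFor(int docId);
    }

    static Map<String, Integer> tally(int[] hits, int maxHits, TermVectorSource source) {
        Map<String, Integer> counts = new HashMap<>();
        int limit = Math.min(maxHits, hits.length);
        for (int i = 0; i < limit; i++) {
            String[] terms = source.termsFor(hits[i]);
            if (terms == null) {
                continue; // no term vector stored for this doc and field
            }
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum); // (term text, tally)
            }
        }
        return counts;
    }
}
```

The cost grows with one term vector read per hit examined, which is why
capping the number of hits keeps this approach bounded.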

And if I have gotten something dreadfully wrong in my understanding
of the current implementation or the proposed enhancement, I would
appreciate getting straightened out.

Sounds like you have a pretty good handle on it!

-Yonik
