I freely admit that i'm totally lost on most of what you're suggesting ...
it seems like you're suggesting that organizing the terms in a facet field
into a tree structure would help us know which terms to compute the
counts for first for a given query -- but it's not clear to me why that
would be the case -- the terms don't necessarily have any correlation.

but on this point...

: pruning by df is going to be one of the most important features
: here... so a variation (or optimization) would be to keep a list of
: the highest terms by df, and then build the facet tree excluding those
: top terms.  That should lower the dfs in the tree nodes and allow more
: pruning.

i'm not sure why excluding high DF terms helps your approach, but one of
the optimizations i anticipated when we first added term faceting was to
build a cache of the high DF terms and test them first -- if you only want
the 20 facet terms with the highest counts, and after computing counts for
the 100 highest DF terms you find your lowest count (#20) to be 678, and
the DF of the 100th highest DF term in your cache was 677 then you are
guaranteed you don't need to check any other terms (which by definition
have lower DFs, and a term's count for a query can never exceed its DF).
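
to make the early-exit idea concrete, here is a rough sketch in python (not Solr code). the names `top_facets`, `terms_by_df` and `count_for` are made up for illustration: `terms_by_df` stands in for the cache of (term, DF) pairs sorted by DF descending, and `count_for` stands in for whatever computes a term's count against the current query. it assumes ties at the cutoff don't matter, so it stops as soon as the lowest kept count is >= the next DF.

```python
import heapq

def top_facets(terms_by_df, count_for, k):
    """Return (top-k list of (count, term), number of terms actually counted).

    terms_by_df: list of (term, df) pairs, sorted by df descending (hypothetical cache).
    count_for:   callable term -> count for the current query; always <= df.
    """
    heap = []      # min-heap of (count, term); heap[0] is the lowest kept count
    checked = 0
    for term, df in terms_by_df:
        # A term's count can't exceed its DF, and every later term has a DF
        # no higher than this one, so nothing left can beat the k-th count.
        if len(heap) == k and heap[0][0] >= df:
            break
        checked += 1
        count = count_for(term)
        if len(heap) < k:
            heapq.heappush(heap, (count, term))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, term))
    return sorted(heap, reverse=True), checked

# toy data: DFs across the whole index, and counts within one query's result set
terms = [("a", 100), ("b", 90), ("c", 50), ("d", 10), ("e", 5)]
counts = {"a": 40, "b": 60, "c": 30, "d": 10, "e": 5}
top, checked = top_facets(terms, counts.get, 2)
print(top, checked)
```

with this toy data the loop quits when it reaches "d": the lowest kept count (40) is already above d's DF (10), so "d" and "e" never get counted.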

...so if extracting the high DF terms helps whatever complex tree walking
you are thinking of, then checking those high DF terms first might save
you the hassle of walking the tree at all.



-Hoss
