On 10-Oct-07, at 4:16 AM, Britske wrote:
> However, I realized that for calculating the count for each of the
> facet values, the original StandardRequestHandler already loops over
> the doclist to check for matches. Therefore my implementation actually
> does double work, since it gets doclists for each of the facet values
> again.
Well, not quite. Once you get into the faceting code, everything is
in terms of DocSets, which are unordered collections of doc ids.
Also, faceting employs efficient algorithms for counting the
cardinality of intersections without actually materializing them,
which is another obstacle to reusing the code.
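
To illustrate what "counting without materializing" means, here is a
minimal sketch in plain Java. It uses java.util.BitSet as a stand-in
for a DocSet and is not Solr's actual implementation; the class and
method names are made up for the example.

```java
import java.util.BitSet;

public class IntersectionCount {
    // Counts the doc ids present in both sets without ever building
    // the intersection as a set of its own.
    static int intersectionSize(BitSet a, BitSet b) {
        int count = 0;
        for (int doc = a.nextSetBit(0); doc >= 0; doc = a.nextSetBit(doc + 1)) {
            if (b.get(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet queryDocs = new BitSet();   // docs matching the query
        queryDocs.set(1); queryDocs.set(4); queryDocs.set(7); queryDocs.set(9);

        BitSet facetDocs = new BitSet();   // docs having one facet value
        facetDocs.set(4); facetDocs.set(5); facetDocs.set(9);

        System.out.println(intersectionSize(queryDocs, facetDocs)); // prints 2
    }
}
```

Only a count comes out, so there is no per-facet-value doc list left
over to hand back to a request handler.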
> My question:
> is there a way to get to the already calculated doclist per facet
> value from a subclassed StandardRequestHandler, and so get a nice
> speedup? This facet calculation seems to go deep into the core of Solr
> (SimpleFacets.getFacetTermEnumCounts) and seems not very sensible to
> alter for just this requirement. Opinions appreciated.
Solr never really materializes much of the DocList for a query--
almost all docs are dropped as soon as it is clear that they are not
in the top N results.
It should be possible to produce an approximation which is more
efficient: collect the DocList for the top 1000 docs, convert it to
a DocSet, find the set intersections (instead of using SimpleFacets),
and re-order the resulting sets in terms of the original DocList.
It would take a bit of work to implement, however.
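
Roughly, the approximation could look like the sketch below. It uses
plain Java collections as stand-ins for DocList and DocSet, so every
name in it is hypothetical and none of it is Solr API; in a real
handler the ranked ids would come from the DocList of the top N hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopNFacets {
    // rankedDocs: doc ids of the top N results, best first (DocList stand-in).
    // facetSets:  facet value -> ids of all docs with that value (DocSet stand-ins).
    static Map<String, List<Integer>> facetTopDocs(
            List<Integer> rankedDocs, Map<String, Set<Integer>> facetSets) {
        Set<Integer> topSet = new HashSet<>(rankedDocs);     // DocList -> DocSet
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<Integer>> e : facetSets.entrySet()) {
            Set<Integer> hits = new HashSet<>(e.getValue());
            hits.retainAll(topSet);                          // set intersection
            List<Integer> ordered = new ArrayList<>();
            for (Integer doc : rankedDocs) {                 // restore rank order
                if (hits.contains(doc)) {
                    ordered.add(doc);
                }
            }
            result.put(e.getKey(), ordered);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ranked = Arrays.asList(42, 7, 19, 3);
        Map<String, Set<Integer>> facets = new LinkedHashMap<>();
        facets.put("red", new HashSet<>(Arrays.asList(3, 42, 100)));
        facets.put("blue", new HashSet<>(Arrays.asList(7, 19)));
        System.out.println(facetTopDocs(ranked, facets));
        // prints {red=[42, 3], blue=[7, 19]}
    }
}
```

It is only an approximation because docs ranked below the top N never
show up in any of the per-facet lists.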
> As a last and somewhat related question:
> is there a way to explicitly specify facet values that I want to
> include in the faceting without (ab)using Q? This is relevant for me
> since the perfect solution would be to have the ability to
> orthogonally get multiple toplists in 1 query. Given the current
> implementation, this orthogonality is now 'corrupted', as injection of
> a field value in Q for one facet field influences the outcome of
> another facet field.
I'm not quite sure what you are asking here. You can specify
arbitrary facet values using facet.query or facet.prefix. If you
want to facet multiple doclists from different queries in one
request, just write your own request handler that takes a multi-
valued q param and facets on each.
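
For the facet.query route, a request might be assembled along these
lines. This is only a sketch: the field name "category" and its values
are invented for the example, and the helper class is hypothetical.
Each facet.query is counted against the main result set without
changing q.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class FacetParams {
    // Builds a query string with one facet.query parameter per requested value.
    static String buildQuery(String q, List<String> facetQueries) {
        StringBuilder sb = new StringBuilder();
        sb.append("q=").append(encode(q)).append("&facet=true");
        for (String fq : facetQueries) {
            sb.append("&facet.query=").append(encode(fq));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // "category" is an assumed field name, not taken from your schema.
        System.out.println(buildQuery("*:*",
                Arrays.asList("category:books", "category:music")));
        // prints q=*%3A*&facet=true&facet.query=category%3Abooks&facet.query=category%3Amusic
    }
}
```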
I didn't answer all the questions in your email, but I hope this
clarifies things a bit. Good luck!
-Mike