On 10-Oct-07, at 4:16 AM, Britske wrote:


However, I realized that to calculate the count for each of the
facet values, the original StandardRequestHandler already loops over the doclist to check for matches. Therefore my implementation actually does double work,
since it gets doclists for each of the facet values again.

Well, not quite. Once you get into the faceting code, everything is in terms of DocSets, which are unordered collections of doc ids. Also, faceting employs efficient algorithms for counting the cardinality of intersections without actually materializing them, which is another obstacle to reusing the code.
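To illustrate the counting point, here is a minimal Python sketch. It is not Solr's code: it assumes doc-id sets are plain Python sets, and the function and variable names are made up for the example. It counts the overlap by probing the larger set with each id from the smaller one, so no intersection set is ever built.

```python
def intersection_size(a, b):
    """Count ids common to both sets without building the intersection set."""
    # Iterate over the smaller set and probe the larger one.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(1 for doc_id in small if doc_id in large)


# Hypothetical data: docs matching the query, and docs carrying one facet value.
query_docs = {1, 4, 7, 9, 12}
facet_docs = {4, 9, 10, 12, 15, 20}

count = intersection_size(query_docs, facet_docs)
```

Only a count comes back, with no ordering and no list of matching ids, which is why a per-facet-value doclist is not simply lying around to be reused.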

My question:
Is there a way to get to the already calculated doclist per facet value from
a subclassed StandardRequestHandler, and so get a nice speedup?  This
facet calculation seems to go deep into the core of Solr
(SimpleFacets.getFacetTermEnumCounts) and seems not very sensible to alter
for just this requirement. Opinions appreciated.

Solr never really materializes much of the DocList for a query; almost all docs are dropped as soon as it is clear that they are not in the top N results.

It should be possible to produce an approximation which is more efficient: collect the DocList for the top 1000 docs, convert it to a DocSet, find the set intersections with each facet value's DocSet (instead of using SimpleFacets), and re-order the resulting sets according to the original DocList.

It would take a bit of work to implement, however.
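The approximation described above could be sketched roughly as follows. This is plain Python, not Solr's API: the ranked list stands in for the DocList, the sets stand in for DocSets, and facet_top_docs and the sample data are hypothetical names chosen for illustration.

```python
def facet_top_docs(ranked_doc_ids, facet_doc_sets, top_n=1000):
    """Return, per facet value, the top-N docs that carry it, in rank order."""
    top_docs = ranked_doc_ids[:top_n]   # the ordered "DocList"
    top_set = set(top_docs)             # the unordered "DocSet"
    # Remember each doc's position so intersections can be re-ordered.
    rank = {doc_id: pos for pos, doc_id in enumerate(top_docs)}

    result = {}
    for value, docs in facet_doc_sets.items():
        hits = top_set & docs           # set intersection per facet value
        result[value] = sorted(hits, key=rank.__getitem__)
    return result


# Hypothetical data: doc ids in score order, and a doc set per facet value.
ranked = [42, 7, 19, 3, 88, 15]
facets = {"red": {3, 7, 15, 100}, "blue": {19, 42, 88}, "green": {200}}

top_lists = facet_top_docs(ranked, facets, top_n=5)
```

Docs ranked below top_n never appear in any list, which is what makes this an approximation rather than an exact per-facet toplist.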


As a last and somewhat related question:
Is there a way to explicitly specify facet values that I want to include in the faceting without (ab)using Q? This is relevant for me since the perfect solution would be to have the ability to orthogonally get multiple toplists in 1 query. Given the current implementation, this orthogonality is now 'corrupted', as injection of a field value in Q for one facet field influences
the outcome of another facet field.

I'm not quite sure what you are asking here. You can specify arbitrary facet values using facet.query or facet.prefix. If you want to facet multiple doclists from different queries in one request, just write your own request handler that takes a multi-valued q param and facets on each.
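As a rough illustration of the first suggestion, the Python sketch below builds a query string that repeats facet.query and restricts one facet field with facet.prefix. The field names (price, category) and the prefix value are assumptions invented for the example; only the parameter names come from the reply above and from Solr's standard faceting parameters.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (rather than a dict) lets facet.query appear more than once.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.query", "price:[0 TO 100]"),   # hypothetical field
    ("facet.query", "price:[100 TO *]"),
    ("facet.field", "category"),           # hypothetical field
    ("facet.prefix", "elec"),              # only terms starting with "elec"
]

query_string = urlencode(params)
```

Each facet.query is counted against the same base result set, so adding one does not change the counts reported for the others.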

I didn't answer all the questions in your email, but I hope this clarifies things a bit. Good luck!

-Mike
