On 12-Nov-07, at 8:03 AM, Chris Hostetter wrote:
if what you are interested in is stats on the first N docs according to a
specific sort (score or otherwise) then you could write a custom request
handler that executed a search with a limit of N, got the DocList,
iterated over it to build a DocSet, and then used that DocSet to do
faceting ... but that would probably take even longer than just using the
full DocSet matching the entire query.
An implementation might look like:
    DocList superlist;
    int facetDocLimit = params.getInt(DMP.FACET_DOCLIMIT, -1);
    if (facetDocLimit > 0 && facetDocLimit != req.getLimit()) {
        // Fetch up to facetDocLimit docs in sort order.
        superlist = s.getDocList(query, restrictions,
                                 SolrPluginUtils.getSort(req),
                                 req.getStart(), facetDocLimit,
                                 flags);
        // Facet over those docs only; return just the requested page.
        results.docSet = SearcherUtils.getDocSetFromDocList(superlist, s);
        results.docList = superlist.subset(0, req.getLimit());
    } else {
Where getDocSetFromDocList() uses DocSetHitCollector to build a DocSet.
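
To make the idea concrete without depending on Solr classes, here is a minimal standalone sketch. It uses java.util.BitSet as a stand-in for a DocSet and a plain int[] as a stand-in for a DocList; the class and method names (TopNFacetSketch, docSetFromRankedList, facetCounts) are made up for illustration and are not Solr API.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: BitSet plays the role of a DocSet,
// int[] plays the role of a sorted DocList. Not Solr code.
final class TopNFacetSketch {

    /** Collect the first n doc ids of a ranked list into a bit set. */
    static BitSet docSetFromRankedList(int[] rankedDocIds, int n, int maxDoc) {
        BitSet docSet = new BitSet(maxDoc);
        int limit = Math.min(n, rankedDocIds.length);
        for (int i = 0; i < limit; i++) {
            docSet.set(rankedDocIds[i]);
        }
        return docSet;
    }

    /** For each facet value, count the docs in docSet that carry that value. */
    static Map<String, Integer> facetCounts(Map<String, BitSet> docsByFacetValue,
                                            BitSet docSet) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : docsByFacetValue.entrySet()) {
            // Clone so the per-value bit set is not modified by the intersection.
            BitSet overlap = (BitSet) e.getValue().clone();
            overlap.and(docSet);
            counts.put(e.getKey(), overlap.cardinality());
        }
        return counts;
    }
}
```

The facet counts here only cover the top n docs, so each count is bounded by n rather than by the size of the full result set.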
To answer the performance question: There is a gain to be had when
doing lots of faceting on huge indices, if N is low (say, 500-1000).
One problem with the implementation above is that it defeats the
query result caching in SolrIndexSearcher, since the generated DocList
is larger than the cache's upper bound.
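
A toy model of that caching problem, for illustration only: a result cache that refuses to store any doc list longer than a configured bound. The class BoundedResultCacheSketch and its methods are hypothetical and only mimic the behavior described above; this is not how SolrIndexSearcher is actually written.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a query result cache with an upper bound on
// the number of doc ids it is willing to store per entry.
final class BoundedResultCacheSketch {
    private final int maxDocsCached;
    private final Map<String, int[]> cache = new HashMap<>();

    BoundedResultCacheSketch(int maxDocsCached) {
        this.maxDocsCached = maxDocsCached;
    }

    /** Stores the list only if it fits under the bound; returns true if cached. */
    boolean maybeCache(String queryKey, int[] docIds) {
        if (docIds.length > maxDocsCached) {
            return false; // too large: every repeat of this query recomputes
        }
        cache.put(queryKey, docIds.clone());
        return true;
    }

    /** Returns the cached doc ids, or null on a cache miss. */
    int[] lookup(String queryKey) {
        return cache.get(queryKey);
    }
}
```

With a bound of, say, 200, a normal 10-row page is cached, while a 500-1000 doc superlist fetched for faceting is not, so repeated requests pay the full search cost each time.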
-Mike