On 12-Nov-07, at 8:03 AM, Chris Hostetter wrote:


if what you are interested in is stats on the first N docs according to
a specific sort (score or otherwise) then you could write a custom request
handler that executed a search with a limit of N, got the DocList,
iterated over it to build a DocSet, and then used that DocSet to do
faceting ... but that would probably take even longer than just using the
full DocSet matching the entire query.

An implementation might look like:

        DocList superlist;
        int facetDocLimit = params.getInt(DMP.FACET_DOCLIMIT, -1);
        if(facetDocLimit > 0 && facetDocLimit != req.getLimit()) {
          superlist = s.getDocList(query, restrictions,
                                   SolrPluginUtils.getSort(req),
                                   req.getStart(), facetDocLimit,
                                   flags);
          results.docSet = SearcherUtils.getDocSetFromDocList(superlist, s);
          results.docList = superlist.subset(0, req.getLimit());
        } else {

Where getDocSetFromDocList() uses DocSetHitCollector to build a DocSet.
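
To make the idea concrete without depending on Solr itself, here is a minimal, self-contained sketch in plain Java. It is not the code from my handler and does not use the Solr API: the class TopNFacetSketch, both method names, and the array-based "index" are hypothetical stand-ins, with a java.util.BitSet playing the role of the DocSet. It only shows the shape of the technique: take the first N doc ids in sort order, collect them into a set, and count facet values over that set alone.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Solr.
public class TopNFacetSketch {

    // Collect the first n doc ids of a ranked result list into a bit set.
    // The BitSet stands in for the DocSet built from the DocList.
    public static BitSet docSetFromRankedList(int[] rankedDocIds, int n) {
        BitSet docSet = new BitSet();
        int limit = Math.min(n, rankedDocIds.length);
        for (int i = 0; i < limit; i++) {
            docSet.set(rankedDocIds[i]);
        }
        return docSet;
    }

    // Count facet values over only the documents present in docSet.
    // fieldValueByDocId[doc] is the (single) facet value of document doc.
    public static Map<String, Integer> facetCounts(BitSet docSet,
                                                   String[] fieldValueByDocId) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc = docSet.nextSetBit(0); doc >= 0;
                 doc = docSet.nextSetBit(doc + 1)) {
            counts.merge(fieldValueByDocId[doc], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Six documents, each with one category value.
        String[] category = {"book", "dvd", "book", "cd", "dvd", "book"};
        // Doc ids in sort order (best match first).
        int[] ranked = {5, 2, 1, 0, 4, 3};

        // Facet over the top 3 results only, not the full match set.
        BitSet top3 = docSetFromRankedList(ranked, 3);
        System.out.println(facetCounts(top3, category));
    }
}
```

The cost of the counting step depends on N rather than on the size of the full match set, which is where any gain on a huge index would come from.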

To answer the performance question: there is a gain to be had when doing lots of faceting on huge indices, provided N is low (say, 500-1000). One problem with the implementation above is that it defeats the query caching in SolrIndexSearcher, since the generated DocList is larger than the cache's upper bound.

-Mike
