I've subclassed StandardRequestHandler so that I can show the top-N results for some of the facet values I'm interested in. The functionality resembles SOLR-236 (field collapsing) a bit, with the difference that I can choose arbitrarily which facet queries to collapse and to what extent (N can be specified independently for each one).
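To make the goal concrete: collapsing a facet query into its top-N results amounts to walking the overall result list in its final sort order and keeping the first N documents that match that facet query, with a separate N per facet query. The plain-Java sketch below illustrates only that idea; it is not Solr code. It assumes the full sorted list of matching doc ids and one BitSet per facet query are already in hand. The class name TopNPerFacet, the doc ids and the N values are made up for illustration, and having that full sorted list available is exactly the sticking point discussed further on. Requires Java 9+ (for Map.of).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopNPerFacet {

    /**
     * Walks doc ids that are already in final sort order and, for each facet
     * query, keeps the first n ids contained in that facet's BitSet.
     * Every facet query has its own n (looked up in topN).
     */
    static Map<String, List<Integer>> collapse(int[] sortedDocIds,
                                               Map<String, BitSet> facetDocs,
                                               Map<String, Integer> topN) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String facet : facetDocs.keySet()) {
            result.put(facet, new ArrayList<>());
        }
        for (int doc : sortedDocIds) {
            for (Map.Entry<String, BitSet> e : facetDocs.entrySet()) {
                List<Integer> bucket = result.get(e.getKey());
                // Add only while the bucket is below its N and the doc matches the facet.
                if (bucket.size() < topN.get(e.getKey()) && e.getValue().get(doc)) {
                    bucket.add(doc);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String low = "_ddp_p_dc_dc_2_dc_dc:[0 TO 50]";
        String high = "_ddp_p_dc_dc_2_dc_dc:[51 TO 100]";

        // Toy data (hypothetical): overall result list, best-ranked first.
        int[] sorted = {7, 3, 9, 1, 4, 8};
        BitSet lowDocs = new BitSet();
        lowDocs.set(3); lowDocs.set(1); lowDocs.set(4);
        BitSet highDocs = new BitSet();
        highDocs.set(7); highDocs.set(9); highDocs.set(8);

        Map<String, BitSet> facets = new LinkedHashMap<>();
        facets.put(low, lowDocs);
        facets.put(high, highDocs);
        Map<String, Integer> topN = Map.of(low, 2, high, 1);

        System.out.println(collapse(sorted, facets, topN));
    }
}
```

With this toy data it prints `{_ddp_p_dc_dc_2_dc_dc:[0 TO 50]=[3, 1], _ddp_p_dc_dc_2_dc_dc:[51 TO 100]=[7]}`: doc 4 also matches the first range but is dropped because that facet's N of 2 is already reached. The whole collapse is a single pass over one sorted list, which is why it is much cheaper than re-running the query per facet.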
The code for this is quite simple, but (maybe because of that) I have the feeling that it can be optimized quite a bit. The question is how. First some explanation and code:

I extended StandardRequestHandler and call super.handleRequestBody(req, rsp) to fetch the facet-query results. From those I copy the facets I want to collapse into a NamedList, facetresults, and run the code below, which splits each (possibly combined) facet query into independent queries and adds them to a filter-query (fq) list. That list is appended to the original query's fq list, and the new query is executed.

    // query, sort, start, rows and fqsExisting come from the original request.
    SolrIndexSearcher searcher = req.getSearcher();
    for (int i = 0; i < facetresults.size(); i++) {
        // Split a combined facet query such as "a:[0 TO 50] + idA:123"
        // on '+' and parse each part as its own filter query.
        List<Query> fqList = new ArrayList<Query>();
        for (String part : facetresults.getName(i).split("[+]")) {
            fqList.add(QueryParsing.parseQuery(part.trim(), req.getSchema()));
        }
        // Keep the filters of the original request as well.
        fqList.addAll(fqsExisting);
        // Re-run the original query and sort, restricted by the extra filters.
        DocList docList = searcher.getDocList(query, fqList, sort, start, rows, 0);
        NamedList facetValue = new SimpleOrderedMap();
        facetValue.add("results", docList);
        facetresults.setVal(i, facetValue);
    }

This all works, but I still think there must be a better way than executing queries over and over again where only the fq's differ: q and sort for each per-facet query are the same as for the overall query that has already been executed. Obviously, intersecting with the original result would be by far the fastest solution, but Mike mentioned that this isn't doable, since the overall sorted result list is not available. See: http://www.nabble.com/showing-results-per-facet-value-efficiently-to13133815.html

Is there anything else I can do to speed up the queries? For reference, I'm currently seeing 15-16 ms for each executed query that is not in the query cache.
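All of the timings in this post are 0, 15 or 16 ms, so before reading much into them it may be worth checking the clock resolution. This assumes they were taken with System.currentTimeMillis() (the post doesn't say how they were measured). Its Javadoc notes that the granularity of its value depends on the underlying operating system and that many operating systems measure time in units of tens of milliseconds, whereas System.nanoTime() is intended for measuring elapsed time. The sketch below prints the observed currentTimeMillis() step and times a stand-in workload with nanoTime(). The class and method names are hypothetical; in the handler, the wrapped call would be the getDocList(...) call. Requires Java 8+ (lambdas).

```java
public class QueryTiming {

    /** Elapsed time of task in milliseconds, measured with System.nanoTime(). */
    static double elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    /** Busy-waits until System.currentTimeMillis() changes and returns the observed step. */
    static long currentTimeMillisStep() {
        long t0 = System.currentTimeMillis();
        long t1;
        while ((t1 = System.currentTimeMillis()) == t0) {
            // spin until the millisecond clock ticks
        }
        return t1 - t0;
    }

    public static void main(String[] args) {
        System.out.println("currentTimeMillis step: " + currentTimeMillisStep() + " ms");

        // Stand-in workload; in the handler this would wrap searcher.getDocList(...).
        double ms = elapsedMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum == 42) {
                System.out.println("unexpected sum");
            }
        });
        System.out.printf("workload took %.3f ms%n", ms);
    }
}
```

If the printed step is around 15-16 ms on the test machine, any measurement made with currentTimeMillis() can only come out as 0, 15 or 16 ms or so, whatever the real duration.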
This seems to be independent of whether or not the fq's are already in the filter cache, which strikes me as odd. For example, see the timings below for the collapsed facet queries (together they make up one call to Solr). Tested on an unwarmed Solr server: 20,000 docs, Intel Core 2 Duo 2 GHz, 800 MB RAM assigned to Solr.

    15 ms for: _ddp_p_dc_dc_2_dc_dc:[0 TO 50]
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[51 TO 100]
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[101 TO 200]
    15 ms for: _ddp_p_dc_dc_2_dc_dc:[201 TO 300]
    16 ms for: idA:2140479
    15 ms for: idA:1456928
    16 ms for: idA:2601889
     0 ms for: _ddp_p_dc_dc_2_dc_dc:[0 TO 50]
     0 ms for: _ddp_p_dc_dc_2_dc_dc:[51 TO 100]
     0 ms for: _ddp_p_dc_dc_2_dc_dc:[101 TO 200]
     0 ms for: _ddp_p_dc_dc_2_dc_dc:[201 TO 300]
    15 ms for: _ddp_p_dc_dc_2_dc_dc:[0 TO 50] + idA:2140479
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[0 TO 50] + idA:1456928
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[0 TO 50] + idA:2601889
    15 ms for: _ddp_p_dc_dc_2_dc_dc:[51 TO 100] + idA:2140479
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[51 TO 100] + idA:1456928
    15 ms for: _ddp_p_dc_dc_2_dc_dc:[51 TO 100] + idA:2601889
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[101 TO 200] + idA:2140479
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[101 TO 200] + idA:1456928
    15 ms for: _ddp_p_dc_dc_2_dc_dc:[101 TO 200] + idA:2601889
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[201 TO 300] + idA:2140479
    16 ms for: _ddp_p_dc_dc_2_dc_dc:[201 TO 300] + idA:1456928
    15 ms for: _ddp_p_dc_dc_2_dc_dc:[201 TO 300] + idA:2601889

The strange thing here is that, for example, the query _ddp_p_dc_dc_2_dc_dc:[0 TO 50] + idA:2140479 takes 15 ms, although its independent parts

    - _ddp_p_dc_dc_2_dc_dc:[0 TO 50]
    - idA:2140479

have already been executed (they also take 15-16 ms). So all fq's for _ddp_p_dc_dc_2_dc_dc:[0 TO 50] + idA:2140479 must be in the filter cache, and hence the query should execute faster than the very first query, _ddp_p_dc_dc_2_dc_dc:[0 TO 50], whose fq wasn't in the filter cache at that moment.

So, to summarize, my two questions:

1. Is there any way to get better performance for what I'm trying to achieve? Perhaps a custom HitCollector or something?
2. Do you have any explanation for the fact that the filter cache doesn't seem to matter when executing these queries?

Thanks in advance for making it to the end of this post, and for any help you can give me ;-)

Geert-Jan

--
View this message in context: http://www.nabble.com/how-do-do-most-efficient%3A-collapsing-facets-into-top-N-results-tp14318577p14318577.html
Sent from the Solr - User mailing list archive at Nabble.com.