Hi All,
I've been playing around with SpellCheckComponent (Solr 1.4) and ran into issues with suggestions for a phrase query. We use the dismax request handler, and it's an AND search when the query has fewer than 4 terms (specified by the "mm" param). Since SpellCheckComponent checks the doc frequency of individual terms and then collates the results, there are instances where the collated suggestion returns 0 results, which makes for a bad user experience.

I read somewhere in this newsgroup about using shingles to solve this problem, but unfortunately in that case the suggestions are driven by "MaxShingleSize" and the order in which the terms appear in the original doc.

So, basically, I had two issues with the default SpellCheckComponent on multi-term queries:

1. Sometimes the suggestion returned gives 0 results.
2. I would also like to show the suggestion even if the original query returns more than 0 results, provided the "suggestion query" returns substantially more results than the original.

So I decided to extend SpellCheckComponent and override the protected NamedList toNamedList() method: fire another query with the "collation" and compare the doc matches for both queries. If the collation query returns more results than the original query, leave the collation as is; otherwise blank out the collation. I am not sure if this is the right way of solving this problem; I am also not sure how to fire the search with a new query optimally.
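To make the rule concrete, here is a minimal standalone sketch of the decision on its own, without any Solr plumbing. The class name CollationFilter and the method keepCollation are made up for illustration, and the two constructor arguments are assumed tuning values (a hit-count threshold and a multiplier for "substantially greater"); none of this is part of the Solr API.

```java
// Illustrative sketch only: CollationFilter and keepCollation are
// hypothetical names, not Solr classes or methods.
final class CollationFilter {
    // Assumed tuning knobs: above minThreshold hits the original query is
    // considered good enough; multiplier defines "substantially greater".
    private final int minThreshold;
    private final int multiplier;

    CollationFilter(int minThreshold, int multiplier) {
        this.minThreshold = minThreshold;
        this.multiplier = multiplier;
    }

    /**
     * Returns true if the collation should be shown to the user.
     *
     * @param originalHits  number of docs matched by the original query
     * @param collationHits number of docs matched by the collated suggestion
     */
    boolean keepCollation(int originalHits, int collationHits) {
        // The original query already returns enough results: no suggestion.
        if (originalHits > minThreshold) {
            return false;
        }
        // Issue 1: never suggest a query that itself returns nothing.
        if (collationHits == 0) {
            return false;
        }
        // Issue 2: only suggest when the collation matches at least
        // multiplier times as many docs as the original query.
        // Widen to long so the multiplication cannot overflow.
        return (long) collationHits >= (long) originalHits * multiplier;
    }
}
```

For example, with new CollationFilter(5, 2), an original query with 3 hits would only get a suggestion if the collation matches 6 or more docs.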
Here's the overridden code in my custom SpellCheckComponent:

    protected NamedList toNamedList(SpellingResult spellingResult, String origQuery,
                                    boolean extendedResults, boolean collate) {
        NamedList result = super.toNamedList(spellingResult, origQuery, extendedResults, collate);
        if (collate) {
            String collation = (String) result.get("collation");
            if (collation != null && collation.length() > 0 && builder != null) {
                // fire a query and get the results
                try {
                    // only add spelling suggestion in case results are less than some threshold
                    int hits = builder.getResults().docList.matches();
                    if (hits > MIN_THRESHOLD) {
                        result.remove("collation");
                        result.add("collation", "");
                        return result;
                    }
                    SolrIndexSearcher searcher = builder.req.getSearcher();
                    QParser qp = QParser.getParser(collation, "dismax", builder.req);
                    NamedList params = new NamedList();
                    params.add("rows", 0);
                    params.add("omitHeader", "true");
                    SolrParams localParams = SolrParams.toSolrParams(params);
                    qp.setLocalParams(localParams);
                    Query q = qp.getQuery();
                    TopDocs docs = searcher.search(q, 1);
                    int suggestionHits = docs.totalHits;
                    // try to get hits for this query
                    log.info("current hits:" + hits);
                    log.info("total number of hits:" + suggestionHits);
                    // blank the collation if it matches nothing (with hits == 0 the
                    // multiplier check alone would let a 0-hit collation through)
                    // or if it is not substantially better than the original query
                    if (suggestionHits == 0 || suggestionHits < hits * MULTIPLIER) {
                        // remove the collation
                        result.remove("collation");
                        result.add("collation", "");
                    }
                } catch (IOException e) {
                    log.error(e.toString());
                } catch (ParseException e) {
                    log.error(e.toString());
                }
            }
        }
        return result;
    }

This does work as expected, but I had a couple of questions around the whole thing:

1. Is this the right approach?
2. Is doing a Solr search this way correct? Unfortunately I am not sure how you go about making a Solr search from code :D; I've always used the REST interface instead of doing it from within a Java class, and the fact that I don't know Java well adds a bit more complexity.
3. In the above code, I hard-coded the queryType to "dismax" as I wasn't sure how to get it from the original request.
Apologies for the long mail but this has been killing me for a while! Thanks!