Hi All,

I've been playing around with SpellCheckComponent (Solr 1.4) and ran into 
issues with suggestions for multi-term (phrase) queries. We use the dismax request handler, and 
it's an AND search when the query has fewer than 4 terms (as specified by the "mm" param). 
SpellCheckComponent checks the document frequency of each term individually and then 
collates the results. Because of this, the collated suggestion sometimes returns 
0 results, which is a bad user experience. I read 
somewhere in this newsgroup about using shingles to solve this problem, but 
in that case the suggestions are driven by "maxShingleSize" and by 
the order in which the terms appear in the original document. So basically I had 
two issues with the default SpellCheckComponent for multi-term queries:

1. Sometimes the suggestion returned yields 0 results.
2. I would also like to show the suggestion even if the original query returns 
>0 results, provided the "suggestion query" returns substantially more results 
than the original query.
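To make issue 1 concrete, here is a toy model in plain Java. The documents and terms are hypothetical, and this is not Solr code. Each term of a collation can have a non-zero document frequency on its own while no single document contains all of them, so an AND query on the collation matches nothing.

```java
import java.util.List;
import java.util.Set;

// Toy model (hypothetical data, not Solr code): each "document" is a set of terms.
class CollationDemo {
    // Document frequency of a single term: roughly what a per-term spell check looks at.
    static int docFreq(List<Set<String>> docs, String term) {
        int n = 0;
        for (Set<String> d : docs) {
            if (d.contains(term)) n++;
        }
        return n;
    }

    // Number of documents containing ALL terms: what a pure AND query matches.
    static int andMatches(List<Set<String>> docs, List<String> terms) {
        int n = 0;
        for (Set<String> d : docs) {
            if (d.containsAll(terms)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("red", "shoes"),
                Set.of("blue", "jacket"));
        // Both terms exist in the index, but never in the same document.
        List<String> collation = List.of("red", "jacket");
        for (String t : collation) {
            System.out.println(t + " df=" + docFreq(docs, t));
        }
        System.out.println("AND matches=" + andMatches(docs, collation));
    }
}
```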

So I decided to extend SpellCheckComponent and override the protected NamedList 
toNamedList() method. It fires another query with the collation and compares the 
number of matching docs for both queries. If the collation query returns more results than 
the original query, the collation is left as is; otherwise it is blanked out. 
I am not sure if this is the right way of solving this problem. I am also not even 
sure how to fire a search with a new query optimally. Here's the 
overridden code in my custom SpellCheckComponent:

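    import java.io.IOException;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.spelling.SpellingResult;

    // builder (the current ResponseBuilder), log, MIN_THRESHOLD and MULTIPLIER
    // are members of this SpellCheckComponent subclass and are not shown here.
    // If builder is an instance field, note that a component instance is shared
    // by all requests, so storing per-request state in it is not thread-safe.
    @Override
    protected NamedList toNamedList(SpellingResult spellingResult, String origQuery,
            boolean extendedResults, boolean collate)
    {
        NamedList result = super.toNamedList(spellingResult, origQuery,
                extendedResults, collate);
        if(collate){
            String collation = (String) result.get("collation");
            if(collation!=null && collation.length() > 0
                    && builder!=null && builder.getResults()!=null){
                try {
                    // only add a spelling suggestion when the original query
                    // returns no more than MIN_THRESHOLD results
                    int hits = builder.getResults().docList.matches();
                    if(hits>MIN_THRESHOLD){
                        result.remove("collation");
                        result.add("collation", "");
                        return result;
                    }
                    // fire the collation as a new query; the dismax parser picks up
                    // qf, mm, etc. from the original request's parameters
                    SolrIndexSearcher searcher = builder.req.getSearcher();
                    QParser qp = QParser.getParser(collation, "dismax", builder.req);
                    // these local params do not affect the direct searcher.search()
                    // call below; only the total hit count is used
                    NamedList params = new NamedList();
                    params.add("rows", 0);
                    params.add("omitHeader", "true");
                    SolrParams localParams = SolrParams.toSolrParams(params);
                    qp.setLocalParams(localParams);
                    Query q = qp.getQuery();
                    TopDocs docs = searcher.search(q, 1);
                    int suggestionHits = docs.totalHits;
                    log.info("original query hits: " + hits);
                    log.info("collation query hits: " + suggestionHits);
                    // blank out the collation if it matches nothing, or does not
                    // match substantially more docs than the original query
                    // (without the == 0 check, 0 original hits and 0 collation
                    // hits would compare 0 < 0 and keep the collation)
                    if(suggestionHits == 0 || suggestionHits < hits*MULTIPLIER){
                        result.remove("collation");
                        result.add("collation", "");
                    }
                } catch (IOException e) {
                    log.error("collation check failed", e);
                } catch (ParseException e) {
                    log.error("could not parse collation: " + collation, e);
                }
            }
        }
        return result;
    }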
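The keep-or-blank decision can be pulled out into a pure function so that it can be unit tested without a running index. What follows is a sketch only. `CollationPolicy` and `shouldKeepCollation` are hypothetical names, and `minThreshold` and `multiplier` play the role of MIN_THRESHOLD and MULTIPLIER. The function explicitly rejects a collation whose query matches nothing. With a plain `suggestionHits < hits * multiplier` test, an original query with 0 hits and a collation with 0 hits would compare 0 < 0, and the useless collation would be kept.

```java
// Hypothetical helper: decides whether the collation should be shown to the user.
final class CollationPolicy {
    private CollationPolicy() {}

    /**
     * @param origHits       matches for the user's original query
     * @param suggestionHits matches for the collated (suggested) query
     * @param minThreshold   above this many original hits, no suggestion is shown
     * @param multiplier     how many times more hits the suggestion must have
     */
    static boolean shouldKeepCollation(int origHits, int suggestionHits,
                                       int minThreshold, double multiplier) {
        if (origHits > minThreshold) {
            return false; // the original query already returns enough results
        }
        if (suggestionHits == 0) {
            return false; // never suggest a query that returns nothing
        }
        // keep only if the suggestion returns "substantially" more results
        return suggestionHits >= origHits * multiplier;
    }
}
```

For example, with `minThreshold = 10` and `multiplier = 2.0`, an original query with 3 hits keeps a collation that has 6 hits and drops one that has 5. Checking the threshold first also lets the caller skip the extra search entirely when the original query already does well.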

This works as expected, but I have a couple of questions about the whole thing:

1. Is this the right approach?
2. Is running a Solr search this way correct? Unfortunately I am not sure how you 
go about running a Solr search from within a Java class :D; I've always used the 
REST interface instead. Also, I don't know Java well, which adds a bit more 
complexity.
3. In the above code, I hard-coded the query type (parser) to "dismax" because I wasn't sure 
how to get it from the original request.
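On question 3: in Solr the query parser is normally selected by the `defType` request parameter, with the standard "lucene" parser as the default. So one option, untested against this setup, is to read that parameter from the original request instead of hard-coding "dismax". For example, something like `builder.req.getParams().get("defType", "lucene")`. Whether the dismax handler here exposes `defType` in the request parameters is an assumption worth checking. The sketch below models only the lookup. It uses a plain `Map` as a hypothetical stand-in for the request's SolrParams and assumes handler defaults have already been merged in.

```java
import java.util.Map;

// Hypothetical stand-in for reading the query parser name from request parameters.
// A plain Map replaces SolrParams so the sketch runs without Solr on the classpath.
final class ParserTypeResolver {
    private ParserTypeResolver() {}

    // "defType" selects the query parser; "lucene" is assumed as the fallback.
    static String resolve(Map<String, String> params) {
        String defType = params.get("defType");
        return (defType == null || defType.isEmpty()) ? "lucene" : defType;
    }
}
```

The resolved name could then be passed to `QParser.getParser(collation, parserName, builder.req)` in place of the literal "dismax".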


Apologies for the long mail, but this has been killing me for a while!

Thanks!

