Re: Spell Check Handler

climbingrose Thu, 11 Oct 2007 03:53:31 -0700

Hi all,

I've been busy for the last few days, so I haven't replied to this email until
now. I modified SpellCheckerHandler a while ago to include support for multiword
queries. To be honest, I didn't have time to write unit tests for the code.
However, I deployed it in a production environment and it has been working
for me so far. My version, however, makes two assumptions:


1) I assume that when a user enters a misspelled multiword query, we should
only check the words that are actually misspelled. For example, if a user
enters "life expectancy calculatar", in which "calculator" is misspelled, we
should only spellcheck "calculatar".
2) I only return the single best string for a misspelled query.

I guess I can just paste the code directly here so that others can adapt it for
their own purposes. If you have any questions, just send me an email. I'll
be happy to help you.

        StringBuffer buf = null;
        if (null != words && !"".equals(words.trim())) {
            Analyzer analyzer = req.getSchema().getField(field).getType().getAnalyzer();

            TokenStream source = analyzer.tokenStream(field, new StringReader(words));
            Token t;
            boolean hasSuggestion = false;
            boolean termExists = false;
            while (true) {
                try {
                    t = source.next();
                } catch (IOException e) {
                    t = null;
                }
                if (t == null)
                    break;

                String termText = t.termText();
                String[] suggestions = spellChecker.suggestSimilar(termText,
                        numSug, req.getSearcher().getReader(), restrictToField, true);
                if (suggestions != null && suggestions.length > 0) {
                    if (!suggestions[0].equals(termText)) {
                        hasSuggestion = true;
                    }
                    if (buf == null) {
                        buf = new StringBuffer(suggestions[0]);
                    } else
                        buf.append(" ").append(suggestions[0]);
                } else if (spellChecker.exist(termText)){
                    termExists = true;
                    if (buf == null) {
                        buf = new StringBuffer(termText);
                    } else
                        buf.append(" ").append(termText);
                } else {
                    hasSuggestion = false;
                    termExists= false;
                    break;
                }
            }
            try {
                source.close();
            } catch (IOException e) {
                // ignore
            }
            // String[] suggestions = spellChecker.suggestSimilar(words, numSug,
            // nullReader, restrictToField, onlyMorePopular);
            // buf is non-null whenever either flag is set
            if (hasSuggestion || termExists)
                rsp.add("suggestions", buf.toString());
            else
                rsp.add("suggestions", null);
        }
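
For anyone who wants to try the per-word idea outside Solr, here is a
minimal, self-contained sketch. It is not the handler code above: the
MultiwordSpellSketch class, the in-memory dictionary, and the edit-distance
lookup are stand-ins assumed purely for illustration, in place of Lucene's
SpellChecker and the field analyzer. It only mirrors the two assumptions:
words that are already known are kept as they are, and a single best string
is returned (or null when some word can be neither found nor corrected).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class MultiwordSpellSketch {

    // Hypothetical in-memory dictionary; stands in for the Lucene spelling index.
    private final Set<String> dictionary;
    // Largest edit distance at which a dictionary word still counts as a suggestion.
    private final int maxDistance;

    public MultiwordSpellSketch(Set<String> dictionary, int maxDistance) {
        this.dictionary = dictionary;
        this.maxDistance = maxDistance;
    }

    /** Dynamic-programming Levenshtein edit distance between two strings. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the closest dictionary word within maxDistance, or null if none. */
    String bestSuggestion(String word) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : dictionary) {
            int d = editDistance(word, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    /**
     * Keeps known words, replaces unknown words with their best suggestion,
     * and returns null if any word can be neither found nor corrected.
     */
    public String suggest(String query) {
        if (query == null || query.trim().isEmpty()) {
            return null;
        }
        StringBuilder buf = new StringBuilder();
        // Whitespace splitting stands in for the field analyzer's tokenization.
        for (String term : query.trim().toLowerCase().split("\\s+")) {
            String replacement = dictionary.contains(term) ? term : bestSuggestion(term);
            if (replacement == null) {
                return null;
            }
            if (buf.length() > 0) {
                buf.append(' ');
            }
            buf.append(replacement);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        Set<String> dict = new LinkedHashSet<>(
                Arrays.asList("life", "expectancy", "calculator"));
        MultiwordSpellSketch sketch = new MultiwordSpellSketch(dict, 2);
        System.out.println(sketch.suggest("life expectancy calculatar"));
    }
}
```

Unlike the handler, this sketch looks a word up in the dictionary first and
only searches for a suggestion when the lookup fails, so it never replaces a
correctly spelled word with a more popular one.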



On 10/11/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> Hoss,
>
> I had a feeling someone would be quoting Yonik's Law of Patches!  ;-)
>
> For now, this is done.
>
> I created the changes, created JavaDoc comments on the various settings
> and their expected output, created a JUnit test for the
> SpellCheckerRequestHandler which tests various components of the handler,
> and I also created the supporting configuration files for the JUnit tests
> (schema and solrconfig files).
>
> I attached the patch to the JIRA issue, so now we just have to wait until
> it gets added back in to the main code stream.
>
> For anyone who is interested, here is a link to the JIRA:
> https://issues.apache.org/jira/browse/SOLR-375
>
> Could someone please drop me a hint on how to update the wiki or any other
> documentation that could benefit from being updated? I'd like to help out
> as much as possible, but first I need to know "how". ;-)
>
> When these changes do get committed back in to the daily build, please
> review the generated JavaDoc for information on how to utilize these new
> features.
> If anyone has any questions, or comments, please do not hesitate to ask.
>
> As a general note of self-critique on these changes, I am not 100% sure
> of the way I implemented the "nested" structure when the "multiWords"
> parameter is used.  My interest is that it should work smoothly with some
> other technology such as Prototype using the JSON output type.
> Unfortunately, I will not be getting a chance to start on that coding until
> next week, so it is up in the air as to whether this structure will be
> conducive or not.  I am planning on providing more details in the
> documentation on how to utilize these modifications in Prototype and
> Ajax when I get a chance (and even provide links to a production site so
> you can see it in action and view the source if interested).  So stay
> tuned...
>
>    Thanks for everyone's time,
>       Scott Tabar
>
> ---- Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : If you like, I can post the source code changes that I made to the
> : SpellCheckerRequestHandler, but at this time I am not ready to open a
> : JIRA issue and submit the changes back through the subversion.  I will
> : need to do a little more testing, documentation, and create some unit
> : tests to cover all of these changes, but from what I have been able to
> : test, it is working very well.
>
> Keep in mind "Yonik's Law Of Patches" ...
>
>         "A half-baked patch in Jira, with no documentation, no tests
>         and no backwards compatibility is better than no patch at all."
>         http://wiki.apache.org/solr/HowToContribute
>
> ...even if you don't think the code is "solid" yet, if you want to
> eventually make it available to people, making a "rough" version available
> to people early gives other people the opportunity to help you make it
> solid (by writing unit tests, fixing bugs, and adding documentation).
>
>
> -Hoss
>
>
>


-- 
Regards,

Cuong Hoang
