Re: Spell Check Handler

climbingrose Thu, 11 Oct 2007 03:57:36 -0700

Just to clarify this line of code:

String[] suggestions = spellChecker.suggestSimilar(termText, numSug,
req.getSearcher().getReader(), restrictToField, true);


The final argument (true) is the "only more popular" flag, so I only return
suggestions that are more popular than termText. You will probably need the
code in Scott's patch to make this behaviour configurable.
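
For anyone who wants to see what that flag changes without setting up an
index, here is a rough, self-contained sketch. It is not the handler code and
it does not use Lucene: the class, its methods, the edit-distance cutoff of 2
and the toy frequency table are all made up for illustration. I am assuming
that a known term is only replaced when the flag is on and a more popular
neighbour exists.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: a toy in-memory dictionary replaces the real
// spell-check index, and every name in this class is hypothetical.
class MultiwordSuggestSketch {

    // term -> document frequency in the (pretend) main index
    private final Map<String, Integer> freq = new LinkedHashMap<>();

    void add(String term, int docFreq) {
        freq.put(term, docFreq);
    }

    // Plain Levenshtein edit distance between two strings.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Returns the closest dictionary term within edit distance 2, or null.
    // Assumption: a known term is left alone unless onlyMorePopular is set,
    // in which case only strictly more frequent candidates are considered.
    String suggest(String term, boolean onlyMorePopular) {
        int termFreq = freq.getOrDefault(term, 0);
        if (termFreq > 0 && !onlyMorePopular) return null;
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            String cand = e.getKey();
            if (cand.equals(term)) continue;
            if (onlyMorePopular && e.getValue() <= termFreq) continue;
            int d = distance(term, cand);
            if (d > 2) continue;
            // Prefer the smaller distance, then the higher frequency.
            if (d < bestDist || (d == bestDist && e.getValue() > bestFreq)) {
                best = cand;
                bestDist = d;
                bestFreq = e.getValue();
            }
        }
        return best;
    }

    // Rewrites the query word by word and returns one best string, or null
    // when some word is neither known nor correctable.
    String correct(String query, boolean onlyMorePopular) {
        StringBuilder buf = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            String suggestion = suggest(word, onlyMorePopular);
            String chosen;
            if (suggestion != null) {
                chosen = suggestion;          // a suggestion wins over the term
            } else if (freq.containsKey(word)) {
                chosen = word;                // known term, keep it as typed
            } else {
                return null;                  // unknown and no suggestion
            }
            if (buf.length() > 0) buf.append(' ');
            buf.append(chosen);
        }
        return buf.toString();
    }
}
```

With a toy table of life=100, like=500, expectancy=40 and calculator=60, the
query "life expectancy calculatar" comes back as "like expectancy calculator"
when the flag is true, because "like" is a more popular neighbour of "life",
and as "life expectancy calculator" when it is false. That side effect is one
reason to make the flag configurable.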

On 10/11/07, climbingrose <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I've been so busy the last few days that I haven't replied to this email. I
> modified SpellCheckerRequestHandler a while ago to include support for
> multiword queries. To be honest, I didn't have time to write unit tests for
> the code. However, I deployed it in a production environment and it has been
> working for me so far. My version, however, makes two assumptions:
>
> 1) I assume that when a user enters a misspelled multiword query, we should
> only check the words that are actually misspelled. For example, if a user
> enters "life expectancy calculatar", in which "calculator" is misspelled, we
> should only spellcheck "calculatar".
> 2) I only return the single best string for a misspelled query.
>
> I guess I can just paste the code directly here so that others can adapt it
> for their own purposes. If you have any questions, just send me an email.
> I'll be happy to help you.
>
>         // Build the corrected query one analyzed token at a time.
>         StringBuffer buf = null;
>         if (null != words && !"".equals(words.trim())) {
>             Analyzer analyzer =
>                 req.getSchema().getField(field).getType().getAnalyzer();
>             TokenStream source =
>                 analyzer.tokenStream(field, new StringReader(words));
>             Token t;
>             boolean hasSuggestion = false;
>             boolean termExists = false;
>             while (true) {
>                 try {
>                     t = source.next();
>                 } catch (IOException e) {
>                     t = null; // treat a read error as end of stream
>                 }
>                 if (t == null)
>                     break;
>
>                 String termText = t.termText();
>                 // Last argument true: only more popular suggestions.
>                 String[] suggestions = spellChecker.suggestSimilar(
>                     termText, numSug, req.getSearcher().getReader(),
>                     restrictToField, true);
>                 if (suggestions != null && suggestions.length > 0) {
>                     // Use the best suggestion in place of this term.
>                     if (!suggestions[0].equals(termText)) {
>                         hasSuggestion = true;
>                     }
>                     if (buf == null) {
>                         buf = new StringBuffer(suggestions[0]);
>                     } else
>                         buf.append(" ").append(suggestions[0]);
>                 } else if (spellChecker.exist(termText)) {
>                     // No suggestion, but the term is known: keep it as is.
>                     termExists = true;
>                     if (buf == null) {
>                         buf = new StringBuffer(termText);
>                     } else
>                         buf.append(" ").append(termText);
>                 } else {
>                     // Unknown term with no suggestion: give up.
>                     hasSuggestion = false;
>                     termExists = false;
>                     break;
>                 }
>             }
>             try {
>                 source.close();
>             } catch (IOException e) {
>                 // ignore
>             }
>             // String[] suggestions = spellChecker.suggestSimilar(words,
>             //     numSug, nullReader, restrictToField, onlyMorePopular);
>             if (hasSuggestion || termExists)
>                 rsp.add("suggestions", buf.toString());
>             else
>                 rsp.add("suggestions", null);
>         }
>
>
>
> On 10/11/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >
> > Hoss,
> >
> > I had a feeling someone would be quoting Yonik's Law of Patches!  ;-)
> >
> > For now, this is done.
> >
> > I created the changes, created JavaDoc comments on the various settings
> > and their expected output, created a JUnit test for the
> > SpellCheckerRequestHandler which tests various components of the handler,
> > and I also created the supporting configuration files for the JUnit tests
> > (schema and solrconfig files).
> >
> > I attached the patch to the JIRA issue so now we just have to wait until
> > it gets added back in to the main code stream.
> >
> > For anyone who is interested, here is a link to the JIRA:
> > https://issues.apache.org/jira/browse/SOLR-375
> >
> > Could someone please drop me a hint on how to update the wiki or any other
> > documentation that could benefit from being updated? I'd like to help out
> > as much as possible, but first I need to know "how". ;-)
> >
> > When these changes do get committed back in to the daily build, please
> > review the generated JavaDoc for information on how to utilize these new
> > features.
> > If anyone has any questions, or comments, please do not hesitate to ask.
> >
> >
> > As a general self-critique of these changes, I am not 100% sure of the way
> > I implemented the "nested" structure when the "multiWords" parameter is
> > used.  My interest is that it should work smoothly with other technology
> > such as Prototype using the JSON output type.  Unfortunately, I will not
> > get a chance to start on that coding until next week, so it is up in the
> > air whether this structure will be conducive or not.  I am planning on
> > providing more details in the documentation on how to use these
> > modifications with Prototype and Ajax when I get a chance (and even
> > links to a production site so you can see it in action and view the
> > source if interested).  So stay tuned...
> >
> >    Thanks for everyone's time,
> >       Scott Tabar
> >
> > ---- Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> > : If you like, I can post the source code changes that I made to the
> > : SpellCheckerRequestHandler, but at this time I am not ready to open a
> > : JIRA issue and submit the changes back through the subversion.  I will
> > : need to do a little more testing, documentation, and create some unit
> > : tests to cover all of these changes, but what I have been able to
> > : perform, it is working very well.
> >
> > Keep in mind "Yonik's Law Of Patches" ...
> >
> >         "A half-baked patch in Jira, with no documentation, no tests
> >         and no backwards compatibility is better than no patch at all."
> >         http://wiki.apache.org/solr/HowToContribute
> >
> > ...even if you don't think the code is "solid" yet, if you want to
> > eventually make it available to people, making a "rough" version
> > available to people early gives other people the opportunity to help you
> > make it solid (by writing unit tests, fixing bugs, and adding
> > documentation).
> >
> >
> > -Hoss
> >
> >
> >
>
>
> --
> Regards,
>
> Cuong Hoang




-- 
Regards,

Cuong Hoang
