Re: Spell checking not returning "full" terms

Rupert Fiasco Wed, 04 Feb 2009 16:55:14 -0800

Awesome! After reading up on the links you sent me I got it all working. Thanks!


FYI - I did previously come across one of the links you sent over:

http://wiki.apache.org/solr/SpellCheckerRequestHandler

But what threw me off is that when I started reading it yesterday,
the first paragraph says the request handler is deprecated and to use
SpellCheckComponent instead, so at that point I stopped reading and
went over to the component page. If I had kept reading I would have
encountered all of the gritty details that I in fact needed to get it
to work. The wiki entry makes the page seem old and no longer
relevant, but it certainly still is.
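
For anyone who lands here with the same problem, here is a minimal
sketch of the kind of setup those pages describe. The names
"textSpell" and "spell" are assumptions for illustration (they are not
from the config quoted below), and the analyzer chain is just the
"tokenize and lowercase" minimum Grant suggested, so adjust it to your
own schema:

```xml
<!-- schema.xml: hypothetical field type with minimal analysis (no stemming) -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- schema.xml: hypothetical field that collects the unstemmed terms -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<copyField source="*_t" dest="spell"/>

<!-- solrconfig.xml: point the spellchecker at the unstemmed field -->
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spell</str>
  <str name="spellcheckIndexDir">./spellchecker1</str>
</lst>
```

After a change like this the documents need to be reindexed and the
spelling index rebuilt (for example with spellcheck.build=true on a
request), otherwise the suggestions still come from the old stemmed
terms.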

-Rupert

On Wed, Feb 4, 2009 at 11:57 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> I'm guessing the field you are checking against is being stemmed.  The field
> you spell check against should have minimal analysis done to it, i.e.
> tokenization and probably downcasing.  See
> http://wiki.apache.org/solr/SpellCheckComponent and
> http://wiki.apache.org/solr/SpellCheckerRequestHandler for tips on how to
> handle analysis for spelling.
>
> On Feb 4, 2009, at 2:33 PM, Rupert Fiasco wrote:
>
>> We are using Solr 1.3 and trying to get spell checking functionality.
>>
>> FYI, our index contains a lot of medical terms (which might or might
>> not make a difference as they are not English-y words, if that makes
>> any sense?)
>>
>> If I specify a spellcheck query of "spellcheck.q=diabtes"
>>
>> I get suggestions of:
>>
>> <str>diabet</str>
>> <str>diabetogen</str>
>> <str>dilat</str>
>> <str>diamet</str>
>> <str>diatom</str>
>> <str>diastol</str>
>> <str>diactin</str>
>> <str>dialect</str>
>>
>> If I re-mis-spell Diabetes to "q=diabets" then I get no suggestions.
>>
>> So first off two things:
>>
>> 1) Why would leaving out one "e" over the other affect the spelling
>> suggestions so substantially?
>> 2) In the former list of suggestions, notice the first suggestion is
>> "diabet", which isn't all that helpful; it should return something like
>> "diabetes" or maybe even "diabetic".
>>
>> Note that if I do a normal search against "diabetes" then I get a ton
>> of results, in other words, our index is filled with terms of
>> "diabetes".
>>
>> My relevant solrconfig is:
>>
>>
>>   <str name="queryAnalyzerFieldType">text</str>
>>
>>   <lst name="spellchecker">
>>     <str name="name">default</str>
>>     <str name="field">text_t</str>
>>     <str name="spellcheckIndexDir">./spellchecker1</str>
>>     <str name="accuracy">0.1</str>
>>
>>   </lst>
>>   <lst name="spellchecker">
>>     <str name="name">jarowinkler</str>
>>     <str name="field">text_t</str>
>>     <!-- Use a different Distance Measure -->
>>     <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
>>     <str name="spellcheckIndexDir">./spellchecker2</str>
>>     <str name="accuracy">0.1</str>
>>
>>   </lst>
>>
>> and I have
>>
>> spellcheck.count = 8
>>
>> Notice that I severely bumped down the "accuracy" setting to get more
>> results. Bumping it up higher yields fewer results (I'm not sure what the
>> setting really means, so I don't know in which direction I want to change
>> that value - I am guessing that a lower value allows for more
>> mis-spellings, i.e. it's more promiscuous).
>>
>> Our "text" and "text_t" fields are defined in schema.xml as:
>>
>> <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
>> and
>> <dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true"/>
>>
>> Any help would be appreciated.
>>
>> Thanks
>> -Rupert
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ