RE: LowerCaseFilterFactory and spellchecker

Norskog, Lance Wed, 28 Nov 2007 17:10:10 -0800

There are a few parameters for limiting what words are added to the
dictionary.  You might be trimming out 'thorne'. See this page:


http://wiki.apache.org/solr/SpellCheckerRequestHandler

-----Original Message-----
From: Rob Casson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 28, 2007 4:25 PM
To: solr-user@lucene.apache.org
Subject: LowerCaseFilterFactory and spellchecker

i think i'm just doing something wrong...

was experimenting with the spellcheck handler using the nightly checkout
from 11-28; my spellchecking seems to be case-sensitive, even though i
think i'm adding the LowerCaseFilterFactory to both the index and query
analyzers.

here's a brief rundown of my testing steps.

from schema.xml:

<fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.StandardFilterFactory"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.StandardFilterFactory"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
</fieldtype>

<field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="spelling" type="spell" indexed="true" stored="true" multiValued="true"/>

<copyField source="title" dest="spelling"/>

--------------------------------

from solrconfig.xml:

<requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy">
        <lst name="defaults">
                <int name="suggestionCount">1</int>
                <float name="accuracy">0.5</float>
        </lst>
        <str name="spellcheckerIndexDir">spell</str>
        <str name="termSourceField">spelling</str>
</requestHandler>

--------------------------------

adding the doc:

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
  --data-binary '<add><doc><field name="title">Thorne</field></doc></add>'
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
  --data-binary '<optimize />'

--------------------------------

building the spellchecker:

http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker&cmd=rebuild

--------------------------------

querying the spellchecker:

results from http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker

<?xml version="1.0" encoding="UTF-8"?>
<response>
        <lst name="responseHeader">
                <int name="status">0</int>
                <int name="QTime">1</int>
        </lst>
        <str name="words">Thorne</str>
        <str name="exist">false</str>
        <arr name="suggestions">
                <str>thorne</str>
        </arr>
</response>

results from http://localhost:8983/solr/select/?q=thorne&qt=spellchecker

<?xml version="1.0" encoding="UTF-8"?>
<response>
        <lst name="responseHeader">
                <int name="status">0</int>
                <int name="QTime">2</int>
        </lst>
        <str name="words">thorne</str>
        <str name="exist">true</str>
        <arr name="suggestions"/>
</response>
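A minimal client-side workaround sketch, assuming (this is not confirmed anywhere in this thread) that the spellchecker handler looks up the raw q value without passing it through the query analyzer, so the mixed-case "Thorne" never matches the lowercased dictionary term "thorne". The spell_url helper is hypothetical and not part of Solr; it only lowercases the term and builds the same select URL used above. It does no URL-encoding, so it is only meant for single plain words.

```shell
#!/bin/sh
# spell_url is a hypothetical helper, not part of Solr.
# Assumption: the spellchecker handler does not lowercase q itself,
# so the term is lowercased here before the request URL is built.
spell_url() {
  # Lowercase the term using POSIX tr character classes.
  term=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # No URL-encoding is done: intended for single plain words only.
  printf 'http://localhost:8983/solr/select/?q=%s&qt=spellchecker\n' "$term"
}

# Print the URL for the mixed-case term from the test above.
spell_url Thorne
```

With Solr running, the request itself would then be `curl "$(spell_url Thorne)"`, which sends q=thorne regardless of how the term was typed.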


any pointers as to what i'm doing wrong or misinterpreting? i suspect i'm
just doing something bone-headed in the analyzer sections...

thanks as always,

rob casson
miami university libraries
