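The Porter stemmer turns 'international' into 'intern', so 'intern' is the
term that actually gets indexed. Wildcard terms are not run through the
analyzer, so any prefix longer than 'intern' (interna*, internat*, ...) has
nothing in the index to match.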
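
If you need wildcard searches on this field, one common workaround is to copy it
into a second field whose analyzer does not stem. Below is a rough sketch, not a
tested config: the names "text_nostem" and "name_nostem" are made up for
illustration, and I'm assuming your source field is called "name" as in your
queries.

```xml
<!-- Sketch only: "text_nostem" and "name_nostem" are hypothetical names. -->
<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No SnowballPorterFilterFactory here, so 'international' is indexed
         as 'international' rather than 'intern'. -->
  </analyzer>
</fieldType>

<!-- Assumes the existing source field is named "name". -->
<field name="name_nostem" type="text_nostem" indexed="true" stored="false"/>
<copyField source="name" dest="name_nostem"/>
```

With something like this in place, q=name_nostem:(internat*) should match after
a reindex. Since the wildcard term is not analyzed, it is not lowercased either,
so send the prefix in lowercase.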

On Mon, Feb 22, 2010 at 6:57 PM, cjkadakia <cjkada...@sonicbids.com> wrote:

>
> I'm getting very odd behavior from a wildcard search.
>
> For example, when I'm searching for docs with a name containing the word
> "International", the following occurs:
>
> q=name:(inte*) -- found "International"
> q=name:(intern*) -- found "International"
> q=name:(interna*) -- did not find "International"
> q=name:(internat*) -- did not find "International"
> .. adding 1 character at a time did not find "International"
> q=name:(international*) -- did not find "International"
>
> As indicated, the behavior is quite bizarre and is causing issues with our use
> and test cases. Is there something I can set for the fieldType of text in
> order to make these kinds of searches work? Also, any insight as to why
> this is not working would be a big help as well.
>
> Pasted for reference:
>    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <!-- in this example, we will only use synonyms at query time
>        <filter class="solr.SynonymFilterFactory"
>                synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>        -->
>        <!-- Case insensitive stop word removal.
>          add enablePositionIncrements=true in both the index and query
>          analyzers to leave a 'gap' for more accurate phrase queries.
>        -->
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SnowballPorterFilterFactory" language="English"
>                protected="protwords.txt"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="0"
>                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SnowballPorterFilterFactory" language="English"
>                protected="protwords.txt"/>
>      </analyzer>
>    </fieldType>
>
> --
> View this message in context:
> http://old.nabble.com/Odd-wildcard-behavior-tp27695404p27695404.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Robert Muir
rcm...@gmail.com
