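The Porter stemmer turns 'international' into 'intern', so 'intern' is the term stored in the index. Wildcard queries are not run through the analyzer, so inte* and intern* still match that term, while interna* and anything longer do not.

On Mon, Feb 22, 2010 at 6:57 PM, cjkadakia <cjkada...@sonicbids.com> wrote: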
> I'm getting very odd behavior from a wildcard search.
>
> For example, when I'm searching for docs with a name containing the word
> "International" the following occur:
>
> q=name:(inte*) -- found "International"
> q=name:(intern*) -- found "International"
> q=name:(interna*) -- did not find "International"
> q=name:(internat*) -- did not find "International"
> .. adding 1 character at a time did not find "International"
> q=name:(international*) -- did not find "International"
>
> As indicated, the behavior is quite bizarre and causing issues with our use
> and test cases. Is there something I can set for the fieldType of text in
> order to make these kinds of searches working? Also, any insight as to why
> this is not working would be a big help as well.
>
> Pasted for reference:
> <fieldType name="text" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- in this example, we will only use synonyms at query time
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>     -->
>     <!-- Case insensitive stop word removal.
>          add enablePositionIncrements=true in both the index and query
>          analyzers to leave a 'gap' for more accurate phrase queries.
>     -->
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> --
> View this message in context:
> http://old.nabble.com/Odd-wildcard-behavior-tp27695404p27695404.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Robert Muir
rcm...@gmail.com
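
For anyone who wants to see the mismatch spelled out, here is a minimal Python sketch. It is not Solr code: it assumes, as stated in the reply, that the indexed term for "International" is the stem 'intern', and it models a trailing-wildcard query as a plain prefix test against that term. The names `INDEXED_TERM` and `wildcard_matches` are made up for illustration.

```python
# Illustration only: this is not Solr code. The stem is hardcoded from the
# reply in this thread rather than computed by a real stemmer.
INDEXED_TERM = "intern"  # assumed index-time term for "International"


def wildcard_matches(query: str, indexed_term: str = INDEXED_TERM) -> bool:
    """Model a trailing-wildcard query such as 'interna*'.

    The part before '*' is compared to the indexed term as a raw prefix.
    It is not lowercased or stemmed, mirroring the assumption that
    wildcard terms skip analysis.
    """
    if not query.endswith("*"):
        raise ValueError("expected a trailing-wildcard query like 'inte*'")
    prefix = query[:-1]
    return indexed_term.startswith(prefix)


if __name__ == "__main__":
    for q in ["inte*", "intern*", "interna*", "internat*", "international*"]:
        result = "found" if wildcard_matches(q) else "not found"
        print(f"{q:16} -> {result}")
```

Under those assumptions the sketch reproduces the pattern from the original report: the two shortest prefixes match and every longer one does not, because the index no longer contains the full word.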