Re: question on solr.ASCIIFoldingFilterFactory

Markus Jelsma Tue, 05 Apr 2011 10:40:15 -0700

It's not the ASCII folding filter but the stemmer (SnowballPorterFilterFactory)
that's removing some trailing characters. You can easily spot this on the
analysis page.
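If the plurals need to survive, one option is a separate field type that folds accents but does not stem. The following is a minimal sketch, assuming stemming isn't needed on these fields. The name `text_folded` is hypothetical. It reuses the tokenizer and filters of the `text` type quoted below, drops `SnowballPorterFilterFactory`, and also folds at query time. The quoted `text` type applies `ASCIIFoldingFilterFactory` only in its index analyzer, so accented query terms would not match folded index terms.

```xml
<!-- Hypothetical field type "text_folded" (name is an assumption):
     same chain as "text" minus the stemmer, so plurals keep their "s". -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Fold accents: "después" -> "despues", "imágenes" -> "imagenes" -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Fold at query time too, so accented queries match folded terms -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Fields would reference it with `type="text_folded"`, and existing documents would need to be reindexed. If you would rather keep stemming, adding the affected words to protwords.txt (already referenced through the `protected` attribute) should stop the stemmer from touching them. At that point in the chain the tokens are lowercased but not yet folded, so list them with their accents.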


> Here is the field type definition for the ‘text’ field, which is what I am
> using for the indexed fields. Can you guys spot any obvious filter that
> could be the issue?
> 
> ---------------------------------------------------------------------------
> 
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- in this example, we will only use synonyms at query time
>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
>             ignoreCase="true" expand="false"/>
>     -->
>     <!-- Case insensitive stop word removal.
>       add enablePositionIncrements=true in both the index and query
>       analyzers to leave a 'gap' for more accurate phrase queries.
>     -->
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
> 
> 
> 
> From: Steven A Rowe [mailto:[email protected]]
> Sent: Tuesday, April 05, 2011 12:28 PM
> To: [email protected]
> Subject: RE: question on solr.ASCIIFoldingFilterFactory
>
> I added this test method locally to TestASCIIFoldingFilter.java in the
> Lucene/Solr 3.1.0 source tree, and it passed, so the filter is not the
> problem (and the Solr factory certainly isn't either; it's just a
> wrapper). I second Ludovic's question: you must have other filters
> configured:
> 
>   public void testPluralNotTrimmed() throws Exception {
>     TokenStream stream = new WhitespaceTokenizer(TEST_VERSION_CURRENT,
>         new StringReader("después Imágenes"));
>     ASCIIFoldingFilter filter = new ASCIIFoldingFilter(stream);
>     CharTermAttribute termAtt = filter.getAttribute(CharTermAttribute.class);
>
>     assertTermEquals("despues", filter, termAtt);
>     assertTermEquals("Imagenes", filter, termAtt);
>   }
>
> Steve
