RE: question on solr.ASCIIFoldingFilterFactory

Nemani, Raj Tue, 05 Apr 2011 10:33:37 -0700

Here is the field type definition for ‘text’, which is what I am using for 
the indexed fields.  Can you spot any filter in it that could be causing the 
issue?


---------------------------------------------------------------------------

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

 

From: Steven A Rowe [mailto:[email protected]] 
Sent: Tuesday, April 05, 2011 12:28 PM
To: [email protected]
Subject: RE: question on solr.ASCIIFoldingFilterFactory

 

I added this test method locally to TestASCIIFoldingFilter.java in the 
Lucene/Solr 3.1.0 source tree, and it passed, so the filter is not the 
problem (and the Solr factory certainly isn't either; it's just a wrapper). 
I second Ludovic's question: you must have other filters configured. Here 
is the test:

 

  public void testPluralNotTrimmed() throws Exception {
    TokenStream stream = new WhitespaceTokenizer(TEST_VERSION_CURRENT,
        new StringReader("después Imágenes"));
    ASCIIFoldingFilter filter = new ASCIIFoldingFilter(stream);
    CharTermAttribute termAtt = filter.getAttribute(CharTermAttribute.class);

    assertTermEquals("despues", filter, termAtt);
    assertTermEquals("Imagenes", filter, termAtt);
  }
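
For anyone without the Lucene source tree at hand, the folding behaviour that 
test checks can be roughly approximated with the JDK alone. This is only a 
sketch and not how ASCIIFoldingFilter is implemented: it assumes that NFD 
decomposition followed by stripping combining marks is close enough for 
accented Latin letters like the ones above, and it will not cover every 
character the real filter maps. The class name FoldSketch and the method 
fold are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical stand-in for the folding step; NOT Lucene's ASCIIFoldingFilter.
class FoldSketch {

    // Decompose accented characters (NFD), then drop the combining marks,
    // leaving the base letters. Case and word endings are left untouched.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // Same input as the test above, written with Unicode escapes so the
        // result does not depend on the source file encoding.
        String input = "despu\u00e9s Im\u00e1genes";
        for (String token : input.split("\\s+")) {
            System.out.println(fold(token));
        }
    }
}
```

Like the test, this only removes the accents: the plural ending and the 
capital letter are still there afterwards.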

 

Steve
