Here is the field type definition for the ‘text’ field type, which is what I am
using for the indexed fields. Can you guys spot any obvious filter that could
be causing the issue?
---------------------------------------------------------------------------
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
From: Steven A Rowe [mailto:[email protected]]
Sent: Tuesday, April 05, 2011 12:28 PM
To: [email protected]
Subject: RE: question on solr.ASCIIFoldingFilterFactory
I added this test method locally to TestASCIIFoldingFilter.java in the
Lucene/Solr 3.1.0 source tree, and it passed, so the filter is not the problem
(and the Solr factory certainly isn't either - it's just a wrapper) - I second
Ludovic's question - you must have other filters configured:
  public void testPluralNotTrimmed() throws Exception {
    TokenStream stream = new WhitespaceTokenizer(
        TEST_VERSION_CURRENT, new StringReader("después Imágenes"));
    ASCIIFoldingFilter filter = new ASCIIFoldingFilter(stream);
    CharTermAttribute termAtt = filter.getAttribute(CharTermAttribute.class);
    assertTermEquals("despues", filter, termAtt);
    assertTermEquals("Imagenes", filter, termAtt);
  }
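
For anyone who wants to try the same idea without the Lucene test harness,
here is a rough JDK-only sketch. It is not ASCIIFoldingFilter itself: it
assumes that Unicode decomposition via java.text.Normalizer followed by
stripping combining marks is a close enough stand-in for these two words (the
real filter uses its own mapping table). The class name FoldSketch and the
method fold are hypothetical names made up for this example.

```java
import java.text.Normalizer;

public class FoldSketch {
    // Approximation of ASCII folding (an assumption, not Lucene's algorithm):
    // decompose accented characters, then drop the combining marks.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Same input as the test above, split on whitespace like the tokenizer.
        for (String token : "después Imágenes".split("\\s+")) {
            System.out.println(token + " -> " + fold(token));
        }
    }
}
```

Folding only removes the accents here, so if the trailing "es" goes missing
it has to happen somewhere else in the analyzer chain.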
Steve