Hello! I've got some encoding problems with my new analyzer configuration. I've deployed a Solr server in Apache Tomcat, with Tomcat's encoding set to UTF-8 in server.xml. Solr's encoding is also set to UTF-8 in schema.xml. I have defined a fieldType like the following:

  <fieldType name="textSearch" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.MappingCharFilterFactory" mapping="charsToRemove.txt"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_es.txt"/>
      <filter class="solr.WordDelimiterFilterFactory"
              splitOnCaseChange="1"
              splitOnNumerics="1"
              stemEnglishPossessive="1"
              generateWordParts="1"
              generateNumberParts="1"
              preserveOriginal="1"
      />
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

I don't know why, but Solr immediately turns an input like "sueños" ("dreams" in English) into something like "sueños". As a result, WordDelimiterFilterFactory splits the token into "sue à os", which obviously breaks search queries that include the original term "sueños". It looks as if Solr is not actually treating the input as UTF-8.

Any tips or suggestions?

Thank you very much.

--
- Luis Cappa
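
P.S. To illustrate what I suspect is going on, here is a minimal Python sketch. It is not Solr code, and the function names simulate_mojibake and repair_mojibake are made up for this example. It assumes the UTF-8 bytes of the input are being decoded as ISO-8859-1 somewhere along the way, which reproduces the garbled text shown above:

```python
def simulate_mojibake(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(garbled: str) -> str:
    """Reverse the mistake: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    original = "sue\u00f1os"  # "sueños"
    garbled = simulate_mojibake(original)
    print(garbled)  # sueños
    print(repair_mojibake(garbled) == original)  # True
```

If that assumption holds, the bytes themselves are intact and only one decoding step is wrong, so the question becomes which layer (the client, the servlet container, or a file reader) is not using UTF-8.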