Latest news! It was a simple spelling mistake in Tomcat's server.xml: I had specified "utf8" instead of "UTF-8". After correcting it the problem was solved and everything is OK. Still, I hope this thread is useful for someone, because this kind of encoding problem with accented Latin characters is very common.
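For reference, here is a minimal sketch of where such a setting usually lives. It assumes the encoding in question is the HTTP Connector's URIEncoding attribute, which is the usual place to set UTF-8 for Solr under Tomcat because it controls how request URIs (and therefore GET query strings) are decoded. In Tomcat 7 and earlier it defaults to ISO-8859-1 when absent. The port, protocol, and timeout values are illustrative assumptions, not values taken from this thread.

```xml
<!-- Sketch of an HTTP connector in Tomcat's conf/server.xml.
     port, protocol, connectionTimeout and redirectPort are illustrative
     assumptions; URIEncoding is the attribute that controls how request
     URIs (including GET query parameters) are decoded. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```

Spelling the value exactly as "UTF-8", as above, is what resolved the problem reported in this thread.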
Goodbye!

2012/11/7 Luis Cappa Banda <luisca...@gmail.com>

> Hello!
>
> I've got some encoding problems with my new analyzer configuration. I've
> deployed a Solr server in Apache Tomcat, setting Tomcat's encoding to UTF-8
> in server.xml. Solr's encoding is also set to UTF-8 in schema.xml. I have
> defined a fieldType like the following:
>
> <fieldType name="textSearch" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="charsToRemove.txt"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_es.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnCaseChange="1"
>             splitOnNumerics="1"
>             stemEnglishPossessive="1"
>             generateWordParts="1"
>             generateNumberParts="1"
>             preserveOriginal="1"
>     />
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I don't know why, but it immediately turns an input like "sueños" (dreams,
> in English) into something like "sueÃ±os". WordDelimiterFilterFactory then
> splits that token into "sue à os", which obviously affects any search query
> that includes the original term "sueños". It looks as if Solr's encoding
> isn't UTF-8.
>
> Any tips or suggestions?
>
> Thank you very much.
>
> --
>
> - Luis Cappa

--
- Luis Cappa