Hi Ariel,

On 6/23/2011 at 12:34 PM, Ariel wrote:
> But it still doesn't convert the code to the correct character, for
> instance: Espa&ntilde;a must be converted to España but it still
> remains as Espa&ntilde;a.
So it looks like your text processing tool(s) escape markup meta-characters (e.g. "&" -> "&amp;") after escaping above-ASCII characters to their named entity equivalents (e.g. "n" with a tilde to "&ntilde;"). This two-level escaping appears to be the problem.

According to the analysis.jsp output you sent, your original text "Espa&amp;ntilde;a" was converted to "Espa&ntilde;a": only the outer level of escaping ("&amp;" -> "&") was reversed.

I suspect you could fix the problem by including HTMLStripCharFilter twice, e.g.:

    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    ...

Good luck,
Steve
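To make the two-level escaping concrete, here is a minimal sketch in Python. It assumes the tool chain behaves as described above (named entities first, then "&" escaped), and it uses the standard library's `html.unescape` purely as a stand-in for the entity decoding done by one HTMLStripCharFilter; it does not model tag stripping or Solr's actual implementation. The function name `two_level_escape` is hypothetical.

```python
import html
from html.entities import codepoint2name


def two_level_escape(text: str) -> str:
    """Hypothetical model of the suspected tool chain: two escaping passes."""
    # Pass 1: above-ASCII characters -> named entities, e.g. "ñ" -> "&ntilde;".
    level1 = "".join(
        f"&{codepoint2name[ord(ch)]};"
        if ord(ch) > 127 and ord(ch) in codepoint2name
        else ch
        for ch in text
    )
    # Pass 2: escape markup meta-characters, so every "&" becomes "&amp;".
    return html.escape(level1, quote=False)


indexed = two_level_escape("España")
print(indexed)  # Espa&amp;ntilde;a

# One decoding pass (standing in for one char filter) undoes only pass 2.
once = html.unescape(indexed)
print(once)  # Espa&ntilde;a

# A second decoding pass undoes pass 1 as well.
twice = html.unescape(once)
print(twice)  # España
```

A single decoding pass leaves exactly the "Espa&ntilde;a" seen in the analysis.jsp output, while a second pass recovers "España", which is why stacking two char filters is expected to help.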