Steven A Rowe, the solution you proposed doesn't work, but thanks anyway.

Regards
On 6/23/11, Steven A Rowe <sar...@syr.edu> wrote:
> Hi Ariel,
>
> On 6/23/2011 at 12:34 PM, Ariel wrote:
>> But it still doesn't convert the code to the correct character, for
>> instance: Espa&ntilde;a must be converted to España but it still
>> remains as Espa&ntilde;a.
>
> So it looks like your text processing tool(s) escape markup meta-characters
> (e.g. "&" -> "&amp;") after escaping above-ASCII characters to their named
> entity equivalents (e.g. "n" with a tilde to "&ntilde;"). This two-level
> escaping appears to be the problem.
>
> According to the analysis.jsp output you sent, your original text
> "Espa&amp;ntilde;a" was converted to "Espa&ntilde;a": the first level of
> escaping was reversed.
>
> I suspect you could fix the problem by including HTMLStripCharFilter twice,
> e.g.:
>
> <charFilter class="solr.HTMLStripCharFilterFactory"/>
> <charFilter class="solr.HTMLStripCharFilterFactory"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> ...
>
> Good luck,
> Steve
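To illustrate why a single entity-decoding pass leaves "Espa&ntilde;a" behind while two passes recover "España", here is a minimal sketch of the two-level escaping described in the quoted message. It is an assumption-laden stand-in: it uses Python's standard `html.unescape` in place of Solr's HTMLStripCharFilter (which also strips markup tags and is not what runs inside Solr), and the helper name `unescape_levels` is hypothetical.

```python
import html


def unescape_levels(text: str, passes: int) -> str:
    """Decode HTML character references `passes` times.

    Hypothetical stand-in for chaining HTMLStripCharFilterFactory in a
    Solr analyzer; html.unescape only decodes entities, it does not
    strip tags like the real Solr char filter does.
    """
    for _ in range(passes):
        text = html.unescape(text)
    return text


# Level 1: the above-ASCII character is escaped to its named entity.
first_level = "Espa&ntilde;a"
# Level 2: the markup meta-character "&" is then escaped to "&amp;".
stored = first_level.replace("&", "&amp;")

print(stored)                         # Espa&amp;ntilde;a
print(unescape_levels(stored, 1))     # Espa&ntilde;a  (one decoding pass)
print(unescape_levels(stored, 2))     # España         (two decoding passes)
```

Each decoding pass undoes exactly one level of escaping, which is the reasoning behind listing the char filter twice: one pass only reverses the "&amp;" escaping, and the second is needed to turn "&ntilde;" into "ñ".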