Your first definition of text_fr looks correct and should work as expected. I tested it and it worked fine ("mémé" was highlighted).
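For reference, here is a rough Python sketch of what that first analyzer chain is expected to produce. It is not Solr's code: it assumes the document contains the decimal entity `&#233;`, stands in for HTMLStripCharFilterFactory with a naive regex plus `html.unescape`, and stands in for WhitespaceTokenizerFactory with `str.split`. The function name `strip_html_and_tokenize` is made up for illustration.

```python
import html
import re


def strip_html_and_tokenize(text):
    """Approximate an HTML-strip char filter followed by a whitespace tokenizer."""
    # Replace each tag with a space (naive regex, good enough for this sketch).
    no_tags = re.sub(r"<[^>]+>", " ", text)
    # Decode character references such as &#233; into the characters they name.
    decoded = html.unescape(no_tags)
    # Split on runs of whitespace, as a whitespace tokenizer would.
    return decoded.split()


# Assumed document content, with é written as the decimal entity &#233;.
doc = "<html><body>ça va m&#233;m&#233; ?</body></html>"
tokens = strip_html_and_tokenize(doc)
print(tokens)
```

If the char filter decodes the entity as expected, the token list contains "mémé", so a query for that term should match.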
What was the output of HTMLStripCharFilterFactory in analysis.jsp? In my analysis.jsp, I got "ça va mémé ?".

Koji

Kundig, Andreas wrote:
Hello,

I indexed an HTML document that uses decimal HTML entity encodings: the character é (e with an acute accent) is encoded as &#233;

The exact content of the document is:

    <html><body>ça va m&#233;m&#233; ?</body></html>

A search for 'mémé' returns no document. If I put the line above in Solr admin's analysis.jsp, it also doesn't match mémé. There is only a match if I replace &#233; by é.

This is how I configured the fieldType:

    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

I tried avoiding the problem by using the MappingCharFilterFactory:

    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

I put the file mapping.txt in the conf directory. It contains just this:

    "&#233;" => "é"

This doesn't work either. How can I get this to work? (I am using Solr 1.4.0.)

Thank you,
Andréas Kündig
World Intellectual Property Organization
-- http://www.rondhuit.com/en/