Hi Alejandro,

Solr is Unicode aware. The ISOLatin1AccentFilterFactory strips diacritics from characters in the ISO Latin-1 section of the Unicode character set. UTF (do you mean UTF-8?) is a family of Unicode serializations; once Solr has deserialized the input, it is just Unicode characters (held in Java's in-memory UTF-16 representation).
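For reference, here is a minimal sketch of how the filter is usually wired into schema.xml. The field type name "text_folded" is hypothetical, and the tokenizer and lowercase filter are assumptions about a typical setup rather than anything from your configuration:

```xml
<!-- Hypothetical field type: the name "text_folded" is illustrative only. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer element applies to both indexing and querying,
       so "nino" and "Niño" are reduced to the same indexed term. -->
  <analyzer>
    <!-- Tokenizer choice is an assumption; use whatever your schema already uses. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Replaces accented ISO Latin-1 characters with unaccented equivalents. -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same analysis runs at index time and at query time, a search for "nino en mexico" should match a document containing "Niño en México" in a field of this type.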
So as long as you only need to remove diacritics from the Unicode characters that overlap ISO Latin-1, and not from other Unicode characters, ISOLatin1AccentFilterFactory should work for you.

Steve

On 08/11/2008 at 7:22 PM, Alejandro Garza Gonzalez wrote:
> I have utf-8 content that I want to index, however I want searches
> without diacritics to return results.
>
> For example, a document with the words "nino en mexico" should return
> results like a document with the phrase "Niño en México".
>
> Ideally, exact diacritic matches should score higher (searching for
> "niño" exactly should make a document with "niño" score higher than a
> document with "nino")
>
> Any pointers on how to do this? I found about the
> /solr/.ISOLatin1AccentFilterFactory but it seems to only strip
> diacritics from iso-latin characters. How about UTF diacritics?
>
> --
> Ing. Alejandro Garza González
> Director, Technology and Innovation, Biblioteca Tecnológico de Monterrey,
> Campus Monterrey
> http://biblioteca.mty.itesm.mx