I have UTF-8 content that I want to index, but I want searches typed without diacritics to return results.

For example, searching for "nino en mexico" should return a document containing the phrase "Niño en México", just as if the query had been typed with the accents.
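
To make the requirement concrete, here is a minimal sketch of the folding I have in mind. It is written in Python rather than as Solr configuration. The assumption is that the same normalization runs on both the indexed text and the query. It uses Unicode canonical decomposition (NFD) and then drops the combining marks, so "ñ" becomes "n" plus a combining tilde, and only the "n" is kept. `fold_diacritics` and `tokens` are hypothetical helper names used only for illustration.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Lowercase text and strip combining marks after canonical decomposition."""
    # NFD splits precomposed letters, e.g. "ñ" -> "n" + U+0303 COMBINING TILDE.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (Unicode category "Mn").
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.lower()


def tokens(text: str) -> list[str]:
    """Whitespace tokenizer applied after folding (same on index and query side)."""
    return fold_diacritics(text).split()


doc = "Niño en México"
query = "nino en mexico"
print(tokens(doc))                   # ['nino', 'en', 'mexico']
print(tokens(query) == tokens(doc))  # True
```

Because both sides go through the same function, the unaccented query and the accented document reduce to identical tokens.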

Ideally, exact diacritic matches should score higher: searching for "niño" should rank a document containing "niño" above a document containing "nino".
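
One common way to get this ranking is to index the text twice. One field keeps the accents and the other is folded, for example via a `copyField`. You then query both fields and give the accent-preserving field a higher boost, for instance through the DisMax `qf` parameter. The toy scorer below only models that idea in Python. The field names `exact` and `folded` and the boost values 2.0 and 1.0 are hypothetical. It also simply sums per-field hits, which is not how Lucene actually computes scores.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks after NFD decomposition, then lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn").lower()


def index(text: str) -> dict[str, set[str]]:
    """Build two hypothetical 'fields' per document, like a copyField setup."""
    words = text.split()
    return {
        # Accent-preserving field: NFC-normalized so composed/decomposed input agree.
        "exact": {unicodedata.normalize("NFC", w).lower() for w in words},
        # Accent-insensitive field.
        "folded": {fold(w) for w in words},
    }


# Hypothetical boosts: a hit on the accent-preserving field counts double.
BOOSTS = {"exact": 2.0, "folded": 1.0}


def score(query: str, doc: dict[str, set[str]]) -> float:
    """Toy score: sum of boosted per-field term hits (not Lucene's formula)."""
    total = 0.0
    for term in query.split():
        if unicodedata.normalize("NFC", term).lower() in doc["exact"]:
            total += BOOSTS["exact"]
        if fold(term) in doc["folded"]:
            total += BOOSTS["folded"]
    return total


docs = {"accented": index("Niño en México"), "plain": index("nino en mexico")}
for name, doc in docs.items():
    print(name, score("niño", doc))
# accented 3.0
# plain 1.0
```

Both documents match "niño" through the folded field, but only the accented one also matches the exact field, so it ranks first.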

Any pointers on how to do this? I found solr.ISOLatin1AccentFilterFactory, but it seems to strip diacritics only from ISO Latin-1 characters. What about diacritics on other Unicode characters?
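
For reference, decomposition-based folding is not tied to Latin-1. The Python sketch below shows that NFD plus mark removal also folds letters such as ř, ő and ş, which lie outside ISO-8859-1. It does not fold letters that have no canonical decomposition, such as ł or ø, and those need an explicit mapping table. Newer Solr releases ship solr.ASCIIFoldingFilterFactory, which folds many non-Latin-1 characters to ASCII. Whether it is available depends on the Solr version in use. The word list below is illustrative only.

```python
import unicodedata


def fold(text: str) -> str:
    """Remove combining marks after canonical (NFD) decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


for word in ["Dvořák", "Erdős", "Şişli", "Łódź", "Søren"]:
    # True if the word has any character beyond the ISO-8859-1 range.
    outside_latin1 = any(ord(c) > 0xFF for c in word)
    print(word, outside_latin1, fold(word))
# Dvořák True Dvorak
# Erdős True Erdos
# Şişli True Sisli
# Łódź True Łodz   (Ł has no decomposition, so it is left as-is)
# Søren False Søren (ø has no decomposition either)
```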
--
*Ing. Alejandro Garza González*
Director of Technology and Innovation, Library
Tecnológico de Monterrey, Campus Monterrey

Tel.: 52(81) 8358-1400 ext. 4037 Fax: 52(81) 8328-4067
Intercampus line: 80 689 4037
http://biblioteca.mty.itesm.mx
