Best way to index without diacritics

Alejandro Garza Gonzalez Mon, 11 Aug 2008 16:22:36 -0700

I have UTF-8 content that I want to index; however, I want searches without diacritics to return results.

For example, a document with the words "nino en mexico" should return results like a document with the phrase "Niño en México".

Ideally, exact diacritic matches should score higher (searching for "niño" exactly should make a document with "niño" score higher than a document with "nino").

Any pointers on how to do this? I found out about solr.ISOLatin1AccentFilterFactory, but it seems to strip diacritics only from ISO Latin-1 characters. How about UTF diacritics?
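
To make the requirement concrete, here is a minimal Python sketch of the behaviour described above. It is only an illustration, not Solr code: the function names (fold_diacritics, tokens, score) and the toy scoring weights are assumptions made up for this example. It folds text by decomposing it to Unicode NFD and dropping combining marks, then gives one point for a diacritic-insensitive match and one extra point when the diacritics also match exactly.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Lowercase text and drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def tokens(text: str) -> list[str]:
    """Split on whitespace after NFC normalization and lowercasing."""
    return unicodedata.normalize("NFC", text).lower().split()


def score(query: str, document: str) -> int:
    """Toy score: +1 per folded match, +1 more if the diacritics match too."""
    exact = set(tokens(document))
    folded = {fold_diacritics(t) for t in exact}
    total = 0
    for term in tokens(query):
        if fold_diacritics(term) in folded:
            total += 1  # diacritic-insensitive match
            if term in exact:
                total += 1  # bonus for an exact diacritic match
    return total


if __name__ == "__main__":
    print(fold_diacritics("Niño en México"))
    print(score("niño", "niño"), score("niño", "nino"))
```

In an index, the same effect would presumably come from keeping two versions of each field, one folded and one unfolded, and weighting the unfolded one higher. Note that this kind of folding only covers characters that decompose into a base letter plus combining marks; letters such as "ø" have no such decomposition and would need an explicit mapping.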

--
_________________ ___ _ _ _ _ _ _ _
*Ing. Alejandro Garza González*
Director, Tecnología e Innovación, Biblioteca
Tecnológico de Monterrey, Campus Monterrey

Tel.: 52(81) 8358-1400 ext. 4037 Fax: 52(81) 8328-4067
Enlace Intercampus: 80 689 4037
http://biblioteca.mty.itesm.mx

