I have UTF-8 content that I want to index, but I also want searches
typed without diacritics to return results.
For example, a search for "nino en mexico" should return a document
that contains the phrase "Niño en México".
Ideally, exact diacritic matches should score higher: searching for
"niño" should make a document containing "niño" score higher than a
document containing "nino".
Any pointers on how to do this? I found out about
solr.ISOLatin1AccentFilterFactory, but it seems to strip diacritics
only from ISO Latin-1 characters. What about diacritics on other
Unicode characters?
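
To make the desired behavior concrete, here is a rough sketch in plain
Python (not Solr configuration). It assumes that "stripping diacritics"
means decomposing the text to Unicode NFD and dropping the combining
marks. The names fold_diacritics and score are hypothetical, and the
2-versus-1 weighting is only an illustration of "exact matches score
higher", not how Solr actually ranks documents.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Lowercase the text and drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def score(query: str, document: str) -> int:
    """Toy scoring (an assumption for illustration only):
    2 points for a token that matches with its diacritics intact,
    1 point for a token that matches only after folding."""
    doc_exact = set(document.lower().split())
    doc_folded = {fold_diacritics(token) for token in doc_exact}
    total = 0
    for token in query.lower().split():
        if token in doc_exact:
            total += 2
        elif fold_diacritics(token) in doc_folded:
            total += 1
    return total
```

With this sketch, the query "nino en mexico" still matches a document
containing "Niño en México", while the query "niño" scores higher
against a document containing "niño" than against one containing
"nino". In an index, I imagine the same effect would come from storing
both an unfolded and a folded copy of each field and weighting the
unfolded one more heavily.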
--
*Ing. Alejandro Garza González*
Director, Tecnología e Innovación, Biblioteca
Tecnológico de Monterrey, Campus Monterrey
Tel.: 52(81) 8358-1400 ext. 4037 Fax: 52(81) 8328-4067
Enlace Intercampus: 80 689 4037
http://biblioteca.mty.itesm.mx