Actually, on second thought, I think you should be able to do this directly, but I don't have the highlighter magic at my fingertips. The field type's analyzer simply needs to map the accented characters to their unaccented equivalents; the character offsets of the accented and unaccented tokens should line up fine. Really, it is no different than highlighting tokens that differ only in upper and lower case.
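To make the offset argument concrete, here is a rough plain-Java sketch (no Lucene classes). It assumes a whitespace/punctuation tokenizer and uses NFD decomposition plus removal of combining marks as a stand-in for whatever folding filter the real field type would use. The helper names (`fold`, `analyze`, `highlight`) are hypothetical. The point it shows: each token's *term* is folded for matching, but its start/end offsets still point into the original text, so the highlighted span is the original accented word. Note that NFD does not cover characters with no decomposition, such as the Turkish dotless ı (U+0131).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FoldedHighlightSketch {
    // A token carries the folded term plus offsets into the ORIGINAL text.
    record Token(String term, int start, int end) {}

    // Hypothetical tokenizer: runs of letters/digits are tokens.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Hypothetical stand-in for an accent-folding + lowercasing analyzer:
    // decompose (NFD), drop combining marks, then lowercase.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // The term is folded, but the offsets are left untouched.
            tokens.add(new Token(fold(m.group()), m.start(), m.end()));
        }
        return tokens;
    }

    // Wrap every token whose folded term equals the folded query term,
    // copying the original characters between its offsets.
    static String highlight(String text, String queryTerm) {
        String q = fold(queryTerm);
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : analyze(text)) {
            if (t.term().equals(q)) {
                out.append(text, last, t.start())
                   .append("<em>").append(text, t.start(), t.end()).append("</em>");
                last = t.end();
            }
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Çiğli Belediyesi", "çiğli"));
        System.out.println(highlight("www.cigli.bel.tr/", "çiğli"));
    }
}
```

Running `main` prints `<em>Çiğli</em> Belediyesi` and `www.<em>cigli</em>.bel.tr/`: one folded query term matches both spellings, and each hit is highlighted in its original form, just as case-insensitive matching highlights "Solr" for a query on "solr".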

-- Jack Krupansky

-----Original Message-----
From: Jack Krupansky
Sent: Monday, July 15, 2013 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: How to Indicate Solr That: Both Ascified and Non-Ascii versions of tokens are same?

Either write a custom highlighter or preprocess the query to generate an "OR"
of the accented and unaccented terms. Solr has no magic feature that does both.
Sure, you could write a token filter that duplicates each term and emits
both the accented and unaccented versions, but... it gets messy and is a
pain with phrases.
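A rough Java sketch of the query-preprocessing route, assuming the field is indexed without folding so that each spelling is matched (and highlighted) literally. The helper names are hypothetical, NFD decomposition plus mark removal stands in for real folding rules, and quoted phrases and query operators are deliberately not handled, which is exactly where this approach gets awkward.

```java
import java.text.Normalizer;
import java.util.StringJoiner;

public class AccentOrQuerySketch {
    // Strip combining marks after canonical decomposition: "çiğli" -> "cigli".
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Rewrite each whitespace-separated term as (accented OR unaccented)
    // when the two differ; terms that are already plain pass through unchanged.
    // Hypothetical helper: no handling of phrases, fields, or operators.
    static String expand(String userQuery) {
        StringJoiner out = new StringJoiner(" ");
        for (String term : userQuery.trim().split("\\s+")) {
            String plain = stripAccents(term);
            out.add(plain.equals(term) ? term : "(" + term + " OR " + plain + ")");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("çiğli belediyesi"));
    }
}
```

For the query `çiğli belediyesi` this prints `(çiğli OR cigli) belediyesi`, so documents containing either spelling match and the standard highlighter can mark whichever term actually occurs.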

It is worth filing a Jira issue, though.

-- Jack Krupansky

-----Original Message-----
From: Furkan KAMACI
Sent: Monday, July 15, 2013 9:06 AM
To: solr-user@lucene.apache.org
Subject: How to Indicate Solr That: Both Ascified and Non-Ascii versions of tokens are same?

When I search Google for something that contains non-ASCII characters, it
returns results for both the original and the ASCII-folded versions and
*highlights both of them*.
For example, if I search for *çiğli* on Google, the first result is:

*Çiğli* Belediyesi
www.*cigli*.bel.tr/

How can I do that in Solr? How can I tell Solr that *the ASCII-folded and
non-ASCII versions of a token are the same*?
