
Re: How to Indicate Solr That: Both Ascified and Non-Ascii versions of tokens are same?

Jack Krupansky Mon, 15 Jul 2013 06:19:38 -0700

Actually, on second thought, I think you should be able to do this directly, but I don't have the highlighter magic at my fingertips. The field type analyzer simply needs to map the accented characters; the character positions of the accented and unaccented tokens should line up fine. Really, it is no different than highlighting tokens that differ only in upper and lower case.
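
To illustrate why the positions line up, here is a minimal Python sketch (not Solr code). It folds each character one-to-one, so a token found in the original text can be highlighted in place after comparing its folded form with the folded query. The names `fold_char`, `fold`, `highlight`, and the `EXTRA_FOLDS` table are hypothetical, made up for this example. In Solr the mapping would normally live in the field type's analyzer, for example with a folding filter such as ASCIIFoldingFilterFactory, and the highlighter would rely on the stored token offsets.

```python
import re
import unicodedata

# Assumption: a tiny hand-written table for letters that Unicode
# decomposition does not reduce to an ASCII base letter (dotless i).
EXTRA_FOLDS = {"ı": "i"}


def fold_char(ch):
    """Fold one character to an unaccented, lower-case character.

    The result is always exactly one character long, so character
    offsets in the folded text match offsets in the original text.
    """
    if ch in EXTRA_FOLDS:
        return EXTRA_FOLDS[ch]
    decomposed = unicodedata.normalize("NFD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    if len(stripped) != 1:
        return ch  # leave anything that would change length untouched
    lowered = stripped.lower()
    return lowered if len(lowered) == 1 else stripped


def fold(text):
    """Fold a whole string, character by character."""
    return "".join(fold_char(ch) for ch in text)


def highlight(text, query, pre="*", post="*"):
    """Wrap every token of `text` whose folded form equals the folded query."""
    target = fold(query)

    def mark(match):
        token = match.group(0)
        return f"{pre}{token}{post}" if fold(token) == target else token

    # Assumes precomposed (NFC) input, so accented letters match \w.
    return re.sub(r"\w+", mark, text)


print(highlight("Çiğli Belediyesi www.cigli.bel.tr/", "çiğli"))
# prints: *Çiğli* Belediyesi www.*cigli*.bel.tr/
```

Because the folding never changes the length of a token, the same start and end offsets are valid in both the original and the folded text, which is the same reason case-insensitive highlighting works.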


-- Jack Krupansky

-----Original Message-----
From: Jack Krupansky

Sent: Monday, July 15, 2013 9:13 AM
To: solr-user@lucene.apache.org

Subject: Re: How to Indicate Solr That: Both Ascified and Non-Ascii versions of tokens are same?


Either do a custom highlighter or preprocess the query and generate an "OR"
of the accented and unaccented terms. Solr has no magic feature to do both.
Sure, you could do a token filter that duplicated each term and included
both the accented and unaccented versions, but... it gets messy and is a
pain with phrases.
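
A rough sketch of the query-preprocessing option, in Python on the client side. It is only an illustration under assumptions: the function names `fold_ascii` and `expand_query` and the field name `text` are hypothetical, the folding covers only characters that Unicode decomposition handles plus the Turkish dotless i, and no escaping of query-syntax characters is attempted.

```python
import unicodedata


def fold_ascii(term):
    """Strip combining accents from a term (assumed folding rule)."""
    # The dotless i has no Unicode decomposition, so map it by hand.
    decomposed = unicodedata.normalize("NFD", term.replace("ı", "i"))
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def expand_query(query, field="text"):
    """Turn each accented term into an OR of its accented and folded forms.

    Clauses are joined with spaces, so the default query operator
    still decides how the terms combine.
    """
    clauses = []
    for term in query.split():
        folded = fold_ascii(term)
        if folded == term:
            clauses.append(f"{field}:{term}")
        else:
            clauses.append(f"({field}:{term} OR {field}:{folded})")
    return " ".join(clauses)


print(expand_query("çiğli belediyesi"))
# prints: (text:çiğli OR text:cigli) text:belediyesi
```

This only handles single terms; a quoted phrase would need every combination of folded and unfolded words, which is where it gets messy.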

It is worth a Jira though.

-- Jack Krupansky

-----Original Message-----
From: Furkan KAMACI

Sent: Monday, July 15, 2013 9:06 AM
To: solr-user@lucene.apache.org
Subject: How to Indicate Solr That: Both Ascified and Non-Ascii versions of
tokens are same?

When I search Google for something that contains non-ASCII characters, it
returns results for both the original and the ascified versions and
*highlights both of them*.
For example, if I search for *çiğli* on Google, the first result is:

*Çiğli* Belediyesi
www.*cigli*.bel.tr/

How can I do that in Solr? How can I tell Solr that *both the ascified
and non-ASCII versions of a token are the same*?