Cyrillic characters

Tricia Williams Tue, 18 Jul 2006 14:54:24 -0700

Hi all,

I'm trying to adapt our old Cocoon/Lucene-based web search application to one that is more Solr-ish. Our old web app could handle queries containing Cyrillic characters. With the packaged example admin interface, entering a query containing a string of Cyrillic characters causes a java.lang.ArrayIndexOutOfBoundsException. I've also noticed that the URL built from the search form is not UTF-8 encoded. So if I try to manipulate the query string by inserting a UTF-8 encoded string in the q= parameter, the value is interpreted incorrectly, and I cannot use this approach as a work-around.

My sample query is:

  Канада (the English word "Canada" translated into Russian)
  %D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (UTF-8)
  %26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B (Solr URL encoding)
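
The encoded forms of the sample query can be related to one another with a short Python sketch. It is illustrative only. It assumes the first encoded form is plain UTF-8 percent-encoding and that the longer one is the same word written as HTML decimal character references (&#1050; and so on) and then percent-encoded, which is what the byte values suggest. None of this is Solr's own code.

```python
from html import unescape
from urllib.parse import quote, unquote

# The two encoded forms quoted in the message.
utf8_form = "%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0"
entity_form = ("%26%231050%3B%26%231072%3B%26%231085%3B"
               "%26%231072%3B%26%231076%3B%26%231072%3B")

# Percent-decoding the UTF-8 form yields the Cyrillic word directly.
word = unquote(utf8_form, encoding="utf-8")

# Percent-decoding the longer form yields HTML decimal character
# references; unescape() turns those back into characters.
references = unquote(entity_form)
word_from_references = unescape(references)

# Rebuild both forms from the decoded word to confirm the round trip.
rebuilt_utf8 = quote(word, safe="", encoding="utf-8")
rebuilt_entities = quote("".join(f"&#{ord(c)};" for c in word), safe="")

print(references)
print(word == word_from_references)
print(rebuilt_utf8 == utf8_form, rebuilt_entities == entity_form)
```

If the assumption holds, both forms decode to the same six-letter word, which would mean the browser is substituting character references for characters it cannot represent in the form's character set before the URL is built.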

I would appreciate any advice or suggestions that would allow me to search for Cyrillic text in Solr. If anyone knows why Solr produces this strange encoding, a brief explanation of what causes the behaviour and what the encoding is (Unicode?) would be helpful. If anyone else has forced Solr to accept UTF-8 encoded q= parameters successfully, I would love to know how you did it.


Thanks in advance!
Tricia

P.S. I am using Mozilla Firefox as my main browser, which produces the behaviour reported above. IE 6.0 works fine for Cyrillic queries, although the encoding there is still strange, just different (%CA%E0%ED%E0%E4%E0 for the same query as before).
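
The IE 6.0 bytes are consistent with the same word encoded in a single-byte Cyrillic code page rather than in UTF-8. A minimal Python sketch, assuming the code page is windows-1251 (a guess based on the byte values, not something the browser or Solr reported):

```python
from urllib.parse import unquote, unquote_to_bytes

ie_form = "%CA%E0%ED%E0%E4%E0"                      # as sent by IE 6.0
utf8_form = "%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0"  # UTF-8 form of the same query

# Six raw bytes, one per letter, decoded under the assumed code page.
raw = unquote_to_bytes(ie_form)
as_cp1251 = raw.decode("cp1251")

# The UTF-8 form decoded for comparison.
as_utf8 = unquote(utf8_form, encoding="utf-8")
same_word = as_cp1251 == as_utf8

# The IE bytes are not valid UTF-8, so a receiver that assumes UTF-8
# cannot decode them.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(same_word, valid_utf8)  # prints: True False
```

Under that assumption the two browsers are sending the same query in different character encodings, so whichever encoding the server expects, one of them will be misread.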
