Hi all,

I'm trying to adapt our old Cocoon/Lucene-based web search application to one that is more Solr-ish. Our old web app could handle queries containing Cyrillic characters. With the packaged example admin interface, entering a query that contains a string of Cyrillic characters causes a java.lang.ArrayIndexOutOfBoundsException. I've also noticed that the URL built from the search form is not UTF-8 encoded. When I try to work around this by inserting a UTF-8 encoded string directly into the q= parameter, the values are interpreted incorrectly, so that approach doesn't work either. My sample query is: Канада (the English word _canada_ translated into Russian) or %D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (UTF-8) or %26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B (Solr URL encoding)
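
For reference, here is a small Python sketch that reproduces the two encoded strings above. It assumes the sample query is the Russian word for Canada, and it only shows how the strings can be derived; it says nothing about what Solr or the browser does internally. The variable names are illustrative. The second string turns out to be the percent-encoded form of HTML numeric character references (&#1050; and so on), one per character.

```python
from urllib.parse import quote

# Assumed sample query: the Russian word for "Canada".
word = "Канада"

# Form 1: percent-encode the UTF-8 bytes of the word.
utf8_form = quote(word, safe="", encoding="utf-8")

# Form 2: write each character as an HTML numeric character
# reference (&#NNNN;), then percent-encode the resulting ASCII text.
char_refs = "".join(f"&#{ord(ch)};" for ch in word)
char_ref_form = quote(char_refs, safe="")

print(utf8_form)
print(char_refs)
print(char_ref_form)
```

If this reading is right, the form is submitting character references because the page's declared character set cannot represent Cyrillic, which is a browser behaviour rather than a Solr-specific encoding.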

I would appreciate any advice or suggestions that would allow me to search for Cyrillic text in Solr. If anyone knows why Solr behaves as it does with the strange encoding, a brief explanation of what causes this behaviour, and of what the encoding is (Unicode?), would be helpful. If anyone else has successfully forced Solr to accept UTF-8 encoded q= parameters, I would love to know how you did it.

Thanks in advance!
Tricia

P.S. I am using Mozilla Firefox as my main browser, which is where I see the behaviour reported above. IE 6.0 works fine for Cyrillic queries, although it still produces a strange but different encoding (%CA%E0%ED%E0%E4%E0 for the same query as before).
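
As a rough check on the IE string, the Python sketch below decodes it on the assumption that the bytes are Windows-1251 (cp1251). That assumption is inferred from the byte values alone and is not confirmed by anything Solr or IE reports. The variable names are illustrative.

```python
from urllib.parse import quote, unquote

# The query string IE 6.0 produced for the sample query.
ie_query = "%CA%E0%ED%E0%E4%E0"

# Assumption: the percent-encoded bytes are Windows-1251 (cp1251).
decoded = unquote(ie_query, encoding="cp1251")

# Re-encode the decoded text as percent-encoded UTF-8 for comparison.
utf8_query = quote(decoded, safe="", encoding="utf-8")

print(decoded)
print(utf8_query)
```

Under that assumption the decoded text is the same six-letter Russian word, so IE appears to be sending a single-byte Cyrillic code page rather than UTF-8.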
