Hi all,
I'm trying to adapt our old Cocoon/Lucene-based web search application
to one that is more Solr-ish. Our old web app was capable of searching for
queries with Cyrillic characters in them. I'm finding that, in the
packaged example admin interface, entering a query with a string of
Cyrillic characters causes a java.lang.ArrayIndexOutOfBoundsException.
I've also noted that the URL built from the search form is not UTF-8
encoded. As a result, if I manipulate the query string by
inserting a UTF-8 encoded string in the q= parameter, the values are
interpreted incorrectly, and I cannot use this approach as a
work-around. My sample query is: Канада (the English word _Canada_
translated into Russian) or
%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (UTF-8) or
%26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B
(Solr URL encoding).
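
To make the two encodings above easier to compare, here is a minimal
Python sketch (not Solr code) that reproduces both strings from the same
word. It suggests that the second form is the word written as HTML
numeric character references (&#1050; and so on) and then percent-encoded.
The last step is only an assumption for illustration: it shows what a
server would see if it decoded the UTF-8 bytes as ISO-8859-1, which I have
not confirmed is what happens here.

```python
import html
from urllib.parse import quote, unquote

word = "Канада"  # "Canada" in Russian, the sample query

# Form 1: percent-encode the UTF-8 bytes of the word.
utf8_form = quote(word, encoding="utf-8")

# Form 2: write each character as an HTML numeric character reference
# (&#NNNN;), then percent-encode the '&', '#' and ';' characters.
ncr_text = "".join(f"&#{ord(ch)};" for ch in word)
ncr_form = quote(ncr_text, safe="")

# Reversing form 2: percent-decode, then resolve the references.
recovered = html.unescape(unquote(ncr_form))

# Assumption for illustration only: a server that decodes the UTF-8
# bytes as ISO-8859-1 sees twelve single-byte characters, not six letters.
misread = unquote(utf8_form, encoding="iso-8859-1")

print(utf8_form)
print(ncr_form)
print(recovered == word, len(misread))
```
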
I would appreciate any advice or suggestions that would allow me
to search for Cyrillic text in Solr. If anyone knows why Solr behaves as
it does with the strange encoding, a brief explanation of what causes this
behaviour and what the encoding is (Unicode?) would be helpful. If anyone
else has forced Solr to accept UTF-8 encoded q= parameters with success, I
would love to know how you did it.
Thanks in advance!
Tricia
P.S. I am using Mozilla Firefox as my main browser, which leads to the
behaviour I reported above. IE 6.0 works fine for Cyrillic, although
there is still a strange but different encoding (%CA%E0%ED%E0%E4%E0 for
the same query as before).
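
For what it's worth, the IE string looks like the same word
percent-encoded from a single-byte Cyrillic code page. A minimal Python
sketch, assuming that code page is windows-1251 (my guess, based only on
the byte values), reproduces it:

```python
from urllib.parse import quote, unquote

word = "Канада"  # same sample query as above

# Percent-encode the word using the windows-1251 (cp1251) code page,
# where each Cyrillic letter is a single byte.
cp1251_form = quote(word, encoding="cp1251")

# Decoding the string IE produced with the same code page.
decoded = unquote("%CA%E0%ED%E0%E4%E0", encoding="cp1251")

print(cp1251_form)
print(decoded == word)
```
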