Re: Cyrillic characters

WHIRLYCOTT Tue, 18 Jul 2006 15:09:31 -0700

Crap, you're right. I have a well-tested application that's using UTF-8 everywhere possible and I just tested with some Russian text. Solr's coughing up this as an exception:


Jul 18, 2006 6:00:05 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 1

        at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:141)
        at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:96)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:592)
        at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:94)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:596)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:428)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:473)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:633)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
        at org.mortbay.http.HttpServer.service(HttpServer.java:909)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:986)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:245)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

You're going directly against Solr/Jetty, right? Not proxied or mod_rewrite'd through to Apache?

Solr isn't properly encoding the data being received by the servlet. I think that I can fix this using some of the tricks that I've learned in building my site. More later.
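
To illustrate the kind of mismatch I suspect, here is a minimal sketch using only the JDK's URLDecoder. It is not Solr's code and not the eventual fix; the class name QueryDecodingSketch and the helper decode() are made up for the example. It decodes the UTF-8 query value from the message below twice: once as UTF-8 and once as ISO-8859-1, which I am assuming (not confirming) is the charset being applied on the server side.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class for illustration only; not part of Solr or Jetty.
class QueryDecodingSketch {

    // Decode a percent-encoded query value using the named charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 percent-encoded sample query quoted in this thread.
        String raw = "%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0";

        // Decoded as UTF-8, each two-byte sequence becomes one Cyrillic letter.
        String asUtf8 = decode(raw, "UTF-8");

        // Decoded as ISO-8859-1 (an assumed wrong charset), every byte
        // becomes its own character, so the text turns into mojibake.
        String asLatin1 = decode(raw, "ISO-8859-1");

        System.out.println(asUtf8.length());   // 6
        System.out.println(asLatin1.length()); // 12
    }
}
```

If the servlet side decodes the bytes with the wrong charset, the query parser downstream sees twelve Latin-1 characters instead of six Cyrillic letters. Whether that is what triggers the exception in parseSort is something I still need to check.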


How much testing have people done using UTF-8 data on Solr?

phil.



On Jul 18, 2006, at 5:53 PM, Tricia Williams wrote:

Hi all,
I'm trying to adapt our old cocoon/lucene based web search application to one that is more solrish. Our old web app was capable of searching for queries with cyrillic characters in them. I'm finding that using the packaged example admin interface, entering a query with a string of cyrillic characters causes a java.lang.ArrayIndexOutOfBoundsException. I've also noted that the url built from the search form is not utf-8 encoded. So obviously if I try to manipulate the query string by inserting a utf-8 encoded string in the q= parameter, the values are interpreted incorrectly and as such I cannot use this approach as a work-around. My sample query is: ...... (the English word _canada_ translated into Russian) or %D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (utf-8) or %26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B (solr url encoding)
I would appreciate any advice or suggestions that would allow me to search for cyrillics in solr. If anyone knows why solr is behaving as it does with the strange encoding, a brief explanation of what causes this behaviour and what the encoding is (unicode?) would be helpful. If anyone else has forced solr to accept utf-8 encoded q= parameters with success, I would love to know how you did it.
Thanks in advance!
Tricia
ps. I am using mozilla firefox as my main browser which leads to the behaviour I reported above. IE 6.0 works fine for cyrillics although there is still a strange but different encoding (%CA%E0%ED%E0%E4%E0 for the same query as before).
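
For reference, the three encoded strings quoted in this message can be reproduced with the JDK's URLEncoder. This is a sketch under assumptions, not an account of what Solr or either browser does internally: the class name QueryEncodingSketch and both helpers are hypothetical, and the word is written as code point escapes obtained by decoding the UTF-8 string above. Byte for byte, the IE string matches the windows-1251 encoding of the word, and the string labelled "solr url encoding" matches the word written as decimal numeric character references (&#1050; and so on) and then percent-encoded. Browsers commonly fall back to such references when a form's charset cannot represent the typed characters, which may be what is happening here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class for illustration only; not part of Solr.
class QueryEncodingSketch {

    // Percent-encode text after converting it to bytes in the named charset.
    static String encode(String text, String charset) {
        try {
            return URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Replace every non-ASCII code point with a decimal numeric
    // character reference such as &#1050;
    static String toNumericReferences(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The sample word, as decoded from the UTF-8 string in this message.
        String word = "\u041A\u0430\u043D\u0430\u0434\u0430";

        // %D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0
        System.out.println(encode(word, "UTF-8"));

        // %CA%E0%ED%E0%E4%E0 (needs a JDK that ships the windows-1251 charset)
        System.out.println(encode(word, "windows-1251"));

        // %26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B
        System.out.println(encode(toNumericReferences(word), "UTF-8"));
    }
}
```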



--
                                   Whirlycott
                                   Philip Jacob
                                   [EMAIL PROTECTED]
                                   http://www.whirlycott.com/phil/
