Hi all,
I'm trying to adapt our old Cocoon/Lucene-based web search application
to one that is more Solr-ish. Our old web app was capable of searching for
queries with Cyrillic characters in them. I'm finding that, using the
packaged example admin interface, entering a query with a string of
Cyrillic characters causes a java.lang.ArrayIndexOutOfBoundsException.
Crap, you're right. I have a well-tested application that's using
UTF-8 everywhere possible, and I just tested with some Russian text.
Solr's coughing up this exception:
Jul 18, 2006 6:00:05 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 1
On 7/18/06, WHIRLYCOTT <[EMAIL PROTECTED]> wrote:
How much testing have people done using UTF-8 data on Solr?
UTF-8 query *output* is well tested with Resin within CNET.
Indexing UTF-8 is also well tested (again, mostly with Resin).
UTF-8 query input is not really tested at all AFAIK (the q par
OK, let's split up the indexing side from the query side for a moment
and assume that you are indexing correctly (setting the content-type
correctly, etc.).
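As a minimal sketch of what "indexing correctly" can mean on the client side: encode the XML <add> message as UTF-8 bytes and declare that same charset in the Content-Type header. The endpoint http://localhost:8983/solr/update and the class Utf8Indexer are assumptions for illustration (adjust to your install); the "id" and "features" field names follow the example schema mentioned in this thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class Utf8Indexer {
    // Hypothetical endpoint; the example server listens on port 8983.
    static final String UPDATE_URL = "http://localhost:8983/solr/update";

    // Minimal escaping for XML element text ('&' must be replaced first).
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds an <add> message and encodes it as UTF-8 bytes.
    static byte[] buildAddBody(String id, String feature) {
        String xml = "<add><doc>"
                + "<field name=\"id\">" + escapeXml(id) + "</field>"
                + "<field name=\"features\">" + escapeXml(feature) + "</field>"
                + "</doc></add>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // POSTs the bytes, declaring the charset they were actually encoded in.
    static int post(byte[] body) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(UPDATE_URL).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }
}
```

Against a running example server you would call Utf8Indexer.post(Utf8Indexer.buildAddBody("utf8test", "h\u00e9llo")) and then send a commit before the document becomes searchable. The point of the design is that the bytes and the header agree; mismatching them is the usual way indexing goes wrong.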
I just added a new value to the multi-valued features field in the
solr.xml example document:
"Good unicode support: héllo (hello with an acc
: P.S. I am using Mozilla Firefox as my main browser, which leads to the
: behaviour I reported above. IE 6.0 works fine for Cyrillic, although
: there is still a strange but different encoding (%CA%E0%ED%E0%E4%E0 for
: the same query as before).
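The IE 6.0 string looks like the same query sent in the Windows Cyrillic code page (windows-1251) instead of UTF-8: one byte per letter rather than two. A minimal sketch with java.net.URLEncoder encodes the Russian word for "Canada" under both charsets and reproduces both strings quoted in this thread. The class QueryEncodings is hypothetical, and the sketch assumes the JRE ships windows-1251 (standard JDKs do, but it is not one of the charsets every Java platform is required to support).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncodings {
    // "Канада" (Russian for "Canada"), written as Unicode escapes so the
    // encoding of this source file cannot corrupt the test string.
    static final String CANADA_RU = "\u041a\u0430\u043d\u0430\u0434\u0430";

    // Converts the term to bytes in the named charset, then percent-encodes
    // each byte, as an HTML form submission does.
    static String encode(String term, String charset) {
        try {
            return URLEncoder.encode(term, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Prints %D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (two bytes per letter)
        System.out.println(encode(CANADA_RU, "UTF-8"));
        // Prints %CA%E0%ED%E0%E4%E0 (one byte per letter)
        System.out.println(encode(CANADA_RU, "windows-1251"));
    }
}
```

The String-charset overload of URLEncoder.encode is used because the Charset overload only exists from Java 10 on.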
The problem may not be in the Solr internals as mu
Definitely some Firefox bugs with UTF-8, at least:
If I go to the admin screen, paste héllo into the query box,
then kill Solr and run netcat to see exactly what I get, it's the
following:
$ nc -l -p 8983
GET /solr/select/?stylesheet=&q=h%E9llo&version=2.1&start=0&rows=10&indent=on HTTP/1.1
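In the captured request, é arrives as the single byte %E9, which is its ISO-8859-1 (Latin-1) encoding; UTF-8 would have sent h%C3%A9llo. A minimal sketch of what the server side sees, using java.net.URLDecoder: decoding the captured value as Latin-1 recovers héllo, while decoding it as UTF-8 hits an incomplete multi-byte sequence and yields the replacement character U+FFFD. The class QueryDecoding is hypothetical; this illustrates the mismatch and is not Solr's own request-parsing code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryDecoding {
    // Decodes a percent-encoded query parameter with the given charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available: " + charset, e);
        }
    }

    // Percent-encodes a value with the given charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String captured = "h%E9llo"; // the q value seen with netcat
        // Read as Latin-1, byte 0xE9 is the letter e-acute.
        System.out.println(decode(captured, "ISO-8859-1"));
        // Read as UTF-8, 0xE9 starts a 3-byte sequence that never completes,
        // so the decoder substitutes U+FFFD.
        System.out.println(decode(captured, "UTF-8"));
        // What a UTF-8 client would have sent instead: h%C3%A9llo
        System.out.println(encode("h\u00e9llo", "UTF-8"));
    }
}
```

Whatever charset the servlet container assumes for query strings, the client and server have to agree on it; otherwise non-ASCII queries are lost before Solr ever parses them.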
I've started poking around and have already fixed one bug related to
URL encoding of data. I'm going to work some more on this tonight
and will hopefully have a patch for you soon.
phil.
On Jul 18, 2006, at 6:19 PM, Yonik Seeley wrote:
On 7/18/06, Tricia Williams <[EMAIL PROTECTED]> wrote:
My sample query is: .. (the English word _canada_
translated into Russian) or
%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (UTF-8) or
%26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B
(Solr URL encoding)
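URL-decoding the third form gives &#1050;&#1072;&#1085;&#1072;&#1076;&#1072;, i.e. HTML decimal character references (1050 is U+041A, the Cyrillic capital letter Ka). Browsers typically fall back to such references when a form's charset cannot represent the characters being submitted, though the thread itself does not confirm that this is the cause here. The sketch below, with a hypothetical CharRefDecoder class, decodes both the UTF-8 form and the reference form and shows that they name the same word; it is only a demonstration, not a claim that Solr should expand references in queries.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharRefDecoder {
    // Matches decimal character references such as "&#1050;".
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d+);");

    // Percent-decodes a query value as UTF-8.
    static String urlDecode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Replaces each decimal character reference with the code point it names.
    static String expandDecimalRefs(String s) {
        Matcher m = DECIMAL_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String utf8Form = "%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0";
        String refForm = "%26%231050%3B%26%231072%3B%26%231085%3B"
                + "%26%231072%3B%26%231076%3B%26%231072%3B";
        String fromUtf8 = urlDecode(utf8Form);
        String fromRefs = expandDecimalRefs(urlDecode(refForm));
        // Both decode to the same six Cyrillic letters.
        System.out.println(fromUtf8.equals(fromRefs));
    }
}
```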
Hi Tricia,
On Jul 18, 2006, at 5:53 PM, Tricia Williams wrote:
that using the packaged example admin interface entering a query
with a string of Cyrillic characters causes a
java.lang.ArrayIndexOutOfBoundsException
... I have this much fixed as well.
However, I'm still walking data through the stack