Re: Cyrillic characters

2006-07-19 Thread Yonik Seeley
On 7/19/06, Yonik Seeley wrote: Now the problem: Tomcat 5.5.17 isn't decoding percent-encoded UTF-8, but instead treating %C3%A9 as two separate characters. Here's the magic for Tomcat: http://split-s.blogspot.com/2005/12/internationalized-get-parameters-with.html edit serv
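To illustrate the symptom described in this message, here is a minimal sketch that decodes the same percent-encoded bytes once as UTF-8 and once as ISO-8859-1. It is plain Python, not Tomcat's own decoding code, and the sample value is an assumption chosen for illustration; it only shows why %C3%A9 comes out as two characters when the decoder assumes ISO-8859-1.

```python
from urllib.parse import unquote

# Hypothetical query value: "é" percent-encoded as UTF-8 (bytes C3 A9).
raw = "d%C3%A9sormais"

# Decoding the bytes as UTF-8 gives back the single character "é".
as_utf8 = unquote(raw, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 maps each byte to its own
# character, so C3 A9 becomes the two characters U+00C3 and U+00A9.
as_latin1 = unquote(raw, encoding="iso-8859-1")

print(as_utf8)
print(as_latin1)
print(len(as_utf8), len(as_latin1))
```

The second result is one character longer than the first, which matches the "two separate characters" behaviour reported for Tomcat.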

Re: Cyrillic characters

2006-07-19 Thread Yonik Seeley
On 7/19/06, WHIRLYCOTT wrote: Solr-trunk currently uses ISO-8859-1 as the character encoding for the admin pages. One of the patches I submitted changes the admin pages to use UTF-8 and that fixes the problem. OK, we are closer to working correctly. It appears that the web

Re: Cyrillic characters

2006-07-19 Thread WHIRLYCOTT
On Jul 19, 2006, at 11:44 AM, Bertrand Delacretaz wrote: -If I search "désormais" from the solr/admin page, it is translated to q=d%E9sormais in the URL, and nothing's found (the word is in my index) http://www.w3.org/TR/REC-html40/interact/forms.html#adef-accept-charset "The default value fo
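The q=d%E9sormais URL quoted in this message is what one would expect if the browser submitted the form in ISO-8859-1. A small sketch of that mismatch, in Python and under that assumption (nothing here is Solr or browser code):

```python
from urllib.parse import quote, unquote

word = "désormais"

# How the word is percent-encoded under each form charset.
latin1_form = quote(word, encoding="iso-8859-1")
utf8_form = quote(word, encoding="utf-8")

# A server that expects UTF-8 cannot decode the lone E9 byte; Python's
# default error handling replaces it with U+FFFD, so the term no longer
# matches what is in the index.
mismatch = unquote(latin1_form, encoding="utf-8")

print(latin1_form)
print(utf8_form)
print(mismatch == word)
```

Either side can be changed to make the two encodings agree; the sketch only shows why a disagreement yields no hits.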

Re: Re: Cyrillic characters

2006-07-19 Thread Bertrand Delacretaz
On 7/19/06, Yonik Seeley wrote: ...Can anyone else shed some light on this?.. I have to run now but I *think* there are encoding settings in web.xml, and IIRC they might be different for Tomcat or Jetty. Setting UTF-8 everywhere should help. -Bertrand

Re: Cyrillic characters

2006-07-19 Thread WHIRLYCOTT
I submitted two patches that fix one problem with URL encoding and another with the screens on the webapp. http://issues.apache.org/jira/browse/SOLR-35 phil. On Jul 19, 2006, at 11:58 AM, Yonik Seeley wrote: On 7/19/06, Tricia Williams <[EMAIL PROTECTED]> wrote: You mentioned i

Re: Cyrillic characters

2006-07-19 Thread Yonik Seeley
On 7/19/06, Tricia Williams <[EMAIL PROTECTED]> wrote: You mentioned in another earlier post that q=h%c3%e9 would find matching hits. My experience shows that while the UTF-8 encoded query doesn't generate any exceptions, no results are matched. However q=h%e9llo would find matching results

Re: Re: Cyrillic characters

2006-07-19 Thread Bertrand Delacretaz
On 7/19/06, Tricia Williams <[EMAIL PROTECTED]> wrote: ...What I called the _solr url encoding_ was the q= parameter translated into encoding in the url... I think I've seen the same problem, haven't investigated deeper but IIUC the encoding used when posting a form is related to both the enc

Re: Cyrillic characters

2006-07-19 Thread Tricia Williams
Hi Yonik, I was incorrect to describe it as _solr encoding_. Hoss suggested that it might be a form error - I haven't checked this yet but it sounds plausible. What I called the _solr url encoding_ was the q= parameter translated into encoding in the url. As I mention in my ps this tran