Re: Cyrillic characters

Yonik Seeley Wed, 19 Jul 2006 12:13:12 -0700

On 7/19/06, WHIRLYCOTT <[EMAIL PROTECTED]> wrote:

> Solr-trunk currently uses ISO-8859-1 as the character encoding for
> the admin pages.  One of the patches I submitted changes the admin
> pages to use UTF-8 and that fixes the problem.


OK, we are closer to working correctly.  It appears that web browsers
try to be smart when submitting form data: they use the encoding of
the page they received to encode the HTTP GET request (non-standard
behaviour as I read it, but it may be there to support legacy stuff).

So changing the admin pages to use UTF-8, and clearing the browser
caches, does indeed make both Firefox and IE send percent-encoded
UTF-8 (h%C3%A9llo for "héllo").
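To make the difference concrete, here is a small sketch that approximates what the browser does with a form value depending on the charset of the page holding the form. The helper name encode_query_value is hypothetical; real browsers also apply form-encoding rules (e.g. "+" for spaces) that are omitted here because "héllo" contains no such characters.

```python
from urllib.parse import quote


def encode_query_value(value: str, page_charset: str) -> str:
    """Percent-encode a form value using the charset the page was served in.

    Hypothetical helper approximating browser behaviour for a simple value.
    """
    return quote(value, safe="", encoding=page_charset)


# Page served as ISO-8859-1: 'é' becomes the single byte 0xE9.
print(encode_query_value("héllo", "iso-8859-1"))  # h%E9llo
# Page served as UTF-8: 'é' becomes the two bytes 0xC3 0xA9.
print(encode_query_value("héllo", "utf-8"))       # h%C3%A9llo
```

The same character produces different bytes on the wire, which is why the charset of the admin pages matters for the queries they submit.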

Now the problem: Tomcat 5.5.17 isn't decoding percent-encoded UTF-8,
but instead treating %C3%A9 as two separate characters.  Soooo, I
think Bertrand is right about there being some web.xml setting....
time to hit the tomcat docs, and if that fails, grab Yoav's attention
:-)
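The symptom can be reproduced outside Tomcat: decoding the escaped bytes as ISO-8859-1 splits the two-byte UTF-8 sequence into two characters. As far as I know, Tomcat's setting for this is the URIEncoding attribute on the HTTP Connector in server.xml (which defaults to ISO-8859-1), but treat that as something to confirm against the Tomcat docs. The sketch below only illustrates the decoding, plus a common after-the-fact repair trick.

```python
from urllib.parse import unquote

raw = "h%C3%A9llo"  # what Firefox and IE send once the page is UTF-8

# Decoding the escaped bytes as ISO-8859-1 turns 0xC3 and 0xA9 into
# two separate characters, 'Ã' and '©': the symptom seen in Tomcat.
wrong = unquote(raw, encoding="iso-8859-1")

# Decoding the same bytes as UTF-8 yields the intended single 'é'.
right = unquote(raw, encoding="utf-8")

# A mis-decoded string can be repaired by re-encoding it back to the
# original bytes with ISO-8859-1 and decoding those bytes as UTF-8.
repaired = wrong.encode("iso-8859-1").decode("utf-8")

print(wrong)     # hÃ©llo
print(right)     # héllo
print(repaired)  # héllo
```

The repair works because ISO-8859-1 maps every byte value to exactly one character, so no information is lost by the wrong decode; fixing the container's decoding charset is still the cleaner option.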

I would be interested to know what some of the built-in HTTP client
libs out there do:
 - HTTPClient, python, ruby, rhino, etc.
Hopefully most do the right thing w.r.t. UTF-8, but if not, one can
always POST queries with a Content-Type that declares charset=UTF-8.
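As one data point for the Python case, the standard library percent-encodes as UTF-8 by default. The sketch below builds, but does not send, a form POST whose body is UTF-8 and whose Content-Type header declares it. The endpoint URL and the build_utf8_query_post helper are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, assumed for illustration only.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_utf8_query_post(query: str) -> Request:
    """Build (without sending) a form POST whose body is UTF-8 encoded
    and whose Content-Type header states charset=UTF-8 explicitly."""
    body = urlencode({"q": query}, encoding="utf-8").encode("ascii")
    return Request(
        SOLR_SELECT_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


req = build_utf8_query_post("héllo")
print(req.data)                        # b'q=h%C3%A9llo'
print(req.get_header("Content-type"))  # application/x-www-form-urlencoded; charset=UTF-8
```

Declaring the charset in the header lets the servlet container decode the body without guessing, independent of how it decodes the request URI.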


-Yonik
