On 7/19/06, WHIRLYCOTT <[EMAIL PROTECTED]> wrote:
> Solr-trunk currently uses ISO-8859-1 as the character encoding for the admin pages. One of the patches I submitted changes the admin pages to use UTF-8 and that fixes the problem.

OK, we are closer to working correctly. It appears that web browsers try to be smart when submitting form data and use the encoding of the received page to encode the HTTP GET parameters (non-standard behaviour as I read it, but it may be there to support legacy stuff). So changing the admin pages to use UTF-8, and clearing the browser caches, does indeed make both Firefox and IE send percent-encoded UTF-8 (h%C3%A9llo).

Now the problem: Tomcat 5.5.17 isn't decoding the percent-encoded bytes as UTF-8, but is instead treating %C3%A9 as two separate characters.

Soooo, I think Bertrand is right about there being some web.xml setting... time to hit the Tomcat docs, and if that fails, grab Yoav's attention :-)

I would be interested to know what some of the built-in HTTP client libs out there do:
- HTTPClient, python, ruby, rhino, etc

Hopefully most do the right thing w.r.t. UTF-8, but if not, one can always POST queries with a Content-Type that specifies a UTF-8 charset.

-Yonik
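
The symptom described above (h%C3%A9llo arriving as two separate characters instead of one) can be reproduced outside the servlet container. The following is only a minimal sketch using Python's standard urllib.parse, not a description of what Tomcat does internally; that the server side is falling back to ISO-8859-1 is an assumption based on the symptom, and the variable names are purely illustrative.

```python
from urllib.parse import quote, unquote

# "h\u00e9llo" is "hello" with an e-acute as its second character.
text = "h\u00e9llo"

# What the browser sends once the admin page is served as UTF-8.
sent_utf8 = quote(text, encoding="utf-8")
print(sent_utf8)  # h%C3%A9llo

# What the browser would send for a page served as ISO-8859-1.
sent_latin1 = quote(text, encoding="iso-8859-1")
print(sent_latin1)  # h%E9llo

# Decoding the UTF-8 escapes as UTF-8 round-trips correctly.
decoded_ok = unquote(sent_utf8, encoding="utf-8")

# ASSUMPTION: the server decodes the escapes as ISO-8859-1.
# Each byte then becomes its own character, so the string grows by one.
decoded_wrong = unquote(sent_utf8, encoding="iso-8859-1")

print(len(decoded_ok), len(decoded_wrong))  # 5 6
```

The same round trip is a quick way to check what a given HTTP client library puts on the wire: encode a non-ASCII query, look at the escapes, and see whether they are the two-byte UTF-8 form or the one-byte ISO-8859-1 form.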