Re: UTF-8 POST request results in corrupted data

Tim Funk Tue, 07 Oct 2008 03:37:58 -0700

If you take that form and post it, how does the server know that the content is UTF-8? (Answer: it doesn't.)

The HTML directives tell the browser to encode everything into UTF-8 on the way to the web server. But there is nothing that tells the web server explicitly what the charset of the incoming request is.


See the servlet spec for more details, in particular
4.9 Request data encoding

Currently, many browsers do not send a char encoding qualifier with the Content-Type header, leaving open the determination of the character encoding for reading HTTP requests. The default encoding of a request the container uses to create the request reader and parse POST data must be “ISO-8859-1” if none has been specified by the client request. However, in order to indicate to the developer in this case the failure of the client to send a character encoding, the container returns null from the getCharacterEncoding method.
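To make the effect of that default concrete, here is a minimal sketch using only the JDK (no servlet container involved). The class name FormDecodingSketch and the helper decode are made up for illustration, and the Charset overloads of URLEncoder/URLDecoder assume Java 10 or later. It only mimics the "fall back to ISO-8859-1 when no charset was sent" rule quoted above; it is not how any particular container implements it.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormDecodingSketch {
    // Hypothetical helper: decode one percent-encoded form value.
    // A null charset stands for "the client sent no charset", in which
    // case we fall back to ISO-8859-1 as the quoted spec text requires.
    static String decode(String rawValue, Charset requestCharset) {
        Charset charset = (requestCharset != null)
                ? requestCharset
                : StandardCharsets.ISO_8859_1;
        return URLDecoder.decode(rawValue, charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source
        // file's own encoding cannot interfere.
        String typed = "\u65e5\u672c\u8a9e";

        // What a browser that encodes the form as UTF-8 puts on the wire.
        String onTheWire = URLEncoder.encode(typed, StandardCharsets.UTF_8);
        System.out.println(onTheWire);

        // No charset known: each UTF-8 byte is read as one Latin-1 character.
        System.out.println(decode(onTheWire, null).equals(typed));

        // Charset known to be UTF-8: the original text comes back.
        System.out.println(decode(onTheWire, StandardCharsets.UTF_8).equals(typed));
    }
}
```

The first comparison prints false and the second prints true: the bytes on the wire are identical in both cases, and only the charset assumed while decoding differs.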


-Tim

Andre-John Mas wrote:

Thanks for the answer on this point. Reading section 3.7.1 of RFC 2616 indicates that a request can specify a character set other than the default. For this reason the following should technically be legal:
<form action="" method="post" enctype="application/x-www-form-urlencoded; charset=utf-8" accept-charset="utf-8">
What I see, from testing on my Mac, is that Firefox and Safari fail to pass the charset attribute, but Opera does. What I do notice here is that even though Opera does specify the character set, Tomcat ignores it, replacing the submitted Japanese characters with question marks. This is an indication that UTF-8 was accepted but it was converted to ISO-8859-1 and no equivalent mapping was available. With Firefox and Safari I get the same behaviour when I specify:
   request.setCharacterEncoding("UTF-8");
Basically I am not getting the Japanese characters as typed in the form. There is a problem here.
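As a rough aid for telling the two failure shapes apart, the sketch below uses only the JDK; the class and method names are hypothetical. It shows that question marks appear when already-decoded Japanese text is encoded into ISO-8859-1, while misreading UTF-8 bytes as ISO-8859-1 produces a longer run of Latin-1 characters instead. It says nothing about where in Tomcat either step might happen.

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkSketch {
    // Encode text into ISO-8859-1 and read it back. Characters that have
    // no Latin-1 mapping are replaced by the charset's replacement byte, '?'.
    static String throughLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Take the UTF-8 bytes of the text and misread them as ISO-8859-1.
    // Every byte becomes one character, so nothing turns into '?'.
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes.
        String typed = "\u65e5\u672c\u8a9e";

        System.out.println(throughLatin1(typed));             // prints ???
        System.out.println(utf8ReadAsLatin1(typed).length()); // prints 9
    }
}
```

If the server-side value is exactly one question mark per typed character, that matches the first shape, which is consistent with the conversion described above.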


