Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tomcat Wiki" for change notification.
The "FAQ/CharacterEncoding" page has been changed by KonstantinKolinko. The comment on this change is: Rearranged.
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding?action=diff&rev1=13&rev2=14

--------------------------------------------------

  = Character Encoding Issues =
  == Questions ==
+
+  1. '''Why'''
-  1. [[#Q1|What is the default character encoding of the request or response body?]]
+   1. [[#Q1|What is the default character encoding of the request or response body?]]
+   1. [[#Q9|Why does everything have to be this way?]]
+  1. '''How'''
-  1. [[#Q2|How do I change how GET parameters are interpreted?]]
+   1. [[#Q2|How do I change how GET parameters are interpreted?]]
-  1. [[#Q3|How do I change how POST parameters are interpreted?]]
+   1. [[#Q3|How do I change how POST parameters are interpreted?]]
+   1. [[#Q8|What can you recommend to just make everything work? (How to use UTF-8 everywhere).]]
-  1. [[#Q4|How can I test if my configuration will work correctly?]]
+   1. [[#Q4|How can I test if my configuration will work correctly?]]
-  1. [[#Q6|How can I send higher characters in HTTP headers?]]
+   1. [[#Q6|How can I send higher characters in HTTP headers?]]
+  1. '''Troubleshooting'''
-  1. [[#Q8|What can you recommend to just make everything work? -- How to use UTF-8 everywhere.]]
-  1. [[#Q9|Why does everything have to be this way?]]
-  1. [[#Q5|I'm having a problem with character encoding in Tomcat 5]]
+   1. [[#Q5|I'm having a problem with character encoding in Tomcat 5]]
  == Answers ==
+
+ === Why ===
+
  <<Anchor(Q1)>>'''What is the default character encoding of the request or response body?'''
  If a character encoding is not specified, the Servlet specification requires that an encoding of ISO-8859-1 is used. The character encoding for the body of an HTTP message (request ''or'' response) is specified in the `Content-Type` header field.
  An example of such a header is `Content-Type: text/html; charset=ISO-8859-1`, which explicitly states that the default (ISO-8859-1) is being used.
  References: [[http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1|HTTP 1.1 Specification, Section 3.7.1]]
- <<Anchor(Q2)>>'''How do I change how GET parameters are interpreted?'''
+ ----
- Tomcat will use ISO-8859-1 as the default character encoding of the entire URL, including the query string ("GET parameters").
-
- There are two ways to specify how GET parameters are interpreted:
-
-  1. Set the `URIEncoding` attribute on the <Connector> element in server.xml to something specific (e.g. `URIEncoding="UTF-8"`).
-  1. Set the `useBodyEncodingForURI` attribute on the <Connector> element in server.xml to `true`. This will cause the Connector to use the request body's encoding for GET parameters.
-
- References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|Tomcat 6 HTTP Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|Tomcat 6 AJP Connector]]
-
- <<Anchor(Q3)>>'''How do I change how POST parameters are interpreted?'''
-
- POST requests should specify the encoding of the parameters and values they send. Since many clients fail to set an explicit encoding, the default is used (ISO-8859-1). In many cases this is not the preferred interpretation, so one can employ a `javax.servlet.Filter` to set request encodings. Writing such a filter is trivial. Furthermore, Tomcat already comes with such an example filter. Please take a look at:
- 5.x::
- {{{
- webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- }}}
- 6.x::
- {{{
- webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- }}}
-
- <<Anchor(Q4)>>'''How can I test if my configuration will work correctly?'''
-
- The following sample JSP should work on a clean Tomcat install for any input. If you set `URIEncoding="UTF-8"` on the connector, it will also work with method="GET".
- {{{
- <%@ page contentType="text/html; charset=UTF-8" %>
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
- <html>
-   <head>
-     <title>Character encoding test page</title>
-   </head>
-   <body>
-     <p>Data posted to this form was:
-     <%
-       request.setCharacterEncoding("UTF-8");
-       out.print(request.getParameter("mydata"));
-     %>
-
-     </p>
-     <form method="POST" action="index.jsp">
-       <input type="text" name="mydata">
-       <input type="submit" value="Submit" />
-       <input type="reset" value="Reset" />
-     </form>
-   </body>
- </html>
- }}}
-
- <<Anchor(Q8)>>'''How can I send higher characters in my HTTP headers?'''
-
- You have to encode them in some way before you insert them into a header. Using URL-encoding (`%` followed by the two hexadecimal digits of each byte) would be a good idea.
-
- <<Anchor(Q8)>>'''What can you recommend to just make everything work? -- How to use UTF-8 everywhere.'''
-
- Using `UTF-8` as your character encoding for everything is a safe bet. This should work for pretty much every situation.
-
- In order to completely switch to using UTF-8, you need to make the following changes:
-
-  1. Set {{{URIEncoding="UTF-8"}}} on your <Connector> in `server.xml`. References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|HTTP Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|AJP Connector]].
-  1. Use a [[#Q3|character encoding filter]] with the default encoding set to UTF-8.
-  1. Change all your JSPs to include the charset name in their contentType. For example, use {{{<%@page contentType="text/html; charset=UTF-8" %>}}} for the usual JSP pages and {{{<jsp:directive.page contentType="text/html; charset=UTF-8" />}}} for the pages in XML syntax (aka JSP Documents).
-  1. Change all your servlets to set the content type for responses and to include the UTF-8 charset name in that content type. Use {{{response.setContentType("text/html; charset=UTF-8")}}} or {{{response.setCharacterEncoding("UTF-8")}}}.
-  1. Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
-  1. Disable any valves or filters that may read request parameters before your character encoding filter or JSP page has a chance to set the encoding to UTF-8. For more information see http://www.mail-archive.com/us...@tomcat.apache.org/msg21117.html.
  <<Anchor(Q9)>>'''Why does everything have to be this way?'''
@@ -124, +66 @@
  Section 3.1 of the ARPA Internet Text Messages spec states that headers are always in US-ASCII encoding. Anything outside of that needs to be encoded. See the [[#Q2|section on GET parameters]] regarding query strings in URIs.
+
+ ----
+
+ === How ===
+
+ <<Anchor(Q2)>>'''How do I change how GET parameters are interpreted?'''
+
+ Tomcat will use ISO-8859-1 as the default character encoding of the entire URL, including the query string ("GET parameters").
+
+ There are two ways to specify how GET parameters are interpreted:
+
+  1. Set the `URIEncoding` attribute on the <Connector> element in server.xml to something specific (e.g. `URIEncoding="UTF-8"`).
+  1. Set the `useBodyEncodingForURI` attribute on the <Connector> element in server.xml to `true`. This will cause the Connector to use the request body's encoding for GET parameters.
+
+ References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|Tomcat 6 HTTP Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|Tomcat 6 AJP Connector]]
+
+
+ ----
+
+ <<Anchor(Q3)>>'''How do I change how POST parameters are interpreted?'''
+
+ POST requests should specify the encoding of the parameters and values they send. Since many clients fail to set an explicit encoding, the default is used (ISO-8859-1). In many cases this is not the preferred interpretation, so one can employ a `javax.servlet.Filter` to set request encodings. Writing such a filter is trivial. Furthermore, Tomcat already comes with such an example filter. Please take a look at:
+ 5.x::
+ {{{
+ webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ }}}
+ 6.x::
+ {{{
+ webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ }}}
+
+
+ ----
+
+ <<Anchor(Q8)>>'''What can you recommend to just make everything work? (How to use UTF-8 everywhere).'''
+
+ Using `UTF-8` as your character encoding for everything is a safe bet. This should work for pretty much every situation.
+
+ In order to completely switch to using UTF-8, you need to make the following changes:
+
+  1. Set {{{URIEncoding="UTF-8"}}} on your <Connector> in `server.xml`. References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|HTTP Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|AJP Connector]].
+  1. Use a [[#Q3|character encoding filter]] with the default encoding set to UTF-8.
+  1. Change all your JSPs to include the charset name in their contentType.
+     For example, use {{{<%@page contentType="text/html; charset=UTF-8" %>}}} for the usual JSP pages and {{{<jsp:directive.page contentType="text/html; charset=UTF-8" />}}} for the pages in XML syntax (aka JSP Documents).
+  1. Change all your servlets to set the content type for responses and to include the UTF-8 charset name in that content type.
+     Use {{{response.setContentType("text/html; charset=UTF-8")}}} or {{{response.setCharacterEncoding("UTF-8")}}}.
+  1. Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
+  1. Disable any valves or filters that may read request parameters before your character encoding filter or JSP page has a chance to set the encoding to UTF-8. For more information see http://www.mail-archive.com/us...@tomcat.apache.org/msg21117.html.
+
+
+ ----
+
+ <<Anchor(Q4)>>'''How can I test if my configuration will work correctly?'''
+
+ The following sample JSP should work on a clean Tomcat install for any input. If you set `URIEncoding="UTF-8"` on the connector, it will also work with method="GET".
+ {{{
+ <%@ page contentType="text/html; charset=UTF-8" %>
+ <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+ <html>
+   <head>
+     <title>Character encoding test page</title>
+   </head>
+   <body>
+     <p>Data posted to this form was:
+     <%
+       request.setCharacterEncoding("UTF-8");
+       out.print(request.getParameter("mydata"));
+     %>
+
+     </p>
+     <form method="POST" action="index.jsp">
+       <input type="text" name="mydata">
+       <input type="submit" value="Submit" />
+       <input type="reset" value="Reset" />
+     </form>
+   </body>
+ </html>
+ }}}
+
+
+ ----
+
+ <<Anchor(Q6)>>'''How can I send higher characters in my HTTP headers?'''
+
+ You have to encode them in some way before you insert them into a header. Using URL-encoding (`%` followed by the two hexadecimal digits of each byte) would be a good idea.
+
+
+ ----
+
+ === Troubleshooting ===
+
  <<Anchor(Q5)>>'''I'm having a problem with character encoding in Tomcat 5'''
  In Tomcat 5, there have been issues reported with respect to character encoding (usually of the form "request.setCharacterEncoding(String) doesn't work"). Odds are, it's not a bug. Before filing a bug report, see these bug reports as well as any bug reports linked to these bug reports:
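The Q1 answer says that the body's character encoding comes from the `charset` parameter of the `Content-Type` header, with ISO-8859-1 as the default. The following is a minimal sketch of that rule in plain Java. The class `ContentTypeCharset` and its `charsetOf` method are hypothetical illustrations, not Tomcat code. The parser is deliberately simplified: it does not handle quoted-string escapes, and an unknown charset name throws an unchecked exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    /**
     * Returns the charset named by the "charset" parameter of a Content-Type
     * value, or ISO-8859-1 when the parameter is absent (the default the FAQ
     * attributes to the Servlet specification). Simplified sketch only.
     */
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            String[] parts = contentType.split(";");
            // parts[0] is the media type (e.g. "text/html"); the rest are parameters
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                int eq = param.indexOf('=');
                if (eq <= 0) {
                    continue;
                }
                String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
                if (name.equals("charset")) {
                    String value = param.substring(eq + 1).trim();
                    // Drop optional surrounding double quotes
                    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    return Charset.forName(value);
                }
            }
        }
        // No charset parameter: fall back to the default
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1")); // ISO-8859-1
        System.out.println(charsetOf("text/html; charset=UTF-8"));      // UTF-8
        System.out.println(charsetOf("text/html"));                     // ISO-8859-1
    }
}
```

Returning a `Charset` rather than a name lets callers pass the result straight to decoding APIs. Charset names are matched case-insensitively by `Charset.forName`.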
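The Q2 answer explains that Tomcat decodes the query string with ISO-8859-1 unless `URIEncoding` (or `useBodyEncodingForURI`) says otherwise. The sketch below shows what that default does to UTF-8 input. It uses `java.net.URLDecoder` as a stand-in for the connector's decoding, which is an assumption for illustration and not how Tomcat is implemented. The class name `QueryStringDecoding` is hypothetical, and the `Charset` overload of `URLDecoder.decode` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryStringDecoding {

    /** Percent-decodes one query-string value, turning bytes into chars with the given charset. */
    public static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // A form on a UTF-8 page sends e-acute (U+00E9) as the two bytes C3 A9.
        String raw = "caf%C3%A9";

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // UTF-8 rebuilds one character; ISO-8859-1 turns each byte into its own character.
        System.out.println(asUtf8.length());                    // 4
        System.out.println(asLatin1.length());                  // 5
        System.out.println(asLatin1.equals("caf\u00C3\u00A9")); // true (A-tilde, copyright sign)
    }
}
```

The five-character result is the typical symptom of a UTF-8 form read with the ISO-8859-1 default. Setting `URIEncoding="UTF-8"` is meant to make the server side behave like the first call.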
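The Q3 and Q8 answers (and the Tomcat 5 troubleshooting note) depend on ordering. An encoding filter only helps if it runs before anything reads a parameter, because the servlet API documents `setCharacterEncoding` as having no effect once parameters have been read. The following is a simplified model of that behaviour. `LazyFormRequest` is a hypothetical class, not `javax.servlet.ServletRequest`. It parses the form body once, on the first `getParameter` call, using whatever encoding is current at that moment. It requires Java 10+ for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for a servlet request (not the
 * javax.servlet API): the x-www-form-urlencoded body is parsed once,
 * on the first getParameter() call, with the encoding set at that moment.
 */
public class LazyFormRequest {
    private final String body;
    private Charset encoding = StandardCharsets.ISO_8859_1; // default when no charset was sent
    private Map<String, String> params;                     // null until the body is parsed

    public LazyFormRequest(String body) {
        this.body = body;
    }

    /** Mirrors the documented servlet rule: ignored once parameters have been read. */
    public void setCharacterEncoding(String charsetName) {
        if (params == null) {
            encoding = Charset.forName(charsetName);
        }
    }

    public String getParameter(String name) {
        if (params == null) {
            params = new HashMap<>();
            for (String pair : body.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // skip malformed pairs in this sketch
                }
                params.put(URLDecoder.decode(pair.substring(0, eq), encoding),
                           URLDecoder.decode(pair.substring(eq + 1), encoding));
            }
        }
        return params.get(name);
    }

    public static void main(String[] args) {
        // Encoding filter runs first: the parameter decodes to 4 characters.
        LazyFormRequest good = new LazyFormRequest("mydata=caf%C3%A9");
        good.setCharacterEncoding("UTF-8");
        System.out.println(good.getParameter("mydata").length()); // 4

        // A valve reads a parameter first: the later setCharacterEncoding is ignored.
        LazyFormRequest bad = new LazyFormRequest("mydata=caf%C3%A9");
        bad.getParameter("mydata");
        bad.setCharacterEncoding("UTF-8");
        System.out.println(bad.getParameter("mydata").length());  // 5
    }
}
```

The second case models the usual "request.setCharacterEncoding(String) doesn't work" report. The call itself succeeds, but the parameters were already decoded with the default charset.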
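The Q6 answer suggests URL-encoding non-ASCII header values, and Q9 explains that headers are limited to US-ASCII. The following is a minimal sketch of that suggestion. It assumes UTF-8 as the byte encoding and `java.net.URLEncoder` as the encoder. The class name `HeaderValueEncoding` is hypothetical, and the `Charset` overloads used here require Java 10 or later. Note that `URLEncoder` applies HTML form encoding, so a space becomes `+` rather than `%20`. Whoever reads the header must know to reverse exactly this scheme.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class HeaderValueEncoding {

    /** Percent-encodes the UTF-8 bytes of a value; the result contains only US-ASCII. */
    public static String encodeForHeader(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Reverses encodeForHeader on the receiving side. */
    public static String decodeFromHeader(String headerValue) {
        return URLDecoder.decode(headerValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00FC\u00DFe aus K\u00F6ln";
        String encoded = encodeForHeader(original);
        System.out.println(encoded);                                    // Gr%C3%BC%C3%9Fe+aus+K%C3%B6ln
        System.out.println(decodeFromHeader(encoded).equals(original)); // true
        // A servlet could then send it with something like
        // response.setHeader("X-Greeting", encoded); "X-Greeting" is a made-up name.
    }
}
```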