Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tomcat Wiki" for change 
notification.

The "FAQ/CharacterEncoding" page has been changed by KonstantinKolinko.
The comment on this change is: Rearranged.
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding?action=diff&rev1=13&rev2=14

--------------------------------------------------

  = Character Encoding Issues =
  
  == Questions ==
+ 
+  1. '''Why'''
-  1. [[#Q1|What is the default character encoding of the request or response 
body?]]
+   1. [[#Q1|What is the default character encoding of the request or response 
body?]]
+   1. [[#Q9|Why does everything have to be this way?]]
+  1. '''How'''
-  1. [[#Q2|How do I change how GET parameters are interpreted?]]
+   1. [[#Q2|How do I change how GET parameters are interpreted?]]
-  1. [[#Q3|How do I change how POST parameters are interpreted?]]
+   1. [[#Q3|How do I change how POST parameters are interpreted?]]
+   1. [[#Q8|What can you recommend to just make everything work? (How to use 
UTF-8 everywhere).]]
-  1. [[#Q4|How can I test if my configuration will work correctly?]]
+   1. [[#Q4|How can I test if my configuration will work correctly?]]
-  1. [[#Q6|How can I send higher characters in HTTP headers?]]
+   1. [[#Q6|How can I send higher characters in HTTP headers?]]
+  1. '''Troubleshooting'''
-  1. [[#Q8|What can you recommend to just make everything work? -- How to use 
UTF-8 everywhere.]]
-  1. [[#Q9|Why does everything have to be this way?]]
-  1. [[#Q5|I'm having a problem with character encoding in Tomcat 5]]
+   1. [[#Q5|I'm having a problem with character encoding in Tomcat 5]]
  
  == Answers ==
+ 
+ === Why ===
+ 
  <<Anchor(Q1)>>'''What is the default character encoding of the request or 
response body?'''
  
  If a character encoding is not specified, the Servlet specification requires 
that an encoding of ISO-8859-1 is used. The character encoding for the body of 
an HTTP message (request ''or'' response) is specified in the `Content-Type` 
header field. An example of such a header is `Content-Type: text/html; 
charset=ISO-8859-1` which explicitly states that the default (ISO-8859-1) is 
being used.
  
  References: 
[[http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1|HTTP 1.1 
Specification, Section 3.7.1]]
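
As a rough illustration of the rule above, the following sketch extracts the `charset` parameter from a `Content-Type` value and falls back to ISO-8859-1 when none is given. The class and method names are hypothetical and the parsing is deliberately simplified; it is not how Tomcat itself parses the header.

```java
import java.util.Locale;

class ContentTypeCharset {
    // Default that applies when no charset is specified.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Returns the charset parameter of a Content-Type value, or the default.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? DEFAULT_CHARSET : value;
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
    }
}
```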
  
- <<Anchor(Q2)>>'''How do I change how GET parameters are interpreted?'''
  
+ ----
- Tomcat will use ISO-8859-1 as the default character encoding of the entire 
URL, including the query string ("GET parameters").
- 
- There are two ways to specify how GET parameters are interpreted:
- 
-  1. Set the `URIEncoding` attribute on the <Connector> element in server.xml 
to something specific (e.g. `URIEncoding="UTF-8"`).
-  1. Set the `useBodyEncodingForURI` attribute on the <Connector> element in 
server.xml to `true`. This will cause the Connector to use the request body's 
encoding for GET parameters.
- 
- References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|Tomcat 
6 HTTP Connector]], 
[[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|Tomcat 6 AJP 
Connector]]
- 
- <<Anchor(Q3)>>'''How do I change how POST parameters are interpreted?'''
- 
- POST requests should specify the encoding of the parameters and values they 
send. Since many clients fail to set an explicit encoding, the default is used 
(ISO-8859-1). In many cases this is not the preferred interpretation so one can 
employ a javax.servlet.Filter to set request encodings. Writing such a filter 
is trivial. Furthermore Tomcat already comes with such an example filter. 
Please take a look at:
-  5.x::
- {{{
- webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- }}}
-  6.x::
- {{{
- webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- }}}
- 
- <<Anchor(Q4)>>'''How can I test if my configuration will work correctly?'''
- 
- The following sample JSP should work on a clean Tomcat install for any input. 
If you set the URIEncoding="UTF-8" on the connector, it will also work with 
method="GET".
- {{{
- <%@ page contentType="text/html; charset=UTF-8" %>
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
- <html>
-    <head>
-      <title>Character encoding test page</title>
-    </head>
-    <body>
-      <p>Data posted to this form was:
-      <%
-        request.setCharacterEncoding("UTF-8");
-        out.print(request.getParameter("mydata"));
-      %>
- 
-      </p>
-      <form method="POST" action="index.jsp">
-        <input type="text" name="mydata">
-        <input type="submit" value="Submit" />
-        <input type="reset" value="Reset" />
-      </form>
-    </body>
- </html>
- }}}
- 
- <<Anchor(Q8)>>'''How can I send higher characters in my HTTP headers?'''
- 
- You have to encode them in some way before you insert them into a header. 
Using URL-encoding (`%` followed by the two hexadecimal digits of each encoded 
byte) would be a good idea.
- 
- <<Anchor(Q8)>>'''What can you recommend to just make everything work? -- How 
to use UTF-8 everywhere.'''
- 
- Using `UTF-8` as your character encoding for everything is a safe bet. This 
should work for pretty much every situation.
- 
- In order to completely switch to using UTF-8, you need to make the following 
changes:
- 
-  1. Set {{{URIEncoding="UTF-8"}}} on your <Connector> in `server.xml`. 
References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|HTTP 
Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|AJP 
Connector]].
-  1. Use a [[#Q3|character encoding filter]] with the default encoding set to 
UTF-8
-  1. Change all your JSPs to include charset name in their contentType. For 
example, use {{{<%@ page contentType="text/html; charset=UTF-8" %>}}} for the 
usual JSP pages and {{{<jsp:directive.page contentType="text/html; 
charset=UTF-8" />}}} for the pages in XML syntax (aka JSP Documents).
-  1. Change all your servlets to set the content type for responses and to 
include charset name in the content type to be UTF-8. Use 
{{{response.setContentType("text/html; charset=UTF-8")}}} or 
{{{response.setCharacterEncoding("UTF-8")}}}.
-  1. Change any content-generation libraries you use (Velocity, Freemarker, 
etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses 
that they generate.
-  1. Disable any valves or filters that may read request parameters before 
your character encoding filter or jsp page has a chance to set the encoding to 
UTF-8.  For more information see 
http://www.mail-archive.com/us...@tomcat.apache.org/msg21117.html.
  
  <<Anchor(Q9)>>'''Why does everything have to be this way?'''
  
@@ -124, +66 @@

  
  Section 3.1 of the ARPA Internet Text Messages spec states that headers are 
always in US-ASCII encoding. Anything outside of that needs to be encoded. See 
the section above regarding query strings in URIs.
  
+ 
+ ----
+ 
+ === How ===
+ 
+ <<Anchor(Q2)>>'''How do I change how GET parameters are interpreted?'''
+ 
+ Tomcat will use ISO-8859-1 as the default character encoding of the entire 
URL, including the query string ("GET parameters").
+ 
+ There are two ways to specify how GET parameters are interpreted:
+ 
+  1. Set the `URIEncoding` attribute on the <Connector> element in server.xml 
to something specific (e.g. `URIEncoding="UTF-8"`).
+  1. Set the `useBodyEncodingForURI` attribute on the <Connector> element in 
server.xml to `true`. This will cause the Connector to use the request body's 
encoding for GET parameters.
+ 
+ References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|Tomcat 
6 HTTP Connector]], 
[[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|Tomcat 6 AJP 
Connector]]
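
To see why the URI encoding matters, here is a small standalone sketch (plain `java.net.URLDecoder`, not Tomcat's own decoder) that decodes the same percent-encoded query value twice. The class name and the sample value are made up for illustration; the sample assumes a browser that sent "café" percent-encoded as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecoding {
    // Decodes a percent-encoded value using the named character encoding.
    static String decode(String raw, String encoding) {
        try {
            return URLDecoder.decode(raw, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String raw = "caf%C3%A9"; // "caf\u00e9" percent-encoded as UTF-8

        // Decoded with the encoding the client actually used: 4 characters.
        System.out.println(decode(raw, "UTF-8").length());

        // Decoded with the ISO-8859-1 default: each UTF-8 byte becomes its
        // own character, so the value is 5 characters long and garbled.
        System.out.println(decode(raw, "ISO-8859-1").length());
    }
}
```

Setting `URIEncoding="UTF-8"` on the connector corresponds to the first call; leaving the default corresponds to the second.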
+ 
+ 
+ ----
+ 
+ <<Anchor(Q3)>>'''How do I change how POST parameters are interpreted?'''
+ 
+ POST requests should specify the encoding of the parameters and values they 
send. Since many clients fail to set an explicit encoding, the default is used 
(ISO-8859-1). In many cases this is not the preferred interpretation so one can 
employ a javax.servlet.Filter to set request encodings. Writing such a filter 
is trivial. Furthermore Tomcat already comes with such an example filter. 
Please take a look at:
+  5.x::
+ {{{
+ webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ }}}
+  6.x::
+ {{{
+ webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ }}}
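
The decision such a filter makes can be modelled without the servlet API. The sketch below is only a standalone model, with hypothetical class and method names: it keeps the encoding the client declared, falls back to a configured one otherwise, and decodes a form value accordingly. In a real filter the equivalent step would be a call to `request.setCharacterEncoding(...)` before any parameter is read.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PostEncodingModel {
    // Keep the encoding the client declared; otherwise use the fallback.
    static String chooseEncoding(String declared, String fallback) {
        return declared != null ? declared : fallback;
    }

    // Decodes one percent-encoded form value with the chosen encoding.
    static String decodeValue(String rawValue, String declared, String fallback) {
        try {
            return URLDecoder.decode(rawValue, chooseEncoding(declared, fallback));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding", e);
        }
    }

    public static void main(String[] args) {
        // Client sent UTF-8 bytes but declared no encoding.
        String raw = "caf%C3%A9";
        // Fallback of UTF-8 (what the filter would set): value decodes cleanly.
        System.out.println(decodeValue(raw, null, "UTF-8").length());
        // Fallback of ISO-8859-1 (the default): value is garbled.
        System.out.println(decodeValue(raw, null, "ISO-8859-1").length());
    }
}
```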
+ 
+ 
+ ----
+ 
+ <<Anchor(Q8)>>'''What can you recommend to just make everything work? (How to 
use UTF-8 everywhere).'''
+ 
+ Using `UTF-8` as your character encoding for everything is a safe bet. This 
should work for pretty much every situation.
+ 
+ In order to completely switch to using UTF-8, you need to make the following 
changes:
+ 
+  1. Set {{{URIEncoding="UTF-8"}}} on your <Connector> in `server.xml`. 
References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|HTTP 
Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|AJP 
Connector]].
+  1. Use a [[#Q3|character encoding filter]] with the default encoding set to 
UTF-8
+  1. Change all your JSPs to include charset name in their contentType.
+  For example, use {{{<%@ page contentType="text/html; charset=UTF-8" %>}}} 
for the usual JSP pages and {{{<jsp:directive.page contentType="text/html; 
charset=UTF-8" />}}} for the pages in XML syntax (aka JSP Documents).
+  1. Change all your servlets to set the content type for responses and to 
include charset name in the content type to be UTF-8.
+  Use {{{response.setContentType("text/html; charset=UTF-8")}}} or 
{{{response.setCharacterEncoding("UTF-8")}}}.
+  1. Change any content-generation libraries you use (Velocity, Freemarker, 
etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses 
that they generate.
+  1. Disable any valves or filters that may read request parameters before 
your character encoding filter or jsp page has a chance to set the encoding to 
UTF-8.  For more information see 
http://www.mail-archive.com/us...@tomcat.apache.org/msg21117.html.
+ 
+ 
+ ----
+ 
+ <<Anchor(Q4)>>'''How can I test if my configuration will work correctly?'''
+ 
+ The following sample JSP should work on a clean Tomcat install for any input. 
If you set the URIEncoding="UTF-8" on the connector, it will also work with 
method="GET".
+ {{{
+ <%@ page contentType="text/html; charset=UTF-8" %>
+ <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+ <html>
+    <head>
+      <title>Character encoding test page</title>
+    </head>
+    <body>
+      <p>Data posted to this form was:
+      <%
+        request.setCharacterEncoding("UTF-8");
+        out.print(request.getParameter("mydata"));
+      %>
+ 
+      </p>
+      <form method="POST" action="index.jsp">
+        <input type="text" name="mydata">
+        <input type="submit" value="Submit" />
+        <input type="reset" value="Reset" />
+      </form>
+    </body>
+ </html>
+ }}}
+ 
+ 
+ ----
+ 
+ <<Anchor(Q6)>>'''How can I send higher characters in my HTTP headers?'''
+ 
+ You have to encode them in some way before you insert them into a header. 
Using URL-encoding (`%` followed by the two hexadecimal digits of each encoded 
byte) would be a good idea.
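
One way to do this, sketched with the standard `java.net.URLEncoder` and `URLDecoder` classes, is shown here. The class name is hypothetical, and choosing UTF-8 as the byte encoding is an assumption; sender and receiver have to agree on it. Note that `URLEncoder` follows form-encoding rules, so a space becomes `+` rather than `%20`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class HeaderValueEncoding {
    // Encodes a value so that only US-ASCII characters remain.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Reverses encode() on the receiving side.
    static String decode(String headerValue) {
        try {
            return URLDecoder.decode(headerValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        String encoded = encode(original);
        System.out.println(encoded);                           // ASCII-only form
        System.out.println(decode(encoded).equals(original));  // round trip
    }
}
```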
+ 
+ 
+ ----
+ 
+ === Troubleshooting ===
+ 
  <<Anchor(Q5)>>'''I'm having a problem with character encoding in Tomcat 5'''
  
  In Tomcat 5, there have been issues reported with respect to character 
encoding (usually of the form "request.setCharacterEncoding(String) doesn't 
work"). Odds are, it's not a bug. Before filing a bug report, see these bug 
reports as well as any bug reports linked to these bug reports:

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org
