Hi Apache team,

We are switching the encoding used by Tomcat and the ServiceNow platform from ISO-8859-1 to UTF-8 and have run into a concern. There is a URIEncoding property that defaults to UTF-8 when it is not specified, but there does not appear to be an equivalent BodyEncoding property. Is there a reason for that, or is there a workaround you are aware of that does not require modifying the source code?

From the Tomcat source, the default body encoding appears to be ISO-8859-1. In the Parameters, ByteChunk, and Constants classes, two constants, DEFAULT_BODY_CHARSET and DEFAULT_CHARSET, seem to determine the body charset.
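
For illustration, the symptom can be reproduced with the JDK alone. The sketch below is not Tomcat code: the class name BodyCharsetDemo, the decodeBody helper, and the sample form value are hypothetical, and it only assumes that a container decodes body bytes with a fixed default charset when the request does not declare one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BodyCharsetDemo {

    // Hypothetical helper: turns raw request-body bytes into text using the
    // given charset, standing in for whatever default the container applies.
    static String decodeBody(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // A client that sends UTF-8: the accented character becomes two bytes.
        byte[] body = "name=Jos\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with an ISO-8859-1 default maps each byte to one character,
        // so the two-byte sequence comes back as two unrelated characters.
        String asLatin1 = decodeBody(body, StandardCharsets.ISO_8859_1);

        // Decoding with UTF-8 restores the original text.
        String asUtf8 = decodeBody(body, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 length: " + asLatin1.length());
        System.out.println("UTF-8 length:      " + asUtf8.length());
        System.out.println("round trip ok:     " + asUtf8.equals("name=Jos\u00e9"));
    }
}
```

The only difference between the two calls is the charset argument, which is the role DEFAULT_BODY_CHARSET appears to play in the classes mentioned above.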
We forked the Tomcat source and applied the changes below, which fixed the issue. Are you aware of this behavior? It seems strange to have a URIEncoding property but no BodyEncoding property, unless I am missing something. If there is no valid reason against such a property, perhaps we could submit this as an enhancement request.

diff --git a/java/org/apache/coyote/Constants.java b/java/org/apache/coyote/Constants.java
index 9de194d55..0883904f4 100644
--- a/java/org/apache/coyote/Constants.java
+++ b/java/org/apache/coyote/Constants.java
@@ -33,7 +33,7 @@ public final class Constants {
     public static final String DEFAULT_CHARACTER_ENCODING="ISO-8859-1";
 
     public static final Charset DEFAULT_URI_CHARSET = StandardCharsets.ISO_8859_1;
-    public static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.ISO_8859_1;
+    public static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.UTF_8;
 
     public static final int MAX_NOTES = 32;
diff --git a/java/org/apache/tomcat/util/buf/ByteChunk.java b/java/org/apache/tomcat/util/buf/ByteChunk.java
index 555c0f6b8..ed9f6e5ea 100644
--- a/java/org/apache/tomcat/util/buf/ByteChunk.java
+++ b/java/org/apache/tomcat/util/buf/ByteChunk.java
@@ -123,7 +123,7 @@ public final class ByteChunk extends AbstractChunk {
      * standards seem to converge, but the servlet API requires 8859_1, and this
      * object is used mostly for servlets.
      */
-    public static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;
+    public static final Charset DEFAULT_CHARSET = StandardCharsets.UTF_8;
 
     private transient Charset charset;
diff --git a/java/org/apache/tomcat/util/http/Parameters.java b/java/org/apache/tomcat/util/http/Parameters.java
index 4d7d6cc1e..f59f75514 100644
--- a/java/org/apache/tomcat/util/http/Parameters.java
+++ b/java/org/apache/tomcat/util/http/Parameters.java
@@ -266,7 +266,7 @@ public final class Parameters {
      */
     @Deprecated
     public static final String DEFAULT_ENCODING = "ISO-8859-1";
-    private static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.ISO_8859_1;
+    private static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.UTF_8;
     private static final Charset DEFAULT_URI_CHARSET = StandardCharsets.UTF_8;

Luis Arriaga
Software Engineer
servicenow.com