https://bz.apache.org/bugzilla/show_bug.cgi?id=58859

            Bug ID: 58859
           Summary: Allow to limit charsets / encodings supported by
                    Tomcat
           Product: Tomcat 9
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Catalina
          Assignee: dev@tomcat.apache.org
          Reporter: knst.koli...@gmail.com

There was an enhancement request (bug 57808 "Don't preload all charsets").
I want to implement something similar, but as a security (paranoia) feature.

The issue: A client request can specify an encoding (charset) name. This
charset is used to parse request parameters (the query string and parameters in
the body of a POST request).

The problem is that a Java Runtime supports many charsets, but I really use
only a handful of them (ISO-8859-1, US-ASCII, UTF-8, and several charsets used
in my country).

There is such a nasty charset as UTF-7 [1], and some old browser was "nice"
enough to implement it. Luckily, the current versions of Java do not implement
it (tested with Sun/Oracle Java 5/6/7/8), but I do not know all of the charsets
that are implemented, and there are some exotic ones among them and some
experimental ones (X-*).
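
A quick way to repeat that check on a given Java runtime is sketched below. The
class and method names (CharsetProbe, isAvailable, experimentalCharsets) are
made up for illustration; only the java.nio.charset calls are real APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Tomcat.
final class CharsetProbe {

    // True if this JVM provides a charset with the given name or alias.
    // Syntactically illegal names are reported as "not available".
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    // Canonical names of all installed charsets that start with "x-",
    // compared case-insensitively.
    static List<String> experimentalCharsets() {
        List<String> result = new ArrayList<>();
        for (String name : Charset.availableCharsets().keySet()) {
            if (name.toLowerCase(Locale.ENGLISH).startsWith("x-")) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("UTF-7 available: " + isAvailable("UTF-7"));
        System.out.println("Experimental charsets: " + experimentalCharsets());
    }
}
```

The result depends on the JVM and on any charset providers on the class path,
so it has to be run on the runtime that actually hosts Tomcat.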

Proposal
==========
1. A new system property with the following name:
org.apache.tomcat.SUPPORTED_CHARSETS

2. The following behaviour:
If this property is set to a non-empty string, then in the static
initialization block of B2CConverter use the character sets named in this
property to populate a Set<Charset> that will be used instead of
Charset.availableCharsets() in the initialization loop.

For example, if org.apache.tomcat.SUPPORTED_CHARSETS=ISO-8859-1,UTF-8 then
Tomcat will only support those two charsets and all aliases of their names. An
attempt to use any other character set name will result in an
UnsupportedEncodingException.
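
A minimal standalone sketch of the proposed behaviour follows. It is not the
actual B2CConverter code: the class SupportedCharsets and its methods are
hypothetical, and the lower-casing of names is an assumption made here so that
lookups are case-insensitive. Only the property name comes from the proposal.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the proposal, not Tomcat code.
final class SupportedCharsets {

    // Property name taken from the proposal.
    static final String PROPERTY = "org.apache.tomcat.SUPPORTED_CHARSETS";

    // Parse a comma-separated list of charset names into a Set<Charset>.
    // An unknown or illegal name makes Charset.forName throw, so a
    // misconfigured property fails at startup.
    static Set<Charset> parse(String value) {
        Set<Charset> result = new LinkedHashSet<>();
        if (value == null) {
            return result;
        }
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                result.add(Charset.forName(trimmed));
            }
        }
        return result;
    }

    // Build the name -> Charset lookup, including all aliases. If the value
    // names no charsets, fall back to everything the JVM provides.
    static Map<String, Charset> buildLookup(String value) {
        Set<Charset> named = parse(value);
        Collection<Charset> source =
                named.isEmpty() ? Charset.availableCharsets().values() : named;
        Map<String, Charset> lookup = new HashMap<>();
        for (Charset charset : source) {
            lookup.put(charset.name().toLowerCase(Locale.ENGLISH), charset);
            for (String alias : charset.aliases()) {
                lookup.put(alias.toLowerCase(Locale.ENGLISH), charset);
            }
        }
        return lookup;
    }

    // Resolve a client-supplied name; anything not in the lookup is rejected.
    static Charset getCharset(Map<String, Charset> lookup, String name)
            throws UnsupportedEncodingException {
        Charset charset = lookup.get(name.toLowerCase(Locale.ENGLISH));
        if (charset == null) {
            throw new UnsupportedEncodingException(name);
        }
        return charset;
    }
}
```

In this sketch the lookup would be built once, e.g. with
SupportedCharsets.buildLookup(System.getProperty(SupportedCharsets.PROPERTY)),
and every later request for a charset would go through getCharset.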

With Java 8u66, those two charsets give the following allowed names:

ISO-8859-1
819 (alias for ISO-8859-1)
ISO8859-1 (alias for ISO-8859-1)
l1 (alias for ISO-8859-1)
ISO_8859-1:1987 (alias for ISO-8859-1)
ISO_8859-1 (alias for ISO-8859-1)
8859_1 (alias for ISO-8859-1)
iso-ir-100 (alias for ISO-8859-1)
latin1 (alias for ISO-8859-1)
cp819 (alias for ISO-8859-1)
ISO8859_1 (alias for ISO-8859-1)
IBM819 (alias for ISO-8859-1)
ISO_8859_1 (alias for ISO-8859-1)
IBM-819 (alias for ISO-8859-1)
csISOLatin1 (alias for ISO-8859-1)

UTF-8
unicode-1-1-utf-8 (alias for UTF-8)
UTF8 (alias for UTF-8)
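
A list like the one above can be produced with a few lines of Java. This is
only a sketch with a hypothetical class name (AliasLister); the set and order
of aliases depend on the Java runtime, so the output may differ from the
8u66 list.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists a charset's canonical name and aliases.
final class AliasLister {

    static List<String> allowedNames(String... charsetNames) {
        List<String> lines = new ArrayList<>();
        for (String charsetName : charsetNames) {
            Charset charset = Charset.forName(charsetName);
            // Canonical name first, then every alias known to this JVM.
            lines.add(charset.name());
            for (String alias : charset.aliases()) {
                lines.add(alias + " (alias for " + charset.name() + ")");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : allowedNames("ISO-8859-1", "UTF-8")) {
            System.out.println(line);
        }
    }
}
```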

This feature applies only to the set of charsets used via the B2CConverter
class, which is used internally by Tomcat. I think that Jasper does not use it,
so the feature does not apply to the encoding used to write the source code of
JSP pages.

The difference from the enhancement proposed in bug 57808 is that charsets not
named in the property are not supported at all, instead of being loaded lazily.


SetCharacterEncodingFilter 
============================
I should also note the following:

The issue of a charset name provided by the client can also be solved by using
 org.apache.catalina.filters.SetCharacterEncodingFilter [2]
configured with the initialization parameter ignore="true".

This filter is available in all current Tomcat versions (6/7/8/9). Some web
frameworks (e.g. Spring) also provide similar filters.

If a web application renders all its pages in UTF-8, then it can expect all
requests to it to use UTF-8 as well.
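
The effect of ignore="true" can be modelled as a small decision function. This
is a simplified sketch based on the filter's documented behaviour, not the
filter's source code; the class EncodingChoice and its method are hypothetical
names used only for illustration.

```java
// Simplified model of how a configured encoding can override the client's.
final class EncodingChoice {

    // clientEncoding:     charset name sent by the client, or null if none
    // configuredEncoding: encoding configured for the filter, or null
    // ignore:             when true, the client's choice is disregarded
    static String effectiveEncoding(String clientEncoding,
                                    String configuredEncoding,
                                    boolean ignore) {
        if ((ignore || clientEncoding == null) && configuredEncoding != null) {
            return configuredEncoding;
        }
        return clientEncoding;
    }

    public static void main(String[] args) {
        // The client asks for UTF-7, the application is configured for UTF-8.
        System.out.println(effectiveEncoding("UTF-7", "UTF-8", true));
    }
}
```

With ignore set to true, a charset name supplied by the client never reaches
the parameter parsing code, which is why the filter addresses the same concern
from the application side.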


[1] https://en.wikipedia.org/wiki/UTF-7
[2] http://tomcat.apache.org/tomcat-8.0-doc/config/filter.html#Set_Character_Encoding_Filter
