In comments on issue #55917, there was suggestion for refactoring cookie support along the lines described in RFC6265. Reading this RFC, it appears to be more of an effort to standardize the actual behaviour seen on the Internet for different browser and server implementations. The observation is the RFC2109 has received limited adoption and RFC2965 virtually none at all, with most implementations falling back to the original specification released by Netscape that contains certain ambiguities.
The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour with caveats around interoperability. It defines version 0 as complying with Netscape’s original specification and version 1 as complying RFC2109 (with the note “Since RFC 2109 is still somewhat new, consider version 1 as experimental; do not use it yet on production sites”). The current implementation uses a number of system properties to control how cookies are validated. In implementing RFC6265 I hope that some of these can be eliminated. If not, I would propose to add configuration options on the Connector or Host objects to allow the configuration to be set separately for different host domains. RFC6265 has separate sections in respect for generating and parsing cookie headers. It follows the practice that generation be strict but parsing be more tolerant of invalid input. Our current implementation generally follows that trend by suppressing invalid input data (after logging). However, for some input data, primary CTLs, it throws an IllegalArgumentException from the connector which does not allow the application to recover. In refactoring, I would propose to simply ignore that input thereby allowing the application to handle it, for example by parsing the header field manually. Cookie parsing in particular needs to be tolerant of cookies set by other sources, including different servers handling other parts of the domain and JavaScript or other client-side code setting values in the browser. In light of this, I propose separating the “Set-Cookie” generation side from the “Cookie” parsing side. Generation ========== The general principle here would be to use the version property of Cookie to determine the level of verification to perform: if 0 follow RFC6265, if 1 use RFC2109. The primary verification point would be in HttpServletRequest#addCookie() which would use the version in the Cookie instance. Characters will always be converted to octets using the ISO-8859-1 charset; unmappable values will result in an IAE. The Servlet spec requires an IAE be thrown in Cookie’s constructor if the name is not valid pre RFC2109. Both RFC6265 and RFC2109 define the name to be a “token” (per RFC2616 HTTP/1.1) so I would propose to always validate by those rules; this would allow US-ASCII characters except CTLs and separators. This will different from the current implementation that slash “/“ would be treated as a separator which would not be allowed in a name by default; this is consistent with the RFC’s and Glassfish’s implementation and I’m assuming that allowing it in our current implementation is a hangover from where we enabled use of “/“ in values. The spec allows vendors to provide "a configuration option that allows cookie names conforming to the original Netscape Cookie Specification to be accepted” and I propose to retain the system property “org.apache.tomcat.util.http.ServerCookie.STRICT_NAMING” for that. If explicitly set to false, it will verify names using Netscape’s rules and allow "a sequence of characters excluding semi-colon, comma and white space” but also excluding “=“ and CTLs per RFC2616; note this *would* allow 8-bit ISO-8859-1 characters in the name and relax the RFC2109 constraint that "NAMEs that begin with $ are reserved for other uses and must not be used by applications.” The value would not be checked until addCookie() was called and the cookie version is known. This would in principle use RFC6265’s “cookie-value” rule if version == 0 or RFC2109’s “value” rules if version == 1; values that do not conform would result in an IAE from addCookie(). Unlike the current implementation, this would not automatically upgrade the version or add quotes around RFC2109 “values” that did not match the “token” rule. If STRICT_SERVLET_COMPLIANCE is set, the rule for version 0 values would be relaxed to allow any value conforming to the Netscape specification except CTLs; this would effectively add DQUOTE, backslash, and 0x80-0xFF. For more granular control, I propose adding the system property “org.apache.tomcat.util.http.ServerCookie.ALLOW_IN_VALUE” which would take one of the following enum values to determine what octets were allowed: * Netscape * RFC2616_token * RFC2109_value * RFC6265_cookie_octet * Netscape_restricted (limits the permitted characters as recommended in the Servlet spec) * RFC6265_ISO-8859-1 (adds 0x80-0xff to cookie_octet) RFC6265 does allow value to be omitted so if value is null then a name-only cookie will be produced. This will contain the “=“ character required by the “cookie-pair” rule. RFC2109 does not allow the value to be omitted so a null value will result in an IAE unless “org.apache.tomcat.util.http.ServerCookie.ALLOW_NAME_ONLY” is set to true. Max-Age and Expires will always be sent. Parsing ======= RFC6265 says the user-agent MUST send only a single Cookie header, and RFC2109 is written to imply that assumption. Netscape says “a line” is added to the request. Our current implementation processes all Cookie headers in a request independently which leads to a difference in behaviour if the headers are folded by an intermediate proxy. Specifically any $Version value specified in one header is lost when processing the next whereas if the headers were folded the version information would apply when processing the values from the second header. To avoid this inconsistency, I propose only processing the first header received. RFC2109 requires the header start with “cookie-version” which can be used to discriminate between RFC2109 and RFC6265/Netscape formats. Specifically, if the line starts with “$Version” then it would be processed using RFC2109 rules, otherwise processing would use RFC6265 rules. The analysis behind RFC6265 indicates that most user agents simply ignore any “Version” attribute in a RFC2109 Set-Cookie so we would expect most requests to simply contain a RFC6265-format header. When parsing a RFC2109 Cookie header, we will assume conformance to the specification. Any invalid “cookie-value” will simply be dropped and the parser will attempt to recover at the next potential “cookie-value” based on the current parse state (i.e. in a quoted-string or not). A missing VALUE will be considered invalid unless “org.apache.tomcat.util.http.ClientCookie.ALLOW_NAME_ONLY” is set to true. The version, path and domain properties in Cookie will be set based on attributes in the cookie-value. When parsing a RFC6265 Cookie header, we will assume it is comprised of “cookie-pair” separated by the sequence “;” SP. A “cookie-pair” must start with a valid “cookie-name” followed by an “=“ character. The name must be a valid “token” unless “org.apache.tomcat.util.http.ClientCookie.STRICT_NAMING” is explicitly set to false. The will would be considered valid if it complies with the “cookie-value” rule unless “org.apache.tomcat.util.http.ClientCookie.ALLOW_IN_VALUE” is set to one of the alternatives above. This will allow name-only cookies (in the form "name=“). It will also mean cookies with unencoded JSON values will normally be suppressed so any application expecting such a value would need to parser the Cookie header directly. Any invalid cookie-pair will not be returned to the application and will not cause an exception to be thrown by the parser; a user-data error will be logged. To recover from an invalid cookie-pair the parser will look for the next “;” SP sequence. The path and domain properties in Cookie will always be null and the version will always be 0. ====== I plan to start looking at this for trunk/TC8.0 starting with a cleanup of the current tests. It should be possible to back-port to TC7.0.x if that is desirable. Thanks Jeremy
signature.asc
Description: Message signed with OpenPGP using GPGMail