In comments on issue #55917, there was suggestion for refactoring cookie 
support along the lines described in RFC6265. Reading this RFC, it appears to 
be more of an effort to standardize the actual behaviour seen on the Internet 
for different browser and server implementations. The observation is the 
RFC2109 has received limited adoption and RFC2965 virtually none at all, with 
most implementations falling back to the original specification released by 
Netscape that contains certain ambiguities. 

The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour with caveats 
around interoperability. It defines version 0 as complying with Netscape’s 
original specification and version 1 as complying RFC2109 (with the note “Since 
RFC 2109 is still somewhat new, consider version 1 as experimental; do not use 
it yet on production sites”).

The current implementation uses a number of system properties to control how 
cookies are validated. In implementing RFC6265 I hope that some of these can be 
eliminated. If not, I would propose to add configuration options on the 
Connector or Host objects to allow the configuration to be set separately for 
different host domains.

RFC6265 has separate sections in respect for generating and parsing cookie 
headers. It follows the practice that generation be strict but parsing be more 
tolerant of invalid input. Our current implementation generally follows that 
trend by suppressing invalid input data (after logging). However, for some 
input data, primary CTLs, it throws an IllegalArgumentException from the 
connector which does not allow the application to recover. In refactoring, I 
would propose to simply ignore that input thereby allowing the application to 
handle it, for example by parsing the header field manually. Cookie parsing in 
particular needs to be tolerant of cookies set by other sources, including 
different servers handling other parts of the domain and JavaScript or other 
client-side code setting values in the browser.

In light of this, I propose separating the “Set-Cookie” generation side from 
the “Cookie” parsing side.

Generation
==========
The general principle here would be to use the version property of Cookie to 
determine the level of verification to perform: if 0 follow RFC6265, if 1 use 
RFC2109. The primary verification point would be in 
HttpServletRequest#addCookie() which would use the version in the Cookie 
instance. Characters will always be converted to octets using the ISO-8859-1 
charset; unmappable values will result in an IAE.

The Servlet spec requires an IAE be thrown in Cookie’s constructor if the name 
is not valid pre RFC2109. Both RFC6265 and RFC2109 define the name to be a 
“token” (per RFC2616 HTTP/1.1) so I would propose to always validate by those 
rules; this would allow US-ASCII characters except CTLs and separators. This 
will different from the current implementation that slash “/“ would be treated 
as a separator which would not be allowed in a name by default; this is 
consistent with the RFC’s and Glassfish’s implementation and I’m assuming that 
allowing it in our current implementation is a hangover from where we enabled 
use of “/“ in values. 

The spec allows vendors to provide "a configuration option that allows cookie 
names conforming to the original Netscape Cookie Specification to be accepted” 
and I propose to retain the system property 
“org.apache.tomcat.util.http.ServerCookie.STRICT_NAMING” for that. If 
explicitly set to false, it will verify names using Netscape’s rules and allow 
"a sequence of characters excluding semi-colon, comma and white space” but also 
excluding “=“ and CTLs per RFC2616; note this *would* allow 8-bit ISO-8859-1 
characters in the name and relax the RFC2109 constraint that "NAMEs that begin 
with $ are reserved for other uses and must not be used by applications.” 

The value would not be checked until addCookie() was called and the cookie 
version is known. This would in principle use RFC6265’s “cookie-value” rule if 
version == 0 or RFC2109’s “value” rules if version == 1; values that do not 
conform would result in an IAE from addCookie(). Unlike the current 
implementation, this would not automatically upgrade the version or add quotes 
around RFC2109 “values” that did not match the “token” rule.

If STRICT_SERVLET_COMPLIANCE is set, the rule for version 0 values would be 
relaxed to allow any value conforming to the Netscape specification except 
CTLs; this would effectively add DQUOTE, backslash, and 0x80-0xFF. For more 
granular control, I propose adding the system property 
“org.apache.tomcat.util.http.ServerCookie.ALLOW_IN_VALUE” which would take one 
of the following enum values to determine what octets were allowed:
* Netscape
* RFC2616_token
* RFC2109_value
* RFC6265_cookie_octet
* Netscape_restricted (limits the permitted characters as recommended in the 
Servlet spec)
* RFC6265_ISO-8859-1 (adds 0x80-0xff to cookie_octet)

RFC6265 does allow value to be omitted so if value is null then a name-only 
cookie will be produced. This will contain the “=“ character required by the 
“cookie-pair” rule. RFC2109 does not allow the value to be omitted so a null 
value will result in an IAE unless 
“org.apache.tomcat.util.http.ServerCookie.ALLOW_NAME_ONLY” is set to true.

Max-Age and Expires will always be sent.

Parsing
=======
RFC6265 says the user-agent MUST send only a single Cookie header, and RFC2109 
is written to imply that assumption. Netscape says “a line” is added to the 
request. Our current implementation processes all Cookie headers in a request 
independently which leads to a difference in behaviour if the headers are 
folded by an intermediate proxy. Specifically any $Version value specified in 
one header is lost when processing the next whereas if the headers were folded 
the version information would apply when processing the values from the second 
header. To avoid this inconsistency, I propose only processing the first header 
received.

RFC2109 requires the header start with “cookie-version” which can be used to 
discriminate between RFC2109 and RFC6265/Netscape formats. Specifically, if the 
line starts with “$Version” then it would be processed using RFC2109 rules, 
otherwise processing would use RFC6265 rules. The analysis behind RFC6265 
indicates that most user agents simply ignore any “Version” attribute in a 
RFC2109 Set-Cookie so we would expect most requests to simply contain a 
RFC6265-format header.

When parsing a RFC2109 Cookie header, we will assume conformance to the 
specification. Any invalid “cookie-value” will simply be dropped and the parser 
will attempt to recover at the next potential “cookie-value” based on the 
current parse state (i.e. in a quoted-string or not). A missing VALUE will be 
considered invalid unless 
“org.apache.tomcat.util.http.ClientCookie.ALLOW_NAME_ONLY” is set to true. The 
version, path and domain properties in Cookie will be set based on attributes 
in the cookie-value.

When parsing a RFC6265 Cookie header, we will assume it is comprised of 
“cookie-pair” separated by the sequence “;” SP. A “cookie-pair” must start with 
a valid “cookie-name” followed by an “=“ character. The name must be a valid 
“token” unless “org.apache.tomcat.util.http.ClientCookie.STRICT_NAMING” is 
explicitly set to false. The will would be considered valid if it complies with 
the “cookie-value” rule unless 
“org.apache.tomcat.util.http.ClientCookie.ALLOW_IN_VALUE” is set to one of the 
alternatives above. This will allow name-only cookies (in the form "name=“). It 
will also mean cookies with unencoded JSON values will normally be suppressed 
so any application expecting such a value would need to parser the Cookie 
header directly. Any invalid cookie-pair will not be returned to the 
application and will not cause an exception to be thrown by the parser; a 
user-data error will be logged. To recover from an invalid cookie-pair the 
parser will look for the next “;” SP sequence. The path and domain properties 
in Cookie will always be null and the version will always be 0.

======

I plan to start looking at this for trunk/TC8.0 starting with a cleanup of the 
current tests. It should be possible to back-port to TC7.0.x if that is 
desirable.

Thanks
Jeremy

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail

Reply via email to