Re: RFC6265, cookie parsing and UTF-8

Rémy Maucherat Tue, 26 Aug 2014 15:09:41 -0700

2014-08-26 21:53 GMT+02:00 Mark Thomas <[email protected]>:

> One of the aims of the proposed cookie changes [1] was to deal with the
> HTML 5 changes that mean UTF-8 can appear in cookie headers.
>
> This has some potentially large implications for Tomcat.
>
> Currently, Tomcat handles cookies as MessageBytes, processing everything
> in bytes and only converting to String when necessary. This is largely
> possible because of the assumption that everything is ASCII.
>
> Introduce UTF-8 and processing everything in bytes gets a whole lot
> harder. You essentially have to decode the UTF-8 to ensure that you have
> valid data, at which point why not just use Strings anyway?
>
> I am currently leaning towards removing a lot of the current cookie
> header caching and recycling and doing something along the following lines:
> - Lazy parsing as currently (but unless cookie based session tracking is
>   disabled this is going to run on every request)
> - Convert headers to UTF-8 strings
> - Parse them with a new parser along the lines of o.a.t.u.http.parser
> - Have that parser return an array of javax.servlet.http.Cookie objects
> - Pass those to the app if/when requested
>
> In terms of handling RFC6265 and RFC2109 my plan is to have two parsers,
> share as much code as possible and switch between them based on the
> cookie header with the expectation that 99.9% of cookies will be parsed
> by the RFC6265 parser. We could add some options to this switching to
> enable other parsers (e.g. a Netscape parser) to be used.
>
> I'd also like to keep the current cookie parsing implementation for now.
> Until we are happy with the new parsing, the current implementation will
> be the default. Once we are happy with the new parsing we can change the
> default. We can add an option to switch between the current and the new
> parsing.
>
> Thoughts?
>

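For illustration, the steps proposed above (decode the header as UTF-8,
choose a parser, return cookie objects) could be sketched roughly as
follows. This is a minimal sketch, not Tomcat code: the class
CookieHeaderSketch, the NameValue type (a stand-in for
javax.servlet.http.Cookie) and every method name are hypothetical, and
the rule "a header starting with $Version goes to the RFC 2109 parser"
is an assumption about how the switch might work. The grammar handling
is deliberately simplified.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CookieHeaderSketch {

    /** Hypothetical stand-in for javax.servlet.http.Cookie. */
    static final class NameValue {
        final String name;
        final String value;

        NameValue(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Strict UTF-8 decode: malformed bytes are rejected, not replaced. */
    static String decodeUtf8(byte[] header) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(header)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Cookie header is not valid UTF-8", e);
        }
    }

    /** Code shared by both parsers: split on ';', then on the first '='. */
    static List<NameValue> splitPairs(String header) {
        List<NameValue> pairs = new ArrayList<>();
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // no '=' at all, ignore this fragment
            }
            String name = pair.substring(0, eq).trim();
            if (name.isEmpty()) {
                continue; // a pair without a name is ignored
            }
            pairs.add(new NameValue(name, pair.substring(eq + 1).trim()));
        }
        return pairs;
    }

    /** Simplified RFC 6265 style parsing: values are kept as sent. */
    static List<NameValue> parseRfc6265(String header) {
        return splitPairs(header);
    }

    /** Simplified RFC 2109 style parsing: drop $-attributes, unquote values. */
    static List<NameValue> parseRfc2109(String header) {
        List<NameValue> cookies = new ArrayList<>();
        for (NameValue nv : splitPairs(header)) {
            if (nv.name.startsWith("$")) {
                continue; // $Version, $Path, $Domain are attributes, not cookies
            }
            String v = nv.value;
            if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                v = v.substring(1, v.length() - 1);
            }
            cookies.add(new NameValue(nv.name, v));
        }
        return cookies;
    }

    /** Assumed switching rule: a leading $Version selects the RFC 2109 parser. */
    static List<NameValue> parse(byte[] headerBytes) {
        String header = decodeUtf8(headerBytes);
        return header.trim().startsWith("$Version")
                ? parseRfc2109(header)
                : parseRfc6265(header);
    }
}
```

Calling CookieHeaderSketch.parse(...) on the raw header bytes returns the
list of name/value pairs, and a header that is not valid UTF-8 is
rejected with an IllegalArgumentException before any parsing happens.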

As far as I am concerned, this could turn out badly. String manipulation is
consistently the slowest thing overall other than IO, and webapps quite
often use a massive number of cookies [to the point that they get errors
because the default HTTP header size is too small].

So the current processing should probably be the default [as proposed],
then remain an option until it can be demonstrated that the new parsing is
not slower [which IMO is not possible, so it would have to remain].
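
To illustrate the cost concern, here is a minimal sketch of the
byte-oriented style: the header is scanned as bytes and a String is
created only for the one value that is actually needed. This is not
Tomcat's MessageBytes code. The class ByteCookieScan and its methods are
hypothetical, and the sketch assumes the ASCII-only case the current
implementation relies on.

```java
import java.nio.charset.StandardCharsets;

final class ByteCookieScan {

    /** True if header[start..] begins with the bytes of name. */
    static boolean regionEquals(byte[] header, int start, byte[] name) {
        if (start + name.length > header.length) {
            return false;
        }
        for (int i = 0; i < name.length; i++) {
            if (header[start + i] != name[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the value of the named cookie, or null if it is absent.
     * Only the matching value is converted to a String; every other
     * pair is skipped at the byte level.
     */
    static String extract(byte[] header, String cookieName) {
        byte[] name = cookieName.getBytes(StandardCharsets.ISO_8859_1);
        int i = 0;
        while (i < header.length) {
            // Skip separators and spaces in front of the next pair.
            while (i < header.length && (header[i] == ' ' || header[i] == ';')) {
                i++;
            }
            int start = i;
            // Find the end of this pair.
            int end = start;
            while (end < header.length && header[end] != ';') {
                end++;
            }
            // Match "name=" exactly, so "JSESSION" does not match "JSESSIONID".
            if (end - start > name.length
                    && header[start + name.length] == '='
                    && regionEquals(header, start, name)) {
                int valueStart = start + name.length + 1;
                return new String(header, valueStart, end - valueStart,
                        StandardCharsets.ISO_8859_1);
            }
            i = end;
        }
        return null;
    }
}
```

With UTF-8 allowed in values, the final conversion would need a
validating decode rather than a one-byte-per-char mapping, which is
where the extra work in the proposal comes from.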

Rémy
