On 27/08/2014 10:38, Konstantin Kolinko wrote:
> 2014-08-27 13:29 GMT+04:00 Mark Thomas <ma...@apache.org>:
>>>
>>
>> Bad news: The issue is that if there is a chance of UTF-8 in the header
>> then you can't simply split the header into individual cookies based on
>> the separator byte since you can't tell (without decoding to characters)
>> if a byte represents the separator or is part of a sequence of several
>> bytes representing some other character.
>>
>
> You can. All separator bytes are 7-bit US-ASCII.
>
> BTW, There is also a feature in UTF-8 that you can split it into
> characters without actually decoding them.
>
> I mean "Character boundaries are easily found from anywhere in an
> octet stream." as said in "1. Introduction" of
> http://tools.ietf.org/html/rfc3629

Doh. Thanks for the correction. That gives us rather more options (if we
want/need them).

I had in the back of my mind an old UTF-8 related security issue where
multi-byte characters were being incorrectly processed and the remaining
bytes were incorrectly being treated as single-byte characters in the
range 0-127. I need to re-read that issue to remind myself exactly what
was going on, since with UTF-8 that simply should not be possible.

On a related topic... Since ISO-8859-1 is valid for use in a cookie
value (BZ 55917), we are going to have to provide an option somewhere to
select the encoding used to decode cookie values.

Mark
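
To make the point about 7-bit separators concrete, here is a rough sketch of splitting the raw header bytes on ';' first and decoding each piece afterwards with a selectable charset. It relies on the fact that every byte of a UTF-8 multi-byte sequence has the high bit set, so a byte equal to 0x3B can only ever be a real ';'. The class and method names (CookieHeaderSplitSketch, splitOnSeparator, decode) are made up for illustration and are not Tomcat's parser; the sketch also ignores quoted values, where a ';' inside quotes would need extra handling.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Tomcat.
public class CookieHeaderSplitSketch {

    // Split the raw bytes on ';' (0x3B) without decoding them first.
    // In UTF-8, lead bytes are 0xC2-0xF4 and continuation bytes are
    // 0x80-0xBF, so a 7-bit byte is never part of a multi-byte sequence.
    public static List<byte[]> splitOnSeparator(byte[] header) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= header.length; i++) {
            if (i == header.length || header[i] == ';') {
                parts.add(Arrays.copyOfRange(header, start, i));
                start = i + 1;
            }
        }
        return parts;
    }

    // Decode each piece with the configured charset (the "option
    // somewhere" is assumed here to be a plain Charset parameter).
    public static List<String> decode(byte[] header, Charset charset) {
        List<String> cookies = new ArrayList<>();
        for (byte[] part : splitOnSeparator(header)) {
            cookies.add(new String(part, charset).trim());
        }
        return cookies;
    }

    public static void main(String[] args) {
        byte[] utf8 = "a=caf\u00e9; b=\u20ac5".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.UTF_8));
    }
}
```

The same split works unchanged for ISO-8859-1, since it is a single-byte encoding in which only 0x3B represents ';'. Only the charset passed to decode would differ.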