Re: RFC6265, cookie parsing and UTF-8

Mark Thomas Wed, 27 Aug 2014 02:58:59 -0700

On 27/08/2014 10:38, Konstantin Kolinko wrote:
> 2014-08-27 13:29 GMT+04:00 Mark Thomas <ma...@apache.org>:
>>>
>>
>> Bad news: The issue is that if there is a chance of UTF-8 in the header
>> then you can't simply split the header into individual cookies based on
>> the separator byte since you can't tell (without decoding to characters)
>> if a byte represents the separator or is part of a sequence of several
>> bytes representing some other character.
>>
> 
> You can. All separator bytes are 7-bit US-ASCII.
> 
> BTW, There is also a feature in UTF-8 that you can split it into
> characters without actually decoding them.
> 
> I mean "Character boundaries are easily found from anywhere in an
> octet stream." as said in "1. Introduction" of
> http://tools.ietf.org/html/rfc3629


Doh. Thanks for the correction. That gives us rather more options (if we
want/need them).
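
To make Konstantin's point concrete, here is a minimal sketch (not Tomcat's actual parser; the class and method names are hypothetical) of splitting a raw Cookie header on the ';' byte without decoding it. It relies on the RFC 3629 property that every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte equal to 0x3B can only be the separator itself. It assumes the input is well-formed UTF-8 and ignores quoted values.

```java
import java.util.ArrayList;
import java.util.List;

public class CookieSplitSketch {

    // Splits the raw header bytes on ';' (0x3B) without decoding them.
    // Safe for well-formed UTF-8: lead bytes are 11xxxxxx and continuation
    // bytes are 10xxxxxx, so neither can ever equal a 7-bit ASCII separator.
    public static List<byte[]> splitOnSemicolon(byte[] header) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= header.length; i++) {
            if (i == header.length || header[i] == (byte) ';') {
                byte[] part = new byte[i - start];
                System.arraycopy(header, start, part, 0, part.length);
                parts.add(part);
                start = i + 1;
            }
        }
        return parts;
    }

    // True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
    // A byte for which this is false starts a new character, which is how
    // character boundaries can be found without decoding.
    public static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }
}
```

Each returned byte array still has to be trimmed and decoded with whatever charset is chosen for cookie values; the sketch only shows that the split itself does not need a decoder.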

I had in the back of my mind an old UTF-8 related security issue where
multi-byte characters were being incorrectly processed and the remaining
bytes were incorrectly being treated as single-byte characters in the range
0-127. I need to re-read that issue to remind myself exactly what was
going on, as with UTF-8 that simply should not be possible.

On a related topic... Since ISO-8859-1 is valid for use in a cookie
value (BZ 55917) we are going to have to provide an option somewhere to
select the encoding to use to decode cookie values.
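
As a rough illustration of what such an option might feed into, here is a minimal sketch of a decoder parameterised by charset. The class name and constructor are hypothetical and do not correspond to any existing Tomcat API or configuration attribute; only the standard java.nio.charset classes are used.

```java
import java.nio.charset.Charset;

public class CookieValueDecoderSketch {

    // The charset would come from the (hypothetical) configuration option.
    private final Charset charset;

    public CookieValueDecoderSketch(Charset charset) {
        this.charset = charset;
    }

    // Decodes the raw bytes of a single cookie value.
    // Note: new String(byte[], Charset) silently replaces malformed input
    // with the charset's replacement character rather than failing.
    public String decode(byte[] rawValue) {
        return new String(rawValue, charset);
    }
}
```

The same two bytes 0xC3 0xA9 decode to one character under UTF-8 and to two characters under ISO-8859-1, which is why the choice has to be explicit.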

Mark


