Adding more confusion to the pile, HTML5[1] now specifies that JavaScript can 
set Unicode characters through document.cookie and that they must be encoded as 
UTF-8 in the header. Quick testing with Chrome shows it does just that (e.g. 
U+00E1 is sent as 0xC3 0xA1). If client-side and server-side application code 
are going to interoperate, we would need to accept UTF-8 encoded octets in a 
Cookie header and allow them to be sent in a Set-Cookie header. However, this 
conflicts with the Netscape spec and its implicit assumption of ISO-8859-1: the 
same octets decode to different characters depending on the charset assumed.

[1] http://www.w3.org/html/wg/drafts/html/master/single-page.html#cookie
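
To make the ambiguity concrete, here is a minimal Java sketch. The class and 
method names are hypothetical and purely illustrative, not Tomcat code. It 
encodes U+00E1 as UTF-8, as HTML5 requires for document.cookie, and then 
decodes the resulting octets under both assumptions:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or any cookie API.
public class CookieCharsetDemo {

    // Encode a cookie value as UTF-8, as HTML5 specifies for document.cookie.
    static byte[] encodeUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Interpret header octets as UTF-8 (the HTML5 view).
    static String decodeUtf8(byte[] octets) {
        return new String(octets, StandardCharsets.UTF_8);
    }

    // Interpret header octets as ISO-8859-1 (the implicit Netscape-era view).
    static String decodeLatin1(byte[] octets) {
        return new String(octets, StandardCharsets.ISO_8859_1);
    }

    // Render octets as space-separated hex, e.g. "0xC3 0xA1".
    static String hex(byte[] octets) {
        StringBuilder sb = new StringBuilder();
        for (byte b : octets) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] wire = encodeUtf8("\u00E1");

        // Octets on the wire, matching what Chrome was observed to send.
        System.out.println(hex(wire)); // prints "0xC3 0xA1"

        // A UTF-8 reader recovers the original single character.
        System.out.println(decodeUtf8(wire).equals("\u00E1")); // prints "true"

        // An ISO-8859-1 reader sees two characters, U+00C3 and U+00A1.
        System.out.println(decodeLatin1(wire).equals("\u00C3\u00A1")); // prints "true"
    }
}
```

Both decodings succeed without error, so a server cannot tell from the octets 
alone which interpretation the client intended.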

On Jan 1, 2014, at 10:18 AM, Jeremy Boynes <jboy...@apache.org> wrote:

> On Jan 1, 2014, at 8:59 AM, Mark Thomas <ma...@apache.org> wrote:
> 
>> On 26/12/2013 19:23, Jeremy Boynes wrote:
>>> On Dec 26, 2013, at 2:47 AM, Mark Thomas <ma...@apache.org> wrote:
>>> 
>>> Focusing on the 8-bit issue addressed by the patch, leaving the other
>>> RFC6265 thread for broader discussion ...
>>> 
>>>>> The change only allows these characters in values if version ==
>>>>> 0 where Netscape’s rather than RFC2109’s syntax applies (per
>>>>> the Servlet spec). The Netscape spec is vague in that it does
>>>>> not define “OPAQUE_STRING” at all and defines “VALUE” as
>>>>> containing equally undefined “characters” although
>>>>> historically[1] those have been taken to be OCTETs as permitted
>>>>> by RFC2616’s “*TEXT” variant of “field-content.” The change
>>>>> will continue to reject these characters in names and in
>>>>> unquoted values when version != 0 (RFC2109’s “word” rule).
>>>>> 
>>>>> [1] based on comments by Fielding et al. on http-state and
>>>>> what I’ve seen in the wild
>>>> 
>>>> Can you provide references for [1]?
>>> 
>>> This is the mail in the run up to RFC6265 that triggered the
>>> discussion:
>>> http://www.ietf.org/mail-archive/web/http-state/current/msg01232.html
>> 
>> Thanks for that reference. What a complete mess. RFC6265 really
>> dropped the ball on this. The grammar for cookie-value is a disaster.
>> So far the issues include:
>> - no support for 0x80 to 0xFF
>> - no support for \" sequences
>> - no support for using whitespace, comma, semi-colon, backslash
>> 
>> I was beginning to think that factoring out the cookie generation /
>> parsing and then providing different implementations (one for Netscape
>> + RFC2109 - roughly what we have now with a few fixes, one for RFC6265
>> and maybe one very relaxed) would be the way to go. Having looked at
>> the first issue that plan already looks like it needs a re-think.
>> 
>> I'm still hoping that by documenting all the various issues in one
>> place we will be able to come up with a solution that both addresses
>> all the issues you have raised and is better than the handful of
>> system properties we have currently.
> 
> I think they did a reasonable job given the mess cookies are in the wild 
> today. They summarize this in the preamble:
>> The recommendations for cookie generation provided in Section 4 represent a 
>> preferred subset of current server behavior, and even the more liberal 
>> cookie processing algorithm provided in Section 5 does not recommend all of 
>> the syntactic and semantic variations in use today.
> 
> Section 4 recommends guidelines for servers generating cookies. I interpret 
> that as being “if you follow these guidelines, you have a good chance of 
> actually getting back the value you tried to set.” The rules above (no 8-bit, 
> no escaping, no Netscape delimiters) reflect that principle. A server 
> application can step outside those guidelines but "thar be dragons."
> 
> —
> Jeremy
