On 2014-01-04, at 17:24 , Chris Angelico <ros...@gmail.com> wrote:

> On Sun, Jan 5, 2014 at 2:36 AM, Hugo G. Fierro <h...@gfierro.com> wrote:
>> I am trying to download an HTML document. I get an HTTP 301 (Moved
>> Permanently) with a UTF-8 encoded Location header and http.client decodes it
>> as iso-8859-1. When there's a non-ASCII character in the redirect URL then I
>> can't download the document.
>> 
>> In client.py def parse_headers() I see the call to decode('iso-8859-1'). My
>> personal hack is to use whatever charset is defined in the Content-Type
>> HTTP header (utf8) or fall back into iso-8859-1.
>> 
>> At this point I am not sure where/how a fix should occur so I thought I'd
>> run it by you in case I should file a bug. Note that I don't use http.client
>> directly, but through the python-requests library.
> 
> I'm not 100% sure, but I believe non-ASCII characters are outright
> forbidden in a Location: header. It's possible that an RFC2047 tag
> might be used, but my reading of RFC2616 is that that's only for text
> fields, not for Location. These non-ASCII characters ought to be
> percent-encoded, and anything doing otherwise is buggy.

That is also my reading: the Location field’s value is defined as an
absoluteURI (RFC 2616, section 14.30):

> Location = "Location" ":" absoluteURI

Section 3.2.1 indicates that "absoluteURI" (and other related
concepts) are used as defined by RFC 2396 "Uniform Resource
Identifiers (URI): Generic Syntax", that is:

> absoluteURI = scheme ":" ( hier_part | opaque_part )

Both "hier_part" and "opaque_part" consist of some punctuation
characters plus "escaped" and "unreserved". "escaped" covers %-encoded
characters, which leaves "unreserved", defined as "alphanum | mark".
"mark" is more punctuation and "alphanum" is ASCII's alphanumeric
ranges, so no non-ASCII character can appear unescaped.
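
To make the character-set argument concrete, here is a rough Python
sketch. It only checks which characters appear in a value, not the full
RFC 2396 grammar, and the helper name uses_only_rfc2396_chars is made up
for illustration:

```python
import re
import string

# Character classes from RFC 2396, section 2. All of them are plain ASCII.
ALPHANUM = string.ascii_letters + string.digits
MARK = "-_.!~*'()"
RESERVED = ";/?:@&=+$,"

# A URI character is reserved, unreserved (alphanum | mark), or a %XX escape.
_URIC_RUN = re.compile(
    r"(?:[%s]|%%[0-9A-Fa-f]{2})*" % re.escape(ALPHANUM + MARK + RESERVED)
)

def uses_only_rfc2396_chars(value):
    """Return True if every character of value is allowed by RFC 2396.

    Hypothetical helper: a character-level check only, not a parser.
    """
    return _URIC_RUN.fullmatch(value) is not None
```

With this check a percent-encoded path such as
"http://example.com/caf%C3%A9" passes, while the same URL with a raw
"é" in it does not.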

Furthermore, although RFC 3986 moves some stuff around and renames some
production rules, it seems to have kept this limitation.
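
For someone stuck with a server that sends raw UTF-8 bytes anyway, a
possible client-side workaround is to percent-encode the value before
following the redirect. This is only a sketch: it assumes the header
value was decoded as iso-8859-1 (as described above for http.client),
and percent_encode_location is a hypothetical name, not part of any
library:

```python
from urllib.parse import quote_from_bytes

# ASCII punctuation that may appear literally in a URI, plus "%" so that
# escapes already present are left alone, and "#" for fragments.
_SAFE = ";/?:@&=+$,-_.!~*'()%#"

def percent_encode_location(decoded_value):
    """Percent-encode the non-ASCII bytes of a Location header value.

    Assumes decoded_value came from an iso-8859-1 decode, so encoding it
    back yields exactly the bytes the server sent.
    """
    raw = decoded_value.encode("iso-8859-1")
    return quote_from_bytes(raw, safe=_SAFE)
```

Because iso-8859-1 maps every byte to one code point, the round trip is
lossless whatever encoding the server actually used, and the function
never has to guess a charset.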
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev