On 2014-01-04, at 17:24, Chris Angelico <ros...@gmail.com> wrote:

> On Sun, Jan 5, 2014 at 2:36 AM, Hugo G. Fierro <h...@gfierro.com> wrote:
>> I am trying to download an HTML document. I get an HTTP 301 (Moved
>> Permanently) with a UTF-8 encoded Location header and http.client decodes it
>> as iso-8859-1. When there's a non-ASCII character in the redirect URL then I
>> can't download the document.
>>
>> In client.py def parse_headers() I see the call to decode('iso-8859-1'). My
>> personal hack is to use whatever charset is defined in the Content-Type
>> HTTP header (utf8) or fall back into iso-8859-1.
>>
>> At this point I am not sure where/how a fix should occur so I thought I'd
>> run it by you in case I should file a bug. Note that I don't use http.client
>> directly, but through the python-requests library.
>
> I'm not 100% sure, but I believe non-ASCII characters are outright
> forbidden in a Location: header. It's possible that an RFC2047 tag
> might be used, but my reading of RFC2616 is that that's only for text
> fields, not for Location. These non-ASCII characters ought to be
> percent-encoded, and anything doing otherwise is buggy.
That is also my reading: the Location field’s value is defined as an absoluteURI (RFC 2616, section 14.30):

> Location = "Location" ":" absoluteURI

Section 3.2.1 indicates that "absoluteURI" (and other related concepts) are used as defined by RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax", that is:

> absoluteURI = scheme ":" ( hier_part | opaque_part )

Both "hier_part" and "opaque_part" consist of some punctuation characters, "escaped" and "unreserved". "escaped" is a %-encoded octet, which leaves "unreserved", defined as "alphanum | mark". "mark" is more punctuation and "alphanum" is ASCII's alphanumeric ranges, so a non-ASCII character can never appear literally in the value.

Furthermore, although RFC 3986 moves some stuff around and renames some production rules, it seems to have kept this limitation.
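
For what it's worth, a client that still has to cope with such a server could repair the value itself rather than change how http.client decodes headers. Below is a minimal sketch of that idea, not a proposed fix for the standard library. It assumes the server sent raw UTF-8 bytes and that the header was decoded as iso-8859-1, as described above; repair_location is a hypothetical helper name, not an existing API.

```python
from urllib.parse import quote

# Characters left untouched: RFC 3986 reserved characters plus "%" so that
# escapes already present in the URL are not encoded a second time.
# Letters, digits and "_.-~" are always left alone by quote().
_SAFE = "/:?#[]@!$&'()*+,;=%"


def repair_location(value: str) -> str:
    """Percent-encode a Location value that was decoded as iso-8859-1.

    Hypothetical helper: assumes the non-conforming server sent raw bytes
    (typically UTF-8) where the RFCs require percent-encoding.
    """
    try:
        # iso-8859-1 maps every byte to one code point, so encoding back
        # recovers exactly the bytes that were on the wire.
        raw = value.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Not a latin-1 decoded string after all; treat it as ordinary text.
        raw = value.encode("utf-8")
    # quote() accepts bytes and escapes every byte outside the safe set.
    return quote(raw, safe=_SAFE)


if __name__ == "__main__":
    # Simulate what http.client hands back for a UTF-8 "café" in the path.
    mangled = "http://example.com/caf\u00e9".encode("utf-8").decode("iso-8859-1")
    print(repair_location(mangled))
```

Because the escaping works on the recovered bytes, the sketch does not need to know the original charset; it only reproduces whatever the server sent in a form that the URI grammar allows.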