On Sun, Jan 5, 2014 at 2:36 AM, Hugo G. Fierro <h...@gfierro.com> wrote:
> I am trying to download an HTML document. I get an HTTP 301 (Moved
> Permanently) with a UTF-8 encoded Location header and http.client decodes it
> as iso-8859-1. When there's a non-ASCII character in the redirect URL then I
> can't download the document.
>
> In client.py def parse_headers() I see the call to decode('iso-8859-1'). My
> personal hack is to use whatever charset is defined in the Content-Type
> HTTP header (utf8) or fall back into iso-8859-1.
>
> At this point I am not sure where/how a fix should occur so I thought I'd
> run it by you in case I should file a bug. Note that I don't use http.client
> directly, but through the python-requests library.
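
To make the problem concrete, here is a minimal sketch of a caller-side workaround. It assumes the server really sent UTF-8 bytes in the Location header and that they were decoded as iso-8859-1 on the way in. The name fix_location is hypothetical; it is not part of http.client or python-requests. The sketch reverses the wrong decode and then percent-encodes the path so the redirect target is plain ASCII.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def fix_location(value: str) -> str:
    """Hypothetical helper: repair a Location value that was UTF-8 on the
    wire but was decoded as ISO-8859-1, then percent-encode its path."""
    try:
        # ISO-8859-1 maps bytes 0-255 one-to-one onto code points, so
        # re-encoding recovers the original bytes exactly.
        repaired = value.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption did not hold (not UTF-8 after all): leave it alone.
        repaired = value
    parts = urlsplit(repaired)
    # quote() encodes non-ASCII characters as UTF-8 percent escapes.
    # "%" is kept safe so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    raw = (b"http://www.starbucks.com/store/158/at/karntnerstrasse/"
           b"k\xc3\xa4rntnerstrasse-49-vienna-9-1010")
    as_seen = raw.decode("iso-8859-1")  # what the header parser hands back
    print(fix_location(as_seen))
```

This only touches the path component, and it falls back to the unmodified value when the bytes are not valid UTF-8, so a header that really is iso-8859-1 passes through unchanged.
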

I'm not 100% sure, but I believe non-ASCII characters are outright
forbidden in a Location: header. It's possible that an RFC 2047 tag
might be used, but my reading of RFC 2616 is that that's only for text
fields, not for Location. These non-ASCII characters ought to be
percent-encoded, and anything doing otherwise is buggy.

Confirming what you're seeing with a plain socket:

>>> import socket
>>> s=socket.socket()
>>> s.connect(("www.starbucks.com",80))
>>> s.send(b'GET /store/158/AT/Karntnerstrasse/K%c3%a4rntnerstrasse-49-Vienna-9-1010 HTTP/1.1\r\nHost: www.starbucks.com\r\nAccept-Encoding: identity\r\n\r\n')
136
>>> s.recv(1024)
b'HTTP/1.1 301 Moved Permanently\r\nContent-Type: text/html; charset=UTF-8\r\nLocation: http://www.starbucks.com/store/158/at/karntnerstrasse/k\xc3\xa4rntnerstrasse-49-vienna-9-1010\r\n ........'

I'm pretty sure that server is in violation of the spec, so all bets
are off as to what any other server will do. If you know you're
dealing with this one server, you can probably hack around this, but I
don't think it belongs in core code.

Unless, of course, I'm completely wrong about the spec, or if there's
a de facto spec that lots of servers follow, in which case maybe it
would be worth doing.

ChrisA