On Tue, 11 Jul 2006, Stefan Rank wrote:
> urllib.quote fails on unicode strings and in an unhelpful way::
[...]
>    >>> urllib.quote(u'a\xf1a')
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in ?
>      File "C:\Python24\lib\urllib.py", line 1117, in quote
>        res = map(safe_map.__getitem__, s)
>    KeyError: u'\xf1'

More helpful than silently producing the wrong answer.

[...]
> I suggest to add (after 2.5 I assume) one of the following to the
> beginning of urllib.quote to either fail early and consistently on
> unicode arguments and improve the error message::
>
>    if isinstance(s, unicode):
>        raise TypeError("quote needs a byte string argument, not unicode,"
>                        " use `argument.encode('utf-8')` first.")

Won't this break existing code that catches the KeyError, for no big
benefit?  If nobody is yet sure what the Right Thing is (see below), I
think we should not change this yet.

> or to do The Right Thing (tm), which is utf-8 encoding::
>
>    if isinstance(s, unicode):
>        s = s.encode('utf-8')
>
> as suggested in
> http://www.w3.org/International/O-URL-code.html
> and rfc3986.

You seem quite confident of that.  You may be correct, but have you read
all of the following?  (not trying to claim superior knowledge by asking
that, I just dunno what the right thing is yet: I haven't yet read RFC
2617 or got my head around what the unicode issues are or how they
should apply to the Python stdlib)

http://www.ietf.org/rfc/rfc2617.txt
http://www.ietf.org/rfc/rfc2616.txt
http://en.wikipedia.org/wiki/Percent-encoding
http://mail.python.org/pipermail/python-dev/2004-September/048944.html

Also note the recent discussions here about a module named "uriparse" or
"urischemes", which fits in to this somewhere.  It would be good to make
all the following changes in a single Python release (2.6, with luck):

 - extend / modify urllib and urllib2 to handle unicode input

 - address the urllib.quote issue you raise above (+ consider the other
   utility functions in that module)

 - add the urischemes module

In summary, I agree that your suggested fix (and all of the rest I refer
to above) should wait for 2.6, unless somebody (Martin?) who understands
all these issues is quite confident your suggested change is OK.
Presumably the release managers wouldn't allow it in 2.5 anyway.

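
For concreteness, here is a rough sketch of the two quoted alternatives side by side.  It is not a proposed patch: it is written against Python 3's urllib.parse rather than the 2.x urllib under discussion, so the 2.x `unicode` type corresponds to `str` here and the 2.x byte string to `bytes`.  The wrapper names quote_strict and quote_utf8 are made up for illustration.

```python
from urllib.parse import quote_from_bytes


def quote_strict(s, safe="/"):
    """Alternative 1 (hypothetical wrapper): reject text input early."""
    if isinstance(s, str):
        raise TypeError("quote needs a byte string argument, not unicode,"
                        " use `argument.encode('utf-8')` first.")
    return quote_from_bytes(s, safe=safe)


def quote_utf8(s, safe="/"):
    """Alternative 2 (hypothetical wrapper): encode text as UTF-8 first."""
    if isinstance(s, str):
        s = s.encode("utf-8")
    return quote_from_bytes(s, safe=safe)


# UTF-8 turns the single character U+00F1 into two percent-encoded bytes.
print(quote_utf8("a\xf1a"))                      # a%C3%B1a

# The same text encoded as Latin-1 by the caller gives a different URL,
# which is why the choice of default encoding matters.
print(quote_strict("a\xf1a".encode("latin-1")))  # a%F1a
```

The second call is the case that makes me hesitate: both results are valid percent-encodings, and which one a given server expects depends on the encoding it assumes.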
John