Hi, urllib.quote fails on unicode strings that contain non-ASCII characters, and in an unhelpful way::

    Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urllib
    >>> urllib.quote('a\xf1a')
    'a%F1a'
    >>> urllib.quote(u'ana')
    'ana'
    >>> urllib.quote(u'a\xf1a')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "C:\Python24\lib\urllib.py", line 1117, in quote
        res = map(safe_map.__getitem__, s)
    KeyError: u'\xf1'

There is a (closed) tracker item, dated 2000-10-12,
http://sourceforge.net/tracker/?group_id=5470&atid=105470&aid=216716&func=detail
and there was a note added to PEP-42 by Guido.

According to a message I found on quixote-users,
http://mail.mems-exchange.org/durusmail/quixote-users/5363/
it might have worked prior to 2.4.2.
(I guess that this changed because of ascii now being the default encoding?)

BTW, a patch by rhettinger from 8 months or so ago allows
urllib.unquote to operate transparently on unicode strings::

    >>> urllib.unquote('a%F1a')
    'a\xf1a'
    >>> urllib.unquote(u'a%F1a')
    u'a\xf1a'

I suggest adding (after 2.5, I assume) one of the following to the
beginning of urllib.quote, either to fail early and consistently on
unicode arguments with a better error message::

    if isinstance(s, unicode):
        raise TypeError("quote needs a byte string argument, not unicode,"
                        " use `argument.encode('utf-8')` first.")

or to do The Right Thing (tm), which is utf-8 encoding::

    if isinstance(s, unicode):
        s = s.encode('utf-8')

as suggested in http://www.w3.org/International/O-URL-code.html
and RFC 3986.

cheers,
stefan
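
P.S. For anyone who wants the utf-8 behaviour without patching urllib itself, here is a minimal sketch of the second option as a standalone wrapper. The name `quote_utf8` is made up for illustration and is not part of urllib; the import fallback is only an assumption added so that the same snippet also runs on interpreters where `quote` lives in `urllib.parse`.

```python
# Hypothetical helper, not part of urllib: encode unicode input as UTF-8
# before percent-encoding, as suggested above.
try:
    from urllib import quote        # Python 2 layout
    text_type = unicode             # noqa: F821 (only defined on Python 2)
except ImportError:
    from urllib.parse import quote  # assumption: newer layout
    text_type = str


def quote_utf8(s, safe='/'):
    """Percent-encode s, encoding unicode text as UTF-8 first."""
    if isinstance(s, text_type):
        s = s.encode('utf-8')
    return quote(s, safe)


print(quote_utf8(u'ana'))       # ana
print(quote_utf8(u'a\xf1a'))    # a%C3%B1a
```

Note that the result differs from quoting the latin-1 byte string 'a\xf1a' ('a%F1a'), so callers that already pass encoded byte strings keep whatever encoding they chose.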