Zooko O'Whielacronx wrote:
> On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:
>> If you switch to iso8859-15 only in the presence of undecodable UTF-8,
>> then you have the same round-trip problem as the PEP: both b'\xff' and
>> b'\xc3\xbf' will be converted to u'\u00ff' without a way to
>> unambiguously recover the original file name.
>
> Why do you say that?  It seems to work as I expected here:
>
> >>> '\xff'.decode('iso-8859-15')
> u'\xff'
> >>> '\xc3\xbf'.decode('iso-8859-15')
> u'\xc3\xbf'
>
> >>> '\xff'.decode('cp1252')
> u'\xff'
> >>> '\xc3\xbf'.decode('cp1252')
> u'\xc3\xbf'
>
You're not showing that this is a fallback path.  What won't work is
first trying a local encoding (in the following example, utf-8) and
then, if that doesn't work, trying a one-byte encoding like iso8859-15:

    try:
        file1 = '\xff'.decode('utf-8')
    except UnicodeDecodeError:
        file1 = '\xff'.decode('iso-8859-15')
    print repr(file1)

    try:
        file2 = '\xc3\xbf'.decode('utf-8')
    except UnicodeDecodeError:
        file2 = '\xc3\xbf'.decode('iso-8859-15')
    print repr(file2)

That prints:

    u'\xff'
    u'\xff'

The two encodings can map different bytes to the same Unicode code
point, so you can't do this type of thing without recording what
encoding was used in the translation.

-Toshio
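
To illustrate what "recording what encoding was used" could look like, here is a minimal sketch.  It is written in Python 3 syntax (bytes literals) rather than the Python 2 used above, and the names FALLBACK, decode_with_fallback, and encode_back are hypothetical helpers invented for this example, not part of any library or of the PEP under discussion.

```python
# Hypothetical helpers for illustration only; not an existing API.
FALLBACK = 'iso-8859-15'  # assumed one-byte fallback encoding


def decode_with_fallback(raw):
    """Decode bytes, returning (text, encoding_used).

    Keeping the encoding next to the text is what makes the
    round trip back to the original bytes unambiguous.
    """
    try:
        return raw.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        return raw.decode(FALLBACK), FALLBACK


def encode_back(text, encoding):
    """Recover the original bytes from the recorded pair."""
    return text.encode(encoding)


name1 = decode_with_fallback(b'\xff')      # invalid UTF-8, uses the fallback
name2 = decode_with_fallback(b'\xc3\xbf')  # valid UTF-8

# The decoded text is identical, but the recorded encoding differs,
# so each name can still be mapped back to its own byte string.
print(name1, encode_back(*name1))
print(name2, encode_back(*name2))
```

The text alone (u'\xff' in both cases) is not enough to tell the two file names apart; only the recorded encoding distinguishes them.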
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com