I now feel it is a bad thing to encourage using unicode (str) for binary data with the latin-1 encoding or the surrogateescape error handler.
Handling binary data in the str type using latin-1 is just a hack. Surrogateescape is just a workaround for keeping undecodable bytes in text. Encouraging binary data in the str type with latin-1 or surrogateescape means encouraging people to mix binary and text data. That is worse than Python 2. So Python should encourage handling binary data in the bytes type.

On Fri, Jan 10, 2014 at 11:28 PM, Matěj Cepl <ma...@ceplovi.cz> wrote:
> On 2014-01-10, 12:19 GMT, you wrote:
> > Using the 'latin-1' to mean unknown encoding can easily result
> > in Mojibake (unreadable text) entering your application with
> > dangerous effects on your other text data.
> >
> > E.g. "Marc-André" read using 'latin-1' if the string itself
> > is encoded as UTF-8 will give you "Marc-AndrÃ©" in your
> > application. (Yes, I see that a lot in applications
> > and websites I use ;-))
>
> I am afraid that for most 'latin-1' is just another attempt to
> make Unicode complexity go away and the way how to ignore it.
>
> Matěj

--
INADA Naoki <songofaca...@gmail.com>
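To make the points in this thread concrete, here is a minimal Python 3 sketch. It only illustrates the behaviour being discussed; the variable names and the sample byte strings are made up for the example and do not come from any real application.

```python
# Bytes as they might arrive from a file or socket (sample data, assumed UTF-8).
raw = "Marc-André".encode("utf-8")

# latin-1 can decode any byte sequence, so a wrong guess never raises.
# The result is silently wrong text (Mojibake).
as_latin1 = raw.decode("latin-1")

# surrogateescape keeps undecodable bytes inside a str as lone surrogates.
bad = b"caf\xe9"  # latin-1 encoded bytes, not valid UTF-8
smuggled = bad.decode("utf-8", errors="surrogateescape")

# The original bytes come back only if the same handler is used on encode.
restored = smuggled.encode("utf-8", errors="surrogateescape")

# A strict encode of the smuggled text fails, possibly far from where
# the bad bytes first entered the program.
try:
    smuggled.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Keeping the data as bytes and decoding only where the encoding is
# actually known avoids mixing the two kinds of data.
text = raw.decode("utf-8")
```

In this sketch the latin-1 decode succeeds but yields the garbled name quoted above, while the surrogateescape string looks like ordinary text until something tries to encode it strictly.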
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com