Re: [Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?

Stefan Behnel Tue, 28 Jun 2011 23:30:55 -0700

Victor Stinner, 28.06.2011 15:43:

In Python 2, open() opens the file in binary mode (e.g. file.readline()
returns a byte string). codecs.open() also opens the file in binary mode
by default; you have to specify an encoding name to open it in text mode.


In Python 3, open() opens the file in text mode by default. (It only
uses binary mode if the file mode contains "b".) The problem is
that open() uses the locale encoding if the encoding is not specified,
which is the case *by default*. The locale encoding can be:

  - UTF-8 on Mac OS X, most Linux distributions
  - ISO-8859-1 on some FreeBSD systems
  - ANSI code page on Windows, e.g. cp1252 (close to ISO-8859-1) in
Western Europe, cp932 in Japan, ...
  - ASCII if the locale is manually set to an empty string or to "C", or
if the environment is empty, or by default on some systems
  - something different depending on the system and user configuration...
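
To check which of these cases applies on a given machine, the locale
encoding can be queried directly. This is a minimal sketch, assuming
Python 3 and the standard locale and codecs modules; the helper name
default_text_encoding is made up for illustration.

```python
import codecs
import locale


def default_text_encoding():
    """Return the normalized name of the locale's preferred text encoding."""
    # getpreferredencoding(False) reports the locale encoding without
    # modifying the process locale.
    name = locale.getpreferredencoding(False)
    # codecs.lookup() maps aliases such as "UTF-8" or "utf8" to one
    # canonical codec name.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print(default_text_encoding())
```

Running this on two differently configured systems is a quick way to see
whether the same script would read and write files differently on each.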

If you develop under Mac OS X or Linux, you may be surprised when you
run your program on Windows and it hits the first non-ASCII character. You
may not detect the problem if you only write text in English... until
someone writes the first letter with a diacritic.
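
The failure mode described above can be reproduced on a single machine by
passing the encodings explicitly. This is only a sketch: the helper
write_then_read and the sample strings are invented for illustration, with
"utf-8" standing in for the development system and "cp1252" or "ascii" for
the system where the program is later run.

```python
import os
import tempfile


def write_then_read(text, write_encoding, read_encoding):
    """Write text with one encoding, then read it back with another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding=write_encoding) as f:
            f.write(text)
        with open(path, "r", encoding=read_encoding) as f:
            return f.read()


# Pure ASCII text survives the mismatch, so the bug stays hidden.
ascii_result = write_then_read("cafe", "utf-8", "cp1252")

# The first diacritic is decoded without error but comes back garbled.
garbled = write_then_read("café", "utf-8", "cp1252")

# Under ASCII the same file cannot be decoded at all.
try:
    write_then_read("café", "utf-8", "ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Note that the cp1252 case raises no exception, so the corruption can go
unnoticed until someone looks at the data.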

I agree that this is a *very* common source of problems. People write code
that doesn't care about encodings all over the place, and are then
surprised when it stops working at some point, either by switching
environments or by changing the data. I've seen this in virtually all
projects I've ever come to work in [1]. So, eventually, all of that code was
either thrown away or got debugged and fixed to use an explicit (and
usually configurable) encoding.
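
As an illustration of what "explicit (and usually configurable)" can look
like in Python 3, here is a minimal sketch. The environment variable
MYAPP_ENCODING and the helper names are hypothetical, chosen only for this
example; the one real API relied on is the encoding argument of open().

```python
import os

# Platform-independent fallback; MYAPP_ENCODING is a made-up variable name.
DEFAULT_ENCODING = "utf-8"


def configured_encoding(environ=None):
    """Return the encoding from the (hypothetical) MYAPP_ENCODING setting."""
    if environ is None:
        environ = os.environ
    return environ.get("MYAPP_ENCODING", DEFAULT_ENCODING)


def read_text(path, encoding=None):
    """Read a text file, always passing an explicit encoding to open()."""
    with open(path, "r", encoding=encoding or configured_encoding()) as f:
        return f.read()


def write_text(path, text, encoding=None):
    """Write a text file, always passing an explicit encoding to open()."""
    with open(path, "w", encoding=encoding or configured_encoding()) as f:
        f.write(text)
```

Because the fallback is a fixed name rather than the locale encoding, the
helpers behave the same on every platform unless the setting is changed
on purpose.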

Consequently, I don't think it's a bad idea to break out of this ever
recurring development cycle by either requiring an explicit encoding right
from the start, or by making the default encoding platform independent. The
opportunity to fix this was very unfortunately missed in Python 3.0.

Personally, I don't buy the argument that it's harder to write quick
scripts if an explicit encoding is required. Most code that gets written is
not just quick scripts, and even those tend to live longer than initially
intended.


Stefan

[1] Admittedly, most of those projects were in Java, where the situation is
substantially worse than in Python. Java entirely lacks a way to define a
per-module source encoding, and it even lacks a straightforward way to
encode/decode a file with an explicit encoding. So, by default, *both*
input encodings are platform dependent, whereas in Python it's only the
default file encoding, and properly decoding a file is straightforward there.


