New submission from Ryan McGuire <python....@enigmacurry.com>:

Opening a UTF-8 encoded file with unix newlines ("\n") on Win32:

codecs.open("whatever.txt","r","utf-8").read()

replaces the newlines ("\n") with CR+LF ("\r\n").

The docs specifically say that :

"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of '\n' is done on
reading and writing."

And yet, opening the file with an explicit binary mode resolves the
situation:

codecs.open("whatever.txt","rb","utf-8").read()

This reads the file with the original newlines unmodified.

The implementation of codecs.open and the documentation are out of sync.

----------
assignee: georg.brandl
components: Documentation, Library (Lib)
messages: 91995
nosy: EnigmaCurry, georg.brandl
severity: normal
status: open
title: codecs.open on Win32 does not force binary mode
type: behavior
versions: Python 2.6, Python 3.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6788>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to