Bugs item #1377394, was opened at 2005-12-09 22:43
Message generated for change (Comment added) made by birkenfeld
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1377394&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: superwesman (superwesman)
Assigned to: M.-A. Lemburg (lemburg)
Summary: read() / readline() blow up if file has even number of char.

Initial Comment:
Hello, I am having a problem with the read() and readline() functions.
I'm using codecs.open() to open a text file, then using either read()
or readline() to get its contents. In Python 2.4.2, if the file has an
even number of characters, I get a UnicodeDecodeError. In Python 2.4.1
this works regardless of the character count. I've pasted below a sample
script and the sample text file I was running. This is the command I
executed at the Windows 2000 CMD prompt:

    python sample.py sample.txt

Again, in 2.4.1, this works fine - in 2.4.2 it breaks when the
file-to-be-read has an odd number of characters.

Thanks.
-w

# start: sample.py
import codecs
import sys

print "open the file"
in_file = codecs.open( sys.argv[1], "r", "unicode_internal" )
print "read the file"
the_file = in_file.read()
print "close the file"
in_file.close()
print "done"
# end: sample.py

# start: sample.txt
RESULTHOST=vivaldi
RESULTPORT=a
DB_XML=/test/art/jfw/config/DBList.xml
LOGCHECK_IGNORE=art_actions.txt
# end: sample.txt

----------------------------------------------------------------------

>Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-12-10 11:57

Message:
Logged In: YES
user_id=1188172

I'd suggest unicode_internal to be removed from the docs.

----------------------------------------------------------------------

Comment By: superwesman (superwesman)
Date: 2005-12-10 00:17

Message:
Logged In: YES
user_id=1401447

I didn't realize that 'unicode_internal' was not a legitimate value to
pass into this function.
If 'unicode_internal' is not a valid 3rd parameter to codecs.open(),
shouldn't that function complain? If it is a valid option (that should
only be used "Python internally" - not sure what that means) then it
should perform consistently regardless of the number of characters in
the file, should it not?

Seems to me that pilot-error uncovered a bug. If this is not a valid
choice, then codecs.open() should complain. If it is valid, it should
perform consistently, IMHO.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-12-09 23:04

Message:
Logged In: YES
user_id=38388

Why would you want to read a file using the Python internal Unicode
encoding (unicode_internal) ? This is an encoding that is only used
Python internally and should not be used for anything else.

----------------------------------------------------------------------
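For anyone reading this thread later, here is a rough sketch of the two behaviours being discussed. It is not the reporter's script and not a statement about how 2.4.2 is implemented. It is written for Python 3, and it uses "utf-16-le" as a stand-in for unicode_internal on the assumption that both treat the input as fixed-width code units (newer Python versions no longer ship unicode_internal). The helper names try_decode and is_known_codec are made up for illustration.

```python
import codecs


def try_decode(data, encoding="utf-16-le"):
    """Decode bytes; return (text, None) on success or (None, reason) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc.reason


def is_known_codec(name):
    """Return True if the codec registry can resolve the given encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# 4 bytes form two complete 2-byte code units, so decoding succeeds.
even_text, even_err = try_decode(b"abcd")

# 3 bytes leave the last code unit cut off, so decoding fails.
odd_text, odd_err = try_decode(b"abc")

print("even:", repr(even_text), even_err)
print("odd: ", repr(odd_text), odd_err)

# A name the registry does not know is rejected up front with LookupError.
print("utf-8 known:", is_known_codec("utf-8"))
print("bogus known:", is_known_codec("no_such_codec_xyz"))
```

If that assumption holds, the dependence on file length comes from the codec rejecting a trailing partial code unit, which is separate from the question of whether codecs.open() accepts the encoding name at all: a registered name passes lookup even if it was never meant for file I/O.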
