[EMAIL PROTECTED] wrote:
Hi everyone,

I am using Python 2.4 to convert an Excel spreadsheet to a pipe-delimited text file, and some of the cells contain non-ASCII characters. I solved the problem in a very unintuitive way and I wanted to ask why. If I do

    csvfile.write(cell.encode("utf-8"))

I get a UnicodeDecodeError. However, this works:

    c = unicode(cell.encode("utf-8"), "utf-8")
    csvfile.write(c)

Why should I have to encode the cell to utf-8 and then turn it back into unicode in order to write it to a text file? Is there a more intuitive way to get around these bothersome unicode errors?
The short answer is that you're writing to a file you've opened with the codecs module. Any write to this file expects unicode data and will automatically encode it to the encoding you specified. You're trying to send it utf-8-encoded data -- i.e. a string of bytes, *not* unicode -- so it presumably tries to decode that to a unicode object before encoding it as utf-8 like you asked it to. Without looking at the implementation, it probably just calls unicode(x) on what you've passed in, which will use the default ascii codec and fail in the way you saw. (Honestly, that was the short answer.)

To solve it, assuming cell is already unicode, just pass it unadulterated to csvfile.write. The reason the other thing works is that you're in control of the -- unnecessary -- unicode conversion, and you're telling Python what encoding to use for both the decoding and the encoding.

TJG

--
http://mail.python.org/mailman/listinfo/python-list
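To make the above concrete, here is a minimal sketch of both paths. It assumes (as the reply does) that csvfile was opened with codecs.open; the file path and cell value are illustrative, not from the original post:

```python
import codecs
import os
import tempfile

# Hypothetical reconstruction: the original csvfile is assumed to have
# been opened with codecs.open, whose write() expects unicode text.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
cell = u"caf\u00e9"  # a cell value containing a non-ASCII character

# Wrong: handing the writer already-encoded bytes makes it try to
# encode them a second time (UnicodeDecodeError on Python 2, from the
# implicit ascii decode; TypeError on Python 3).
f = codecs.open(path, "w", encoding="utf-8")
try:
    f.write(cell.encode("utf-8"))
    double_encode_failed = False
except (UnicodeDecodeError, TypeError):
    double_encode_failed = True
f.close()

# Right: pass the unicode object straight through and let the codecs
# writer do the utf-8 encoding itself.
f = codecs.open(path, "w", encoding="utf-8")
f.write(cell + u"|1|2\n")
f.close()
```

The point of the design is that a codecs stream owns the bytes-vs-text boundary: you hand it text, it handles the encoding, and encoding by hand just duplicates (and here sabotages) that step.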
