>>> You're right, I realised after playing with Tim's example that the
>>> problem was that I wasn't calling close() on the codecs file. Adding
>>> this after the f.write(html_text) seems to flush the buffer, which
>>> means that the content now gets written to the file.
>>
>> Quick note: it may be important to write and read from the file using
>> binary mode "b". It's not so significant under Unix, but it is more
>> significant under Windows, because otherwise we may get some weird
>> results.
>
> But the file is utf-8 text, ISTM it should be written as text, not
> binary. Why do you recommend binary mode?
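A minimal sketch of the fix described in the quoted exchange, assuming Python 3, where the built-in open() with an encoding argument does the job that codecs.open() did here. The file path and the sample text are hypothetical stand-ins for the poster's html_text; the point is only that the file has to be closed (which flushes its buffer) before the written text is guaranteed to be on disk, and that text mode is fine for utf-8.

```python
import os
import tempfile

# Hypothetical sample content standing in for the poster's html_text.
html_text = "<p>caf\u00e9 na\u00efve \u00fcber</p>\n"

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "page.html")

# Explicit version of the fix from the thread: write, then close().
# close() flushes any buffered text to the file before releasing it.
f = open(path, "w", encoding="utf-8")
f.write(html_text)
f.close()

# Idiomatic version: the with-block closes (and therefore flushes) the
# file automatically, even if write() raises an exception.
with open(path, "w", encoding="utf-8") as f:
    f.write(html_text)

# Read it back in text mode with the same encoding.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == html_text)
```

Text mode translates "\n" to the platform's line ending on write and back again on read, so the round trip matches on Windows as well as Unix.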
Hi Kent,

Oh! I just wrote that out because I had a vague and fuzzy feeling that utf-8, whose non-ASCII characters are made of bytes with the high-order bit set, needed to be written carefully. But let me examine that unexamined assumption...

No, you're right, we don't have to be so careful here: carriage returns and newlines keep their standard interpretation under utf-8 too. Ok, good to know. Thank you!

I'd seen too many problems with Windows and binary data, so I use 'rb' out of habit whenever I deal with bytes that have the high bit set. For example, chr(26) (Ctrl-Z) causes Windows to stop reading a file in text mode prematurely:

    http://mail.python.org/pipermail/python-list/2003-March/154659.html

On a closer reading of the utf-8 encoding standard, though, I see that it does say utf-8 never uses ASCII bytes, control characters included, to encode higher Unicode code points: every byte of a multi-byte sequence has its high bit set, so a byte like 26, 10, or 13 can only ever stand for that ASCII character itself. So my caution is unfounded.

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
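A small check of the utf-8 property mentioned in the reply, assuming Python 3 (where iterating over bytes yields integers). It encodes every non-ASCII code point and confirms each resulting byte is at least 0x80, so the ASCII control bytes that Windows text mode treats specially (26, 10, 13) can only appear as those characters themselves. The helper name utf8_bytes_all_high is hypothetical, introduced only for this sketch.

```python
def utf8_bytes_all_high(code_point):
    """Return True if every byte of the code point's UTF-8 encoding is >= 0x80."""
    encoded = chr(code_point).encode("utf-8")
    return all(b >= 0x80 for b in encoded)

# Every non-ASCII code point, skipping the surrogate range U+D800..U+DFFF,
# which is not encodable as UTF-8 (encode() would raise UnicodeEncodeError).
non_ascii = (cp for cp in range(0x80, 0x110000) if not 0xD800 <= cp <= 0xDFFF)
all_high = all(utf8_bytes_all_high(cp) for cp in non_ascii)
print(all_high)

# The ASCII control characters each encode to their own single byte.
for ch in ("\x1a", "\n", "\r"):
    print(repr(ch), ch.encode("utf-8"))
```

The full scan covers about 1.1 million code points, so it takes a moment; the result is the same property the standard states directly.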