I need to access the text in hundreds of Microsoft Word .doc files on Ubuntu. I looked at win32, but it only seems to support Windows. I'm going through all of these files to create a fairly simple delimited text file for a spreadsheet.
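Once the text can be read, the spreadsheet file itself can be written with the standard `csv` module. This is a minimal sketch, assuming a tab delimiter and that each row is a sequence of strings; `write_tab_delimited` is a hypothetical helper name, not something from the original post:

```python
import csv


def write_tab_delimited(rows, out_path):
    """Write row sequences to a tab-delimited text file a spreadsheet can open.

    With the default QUOTE_MINIMAL setting, csv quotes any field that itself
    contains a tab, a quote character, or a newline, so columns stay aligned.
    """
    # newline="" is what the csv docs recommend for files handed to csv.writer
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerows(rows)
```

For example, `write_tab_delimited([("report1.doc", "first line of text")], "summary.txt")` produces one row with two columns; the file and row contents here are illustrative only.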
The options I can think of:

A) Batch convert to text files so I can access them
B) Import some module that lets me decode this format
C) OpenOffice allows batch conversion to .odt, but I still don't know how to access those files
D) Buy a 24-pack, some Twinkies, and go watch David Hasselhoff reruns

Opening .txt documents works fine. Currently I get:

    inFile = open("myTestFile.doc", "r")
    testRead = inFile.read()

    Traceback (most recent call last):
      File "<pyshell#11>", line 1, in <module>
        test = inFile.read()
      File "/usr/lib/python3.0/io.py", line 1728, in read
        decoder.decode(self.buffer.read(), final=True))
      File "/usr/lib/python3.0/io.py", line 1299, in decode
        output = self.decoder.decode(input, final=final)
      File "/usr/lib/python3.0/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

Any help is greatly appreciated. Thanks bunches.
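The traceback happens because `open(..., "r")` decodes the file as text, but a Word 97-2003 .doc is a binary OLE2 (Compound File Binary) container. Its first bytes, `D0 CF`, are not valid UTF-8, which matches the failure at position 0-1. A minimal sketch: open in `"rb"` to read raw bytes and check the OLE2 signature. The helper name `looks_like_word97_doc` is hypothetical, and the signature alone only confirms the container (.xls and .ppt files share it), not that the file is a Word document:

```python
# Magic bytes at the start of every OLE2 / Compound File Binary file,
# the container format used by Word 97-2003 .doc files.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_word97_doc(path):
    """Return True if the file begins with the OLE2 compound-file signature."""
    # "rb" returns raw bytes, so no text decoding (and no UnicodeDecodeError)
    with open(path, "rb") as f:
        header = f.read(len(OLE2_SIGNATURE))
    return header == OLE2_SIGNATURE


def utf8_fails_on_signature():
    """Show that the signature bytes cannot be decoded as UTF-8 text."""
    try:
        OLE2_SIGNATURE.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

Reading in binary mode avoids the error but still leaves the Word binary format to parse, so converting the files (options A or C) is usually the practical route.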
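Option A can be scripted by driving an office suite in headless mode. This sketch assumes LibreOffice (the successor to OpenOffice.org) is installed with `soffice` on the PATH and a modern Python 3 (`shutil.which` and `subprocess.run` do not exist in Python 3.0); the `txt:Text` export filter and the helper names `soffice_command` and `convert_folder` are assumptions to verify against your installation:

```python
import glob
import os
import shutil
import subprocess


def soffice_command(doc_paths, out_dir):
    """Build a LibreOffice command line converting .doc files to plain text."""
    return ["soffice", "--headless", "--convert-to", "txt:Text",
            "--outdir", out_dir] + list(doc_paths)


def convert_folder(in_dir, out_dir):
    """Convert every .doc in in_dir to a .txt in out_dir; return the inputs."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found; install LibreOffice first")
    docs = sorted(glob.glob(os.path.join(in_dir, "*.doc")))
    if docs:
        # check=True raises CalledProcessError if soffice exits non-zero
        subprocess.run(soffice_command(docs, out_dir), check=True)
    return docs
```

Passing all files in one invocation avoids starting the office suite hundreds of times; the resulting .txt files can then be opened normally.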
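For option C, an OpenDocument text file (.odt) is a ZIP archive whose body lives in `content.xml`, so the standard library can read it. A minimal sketch, assuming the files were converted to .odt; `odt_paragraphs` is a hypothetical name, and as a simplification it ignores spacing elements such as `text:s` and `text:tab`, so runs of spaces or tabs may be collapsed:

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of paragraph (text:p) and heading (text:h) elements in ODF
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
WANTED_TAGS = ("{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS)


def odt_paragraphs(path):
    """Return the text of each paragraph and heading in an .odt file."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("content.xml"))
    paragraphs = []
    for elem in root.iter():  # document order
        if elem.tag in WANTED_TAGS:
            # itertext() joins text from nested spans, links, etc.
            paragraphs.append("".join(elem.itertext()))
    return paragraphs
```

Each returned string could become one cell or row in the delimited output file.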
Tutor maillist - Tutor@python.org