I need to access the text in hundreds of Microsoft .doc files on
an Ubuntu system. I looked at win32, but only saw support for Windows. I am
going through all of these files to create a fairly simple delimited
text file for a spreadsheet.

Options I can think of:

A) Batch convert to text files so I can access them
B) Import some module that allows me to decode this format
C) OpenOffice allows batch conversion to .odt, but I still don't know how
to access those
D) Buy a 24 pack, some Twinkies, and go watch David Hasselhoff reruns
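
A rough sketch of what option A might look like. It assumes the antiword
command-line tool is installed and on the PATH (it prints the text of a .doc
file to stdout), and a Python 3 recent enough to have subprocess.run. The
function names, the folder names and the `run` parameter are made up for
illustration, and the output encoding depends on how antiword is configured,
so undecodable bytes are replaced rather than raising an error.

```python
import subprocess
from pathlib import Path


def doc_to_text(doc_path, run=subprocess.run):
    """Return the text of one .doc file by calling the antiword CLI."""
    # Assumption: antiword is on PATH and writes the extracted text to stdout.
    result = run(["antiword", str(doc_path)], stdout=subprocess.PIPE, check=True)
    # Assumption: output is UTF-8 compatible; bad bytes are replaced, not fatal.
    return result.stdout.decode("utf-8", errors="replace")


def convert_folder(src_dir, out_dir, run=subprocess.run):
    """Write one .txt per .doc found in src_dir; return the files written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc_path in sorted(Path(src_dir).glob("*.doc")):
        txt_path = out_dir / (doc_path.stem + ".txt")
        txt_path.write_text(doc_to_text(doc_path, run), encoding="utf-8")
        written.append(txt_path)
    return written
```

Something like convert_folder("docs", "docs_txt") would then leave plain .txt
files that open() can read in the usual way.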

Opening .txt documents works fine.

With a .doc file, I currently get:

inFile = open("myTestFile.doc", "r")
test = inFile.read()

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    test = inFile.read()
  File "/usr/lib/python3.0/io.py", line 1728, in read
    decoder.decode(self.buffer.read(), final=True))
  File "/usr/lib/python3.0/io.py", line 1299, in decode
    output = self.decoder.decode(input, final=final)
  File "/usr/lib/python3.0/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data
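
A likely explanation for the traceback, sketched below: a legacy .doc file is
a binary OLE2 compound file, not text, and such files start with the bytes
D0 CF 11 E0 A1 B1 1A E1. Those first two bytes are not valid UTF-8, which
would match the complaint about "position 0-1". The helper name
looks_like_legacy_doc is hypothetical; it only checks the signature and does
not extract any text.

```python
# Signature found at the start of OLE2 compound files (legacy .doc, .xls, ...).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_legacy_doc(path):
    """Return True if the file starts with the OLE2 signature."""
    # Open in binary mode ("rb") so Python does not try to decode the bytes.
    with open(path, "rb") as f:
        return f.read(len(OLE2_MAGIC)) == OLE2_MAGIC


def first_bytes_decode_as_utf8(data):
    """Return True if the given bytes can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Opening the file with "rb" avoids the decode error, but what comes back is
still the raw binary container, so a converter is needed to get at the text.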

Any help is greatly appreciated. Thanks bunches.




_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
