Thanks, guys. Because I wanted to do a little preprocessing before importing
into tm (the files contain all sorts of material that I don't need), I used a
system() call to invoke AbiWord and do the batch conversions.

Mark
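
For anyone finding this later, a minimal sketch of what such a batch conversion might look like. This is not the exact code used; it assumes an `abiword` binary on the PATH that accepts `--to=txt` and writes a .txt file next to each input. The directory name `word_files` and the helper `abiword_cmd` are hypothetical.

```r
# Hypothetical directory holding the .doc files; adjust to your own layout.
doc_dir <- "word_files"

# Build the AbiWord command line for one file.
# Assumption: `abiword --to=txt <file>` converts <file> to plain text.
abiword_cmd <- function(path) {
  paste("abiword", "--to=txt", shQuote(path))
}

# Collect every .doc file in the directory (empty if the directory is missing).
doc_files <- list.files(doc_dir, pattern = "\\.doc$", full.names = TRUE,
                        ignore.case = TRUE)

# Convert the files one at a time; system() returns the exit status.
for (f in doc_files) {
  status <- system(abiword_cmd(f))
  if (status != 0) warning("conversion failed for ", f)
}
```

The resulting text files can then be cleaned up with ordinary string tools before being handed to tm.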
------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail

"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************


On Tue, Aug 18, 2009 at 10:56 AM, Ingo Feinerer <feine...@logic.at> wrote:

> On Tue, Aug 18, 2009 at 12:00:07PM +0200, Mark Kimpel wrote:
> > I am familiar with packages that read and write Excel files on both Windows
> > and Linux platforms.
> >
> > Do any packages provide similar functionality for MS Word files? I have a
> > lot of text processing to do and the text is embedded in ~200 different Word
> > files (.doc format Office 2003). All I need to do is read, not write.
>
> See readDOC in package tm. E.g., something like
>
> Corpus(DirSource("aDirectoryContainingTheWordFiles"),
>        readerControl = list(reader = readDOC))
>
> Note that you need antiword (http://www.winfield.demon.nl/) in your
> path such that readDOC can use it.
>
> Best regards, Ingo
>
> --
> Ingo Feinerer
> Vienna University of Technology
> http://www.dbai.tuwien.ac.at/staff/feinerer
>

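On the antiword requirement mentioned in the quoted reply: a small sketch of how one might check from within R that the program is visible before building the corpus. The helper name `has_program` is hypothetical; `Sys.which()` is base R.

```r
# Sys.which() returns "" when the program cannot be found on the PATH.
has_program <- function(name) {
  nzchar(unname(Sys.which(name)))
}

if (!has_program("antiword")) {
  message("antiword not found on PATH; readDOC will not be able to call it")
}
```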

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
