Thanks, guys. Because I wanted to do a little preprocessing before importing into tm (the files contain all sorts of material I don't need), I used a system() call to invoke AbiWord and do the batch conversions.

Mark
------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN 46074

(317) 490-5129 Work, & Mobile & VoiceMail

"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************

On Tue, Aug 18, 2009 at 10:56 AM, Ingo Feinerer <feine...@logic.at> wrote:
> On Tue, Aug 18, 2009 at 12:00:07PM +0200, Mark Kimpel wrote:
> > I am familiar with packages that read and write Excel files on both Windows
> > and Linux platforms.
> >
> > Do any packages provide similar functionality for MS Word files? I have a
> > lot of text processing to do and the text is embedded in ~200 different Word
> > files (.doc format Office 2003). All I need to do is read, not write.
>
> See readDOC in package tm. E.g., something like
>
> Corpus(DirSource("aDirectoryContainingTheWordFiles"), readerControl = list(reader = readDOC))
>
> Note that you need antiword (http://www.winfield.demon.nl/) in your
> path such that readDOC can use it.
>
> Best regards, Ingo
>
> --
> Ingo Feinerer
> Vienna University of Technology
> http://www.dbai.tuwien.ac.at/staff/feinerer

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
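For readers who want to try Mark's approach of calling AbiWord from R before handing the text to tm, here is a minimal sketch. It is not the code from the thread: it assumes the `abiword` binary is on the PATH and that `abiword --to=txt <file>` writes `<file>.txt` next to each input (check this against your AbiWord version). The function name `convert_docs_with_abiword` and its `run` argument are hypothetical.

```r
# Minimal sketch (hypothetical helper, not from the original post):
# batch-convert Word (.doc) files to plain text by shelling out to AbiWord.
# Assumes `abiword` is on the PATH and writes <name>.txt beside each input.
convert_docs_with_abiword <- function(dir, run = TRUE) {
  # Collect the .doc files (any letter case) in `dir`
  docs <- list.files(dir, pattern = "\\.doc$", full.names = TRUE,
                     ignore.case = TRUE)
  # run = FALSE is a dry run: nothing is converted, only paths are returned
  if (run) {
    if (!nzchar(Sys.which("abiword"))) {
      stop("abiword was not found on the PATH")
    }
    for (f in docs) {
      # Same as typing `abiword --to=txt <file>` in a shell;
      # shQuote() protects paths that contain spaces
      status <- system2("abiword", args = c("--to=txt", shQuote(f)))
      if (status != 0) {
        warning("AbiWord exited with status ", status, " on ", f)
      }
    }
  }
  # Paths where the plain-text output is expected to appear
  sub("\\.doc$", ".txt", docs, ignore.case = TRUE)
}
```

For example, `txt <- convert_docs_with_abiword("aDirectoryContainingTheWordFiles")` followed by `lapply(txt, readLines, warn = FALSE)` gives the raw text for custom cleaning; the cleaned strings could then be passed to tm with `Corpus(VectorSource(...))`. Doing the conversion outside tm is what makes the preprocessing step possible, at the cost of managing the intermediate .txt files yourself.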