If you can use an R <-> Java interface, you could use iText to do this, as long as the PDF is fairly sane.
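For what it's worth, here is a rough, untested sketch of that route. It assumes the rJava package as the R <-> Java bridge and an iText 5.x jar on disk; the function name read_pdf_text and both file paths are placeholders of mine, and the class names follow the com.itextpdf package layout of the 5.x releases, so other iText versions may need different names or calls.

```r
# Sketch only: assumes rJava is installed and an iText 5.x jar is available.
# read_pdf_text() is a hypothetical helper, not part of any package.
read_pdf_text <- function(pdf_path, itext_jar) {
  # Start the JVM if needed and add the iText jar to the classpath.
  rJava::.jinit()
  rJava::.jaddClassPath(itext_jar)

  # Open the PDF; make sure the reader is closed when the function exits.
  reader <- rJava::.jnew("com/itextpdf/text/pdf/PdfReader", pdf_path)
  on.exit(rJava::.jcall(reader, "V", "close"))

  n_pages <- rJava::.jcall(reader, "I", "getNumberOfPages")

  # Extract the text of each page; returns one string per page.
  vapply(seq_len(n_pages), function(i) {
    rJava::.jcall("com/itextpdf/text/pdf/parser/PdfTextExtractor", "S",
                  "getTextFromPage", reader, i)
  }, character(1))
}

# Hypothetical usage, giving something close to what readLines() returns:
# pages <- read_pdf_text("faculty.pdf", "itextpdf.jar")
# lines <- unlist(strsplit(pages, "\n"))
```

How well this works depends on how the PDF was produced; column layouts in particular can come out in an unexpected order, so compare the result against the Acrobat "save as text" output before relying on it.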
See http://itextpdf.com/
It is what pdftk uses.

b/w

Mark

2010/1/9 David Kane <d...@kanecap.com>:
> I have a pdf file that I would like to parse into R:
>
> http://www.williams.edu/Registrar/geninfo/faculty.pdf
>
> For now, I open the file in Acrobat by hand, then save it "as text"
> and then use readLines(). That works fine but a) I am concerned that
> some information may be lost and b) I may be doing this a lot, so I
> would rather have R grab the information from the pdf file directly.
>
> So: is there something like readPDF() for R?
>
> Thanks,
>
> Dave Kane
>
> PS. If you're curious, here is the sort of work that I want to do with
> this data:
> http://www.ephblog.com/2010/01/08/class-update-and-faculty-ages/
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK