I think a lot would depend on exactly how the data is formatted. I have used 'pdf2text' converters (many are freely available on the web) to convert the PDFs to text, and then used R to read in and preprocess the data to get it into a format I can work with.
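For illustration, here is a minimal sketch of that workflow, not a definitive implementation. It assumes the 'pdftotext' command-line tool (from Poppler or Xpdf) is installed and on the PATH. The function names pdf_to_lines and parse_table_lines, the column names, and the rule that table rows start with a digit and separate columns with two or more spaces are all hypothetical and would need adapting to how your PDFs were created.

```r
# Convert one PDF to text with an external converter, then read the result.
# ASSUMPTION: 'pdftotext' (Poppler/Xpdf) is installed and on the PATH.
pdf_to_lines <- function(pdf, converter = "pdftotext") {
  txt <- tempfile(fileext = ".txt")
  on.exit(unlink(txt))
  # -layout asks pdftotext to keep the physical column layout of the page
  status <- system2(converter, c("-layout", shQuote(pdf), shQuote(txt)))
  if (status != 0) stop("conversion failed for ", pdf)
  readLines(txt, warn = FALSE)
}

# Hypothetical custom parser: keep lines that start with a digit and
# split them into fields wherever two or more spaces occur.
parse_table_lines <- function(lines, col_names) {
  lines  <- trimws(lines)
  rows   <- lines[grepl("^[0-9]", lines)]
  fields <- strsplit(rows, " {2,}")
  # Drop rows that do not have the expected number of columns
  fields <- fields[lengths(fields) == length(col_names)]
  if (length(fields) == 0) {
    empty <- as.data.frame(matrix(character(0), ncol = length(col_names)),
                           stringsAsFactors = FALSE)
    names(empty) <- col_names
    return(empty)
  }
  df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(df) <- col_names
  # Convert columns that look numeric; leave the rest as character
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

# Batch use over a directory of PDFs (directory and column names are placeholders):
# pdfs   <- list.files("pdf_dir", pattern = "\\.pdf$", full.names = TRUE)
# tables <- lapply(pdfs, function(p)
#   parse_table_lines(pdf_to_lines(p), c("id", "site", "depth")))
```

The parsing step is kept separate from the conversion step so that it can be tried on a few lines of pasted text before running it over a whole batch.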
You can invoke these converters with the 'system' function and then read the output file that is generated. I would expect that you will need some custom code to interpret the data in the text file, depending on how it was created. So I am sure you can do it within R, with some auxiliary programs that are called with 'system', without much trouble.

On Fri, Feb 3, 2012 at 4:11 PM, Bryan McCloskey <bmcclos...@usgs.gov> wrote:
> All,
>
> Is anyone familiar with a way to use R to read table data from a large
> collection of PDF files? I'm aware there are various command-line and
> desktop utilities that might be able to (e.g.) dump PDFs to text, which
> could then be parsed for table data. But I'm hoping there is something more
> integrated that could be incorporated into R functions and scripts to handle
> large batches of PDFs in a more automated fashion.
>
> Has anyone used R to extract large amounts of tabular data from PDF documents?
>
> -bryan
>
> ------
> Bryan McCloskey, Ph.D.
> IT Specialist (Data Management/Internet)
> U.S. Geological Survey
> St. Petersburg Coastal & Marine Science Center
> 600 Fourth St. South
> St. Petersburg, FL 33701
>
> South Florida Information Access: http://sofia.usgs.gov
> Everglades Depth Estimation Network: http://sofia.usgs.gov/eden
> Phone: 727.803.8747x3017 * Fax: 727.803.2032
> ------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.