I think a lot would depend on exactly how the data is formatted.  I
have used 'pdf2text' converters (many are freely available on the web)
to convert the PDFs to text, and then used R to read in and preprocess
the data to get it into a format I can work with.

You can invoke these converters with the 'system' function and then
read the output file that is generated.  You would probably need
some custom code to then interpret the data in the text file,
depending on how it was created.
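
As a rough sketch of those two steps (not a tested recipe for any
particular set of PDFs): the code below assumes the 'pdftotext'
command-line tool (shipped with Poppler/Xpdf) is installed and on the
PATH, and that the table columns are separated by runs of two or more
spaces in the text output.  The function names 'pdf_to_lines' and
'parse_table_lines' and the 'n_cols' argument are made up for this
illustration.

```r
# Convert one PDF to text with an external converter, then read the text.
# Assumption: 'pdftotext' is installed; '-layout' asks it to keep the
# physical column layout, which usually helps with tables.
pdf_to_lines <- function(pdf_file,
                         txt_file = sub("\\.pdf$", ".txt", pdf_file,
                                        ignore.case = TRUE)) {
  status <- system2("pdftotext",
                    args = c("-layout", shQuote(pdf_file), shQuote(txt_file)))
  if (status != 0) stop("pdftotext failed for ", pdf_file)
  readLines(txt_file, warn = FALSE)
}

# Hypothetical parser: keep only lines that split into exactly 'n_cols'
# fields on runs of 2+ spaces, and treat the first such line as the header.
# Real files will likely need rules specific to how the PDF was produced.
parse_table_lines <- function(lines, n_cols) {
  fields <- strsplit(trimws(lines), " {2,}")
  fields <- fields[lengths(fields) == n_cols]
  if (length(fields) < 2) return(data.frame())
  df <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
  names(df) <- fields[[1]]
  # Convert columns that look numeric; leave the rest as character.
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

# Example on text as it might come out of a converter (invented data):
sample_lines <- c("Report page 1", "",
                  "Site   Depth   Temp",
                  "A1     10.5    22",
                  "B2     3.25    19", "")
tab <- parse_table_lines(sample_lines, n_cols = 3)
```

With a real file the two pieces would be chained, e.g.
parse_table_lines(pdf_to_lines("some_file.pdf"), n_cols = 3), where the
file name and column count are placeholders.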

So I am sure you can do it within R, with some auxiliary functions
that are called with 'system', without much trouble.
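
For a large batch, one such auxiliary function might look like the
sketch below.  It is only an illustration: 'convert_pdf_dir' is a
made-up name, the default converter is assumed to be 'pdftotext' and
to accept "input output" as its arguments, and other converters will
need a different command line.

```r
# Run an external PDF-to-text converter over every PDF in a directory.
# Assumption: the converter is called as '<converter> in.pdf out.txt'
# and returns exit status 0 on success.
convert_pdf_dir <- function(pdf_dir, converter = "pdftotext") {
  pdfs <- list.files(pdf_dir, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  txts <- sub("\\.pdf$", ".txt", pdfs, ignore.case = TRUE)
  ok <- logical(length(pdfs))
  for (i in seq_along(pdfs)) {
    cmd <- paste(converter, shQuote(pdfs[i]), shQuote(txts[i]))
    ok[i] <- system(cmd) == 0
    if (!ok[i]) warning("conversion failed: ", pdfs[i])
  }
  # One row per PDF, so failures can be inspected and retried later.
  data.frame(pdf = pdfs, txt = txts, ok = ok, stringsAsFactors = FALSE)
}

# With an empty directory the result is simply a zero-row data frame.
empty_dir <- tempfile("pdfs_")
dir.create(empty_dir)
res <- convert_pdf_dir(empty_dir)
```

The text files listed in the 'txt' column can then be read with
readLines() and passed to whatever parsing code fits the layout.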

On Fri, Feb 3, 2012 at 4:11 PM, Bryan McCloskey <bmcclos...@usgs.gov> wrote:
> All,
>
> Is anyone familiar with a way to use R to read table data from a large 
> collection of PDF files? I'm aware there are various command lines and 
> desktop utilities that might be able to (e.g.,) dump PDFs to text, which 
> could then be parsed for table data. But I'm hoping there is something more 
> integrated that could be incorporated into R functions and scripts to handle 
> large batches of PDFs in a more automated fashion.
>
> Has anyone used R to extract large amounts of tabular data from PDF documents?
>
> -bryan
>
> ------
> Bryan McCloskey, Ph.D.
> IT Specialist (Data Management/Internet)
> U.S. Geological Survey
> St. Petersburg Coastal & Marine Science Center
> 600 Fourth St. South
> St. Petersburg, FL 33701
>
> South Florida Information Access: http://sofia.usgs.gov
> Everglades Depth Estimation Network: http://sofia.usgs.gov/eden
> Phone: 727.803.8747x3017 * Fax: 727.803.2032
> ------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

