On Oct 15, 2009, at 10:10 AM, Barry Rowlingson wrote:

On Thu, Oct 15, 2009 at 3:28 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
On Oct 15, 2009, at 3:43 AM, Biedermann, Jürgen wrote:

You don't indicate the OS you are on, but you will want to get a hold of
'pdftotext', which is a command line application that can extract the
textual content from the PDF files.
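
As a rough sketch of that step, the snippet below drives pdftotext from Python. This is an assumption on my part, since the thread names no scripting language. The helper names build_pdftotext_command and extract_text are hypothetical, and the code presumes pdftotext (shipped with Xpdf or Poppler) is on the PATH.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=True):
    """Build the argument list for a pdftotext call (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout,
        # which helps when the PDF contains column-aligned tables.
        cmd.append("-layout")
    # Request UTF-8 output so the bytes can be decoded predictably.
    cmd += ["-enc", "UTF-8"]
    # A "-" in the output-file position sends the text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, keep_layout=True):
    """Return the text of a PDF, assuming pdftotext is installed."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

A call such as extract_text("report.pdf"), where the file name is only a placeholder, would return the extracted text as a single string, ready to be split into lines and parsed.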

That's assuming the text is in the PDF as a text object. If it's a
scan of a paper document the chances are that all you have is an
image, in which case you need to do OCR (optical character
recognition) or get someone to type it all in again.
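
One way to tell the two cases apart before committing to OCR is to look at how much text the extraction actually returned. The sketch below is a heuristic only, not a definitive test: the function name looks_like_scanned and the threshold of 20 visible characters per page are assumptions chosen for illustration.

```python
def looks_like_scanned(extracted_text, page_count, min_chars_per_page=20):
    """Guess whether a PDF is image-only, based on its extracted text.

    extracted_text is whatever a text extractor returned for the file;
    page_count is the number of pages in the PDF.
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Count only non-whitespace characters: an extractor typically still
    # emits newlines and form feeds for pages that hold no text objects.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible / page_count < min_chars_per_page
```

A file flagged this way probably needs OCR. A file that passes can still hold a mix of text pages and scanned pages, so checking page by page would be more reliable.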

Good point...a scanned image would certainly complicate matters. Even with OCR, you introduce the potential for errors in translating the image to text, and you risk formatting problems that leave the page layout inconsistent from one page to the next.

Cheers,

Marc

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
