Re: [R] "Complex?" import of pdf files (criminal records) into R table

Marc Schwartz Thu, 15 Oct 2009 08:26:31 -0700

On Oct 15, 2009, at 10:10 AM, Barry Rowlingson wrote:

On Thu, Oct 15, 2009 at 3:28 PM, Marc Schwartz<marc_schwa...@me.com> wrote:

On Oct 15, 2009, at 3:43 AM, Biedermann, Jürgen wrote:

You don't indicate the OS you are on, but you will want to get ahold of
'pdftotext', which is a command line application that can extract the
textual content from the PDF files.


That's assuming the text is in the PDF as a text object. If it's a
scan of a paper document the chances are that all you have is an
image, in which case you need to do OCR (optical character
recognition) or get someone to type it all in again.

Good point...a scanned image would certainly complicate matters. Even
with OCR, you introduce the potential for error in the translation of
the image to text and risk formatting issues, which can lead to
inconsistencies in page layouts.
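
A rough way to tell the two cases apart is to run pdftotext and see
whether anything other than whitespace comes back. The following is
only a sketch: it assumes pdftotext (from Xpdf or Poppler) is on the
PATH, the function names has_text and check_pdf are made up for
illustration, and record.pdf is a hypothetical file name. An empty
result suggests a scanned image, but it is not proof either way.

```shell
#!/bin/sh
# Sketch: guess whether a PDF carries an extractable text layer.
# Assumption: pdftotext (Xpdf or Poppler) is installed and on the PATH.

# Succeeds (exit 0) if the file holds at least one non-whitespace
# character. pdftotext separates pages with form feeds, which the
# [:space:] class also covers, so page breaks alone do not count.
has_text() {
    grep -q '[^[:space:]]' "$1"
}

# Convert one PDF to a .txt file next to it and report what happened.
# -layout asks pdftotext to keep the physical layout of the page,
# which helps when the records are laid out in columns.
check_pdf() {
    pdf=$1
    txt="${pdf%.pdf}.txt"
    pdftotext -layout "$pdf" "$txt" || return 2
    if has_text "$txt"; then
        echo "$pdf: text written to $txt"
    else
        echo "$pdf: no text found, possibly a scanned image (OCR needed)"
        return 1
    fi
}

# Usage (hypothetical file name):
#   check_pdf record.pdf
```

If text does come out, the resulting .txt file can then be read into R
with readLines() and parsed from there.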


Cheers,

Marc

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
