[copied to list for posterity...]
Sorry. I am completely wrong. I've been using iText to split PDFs, fill in
forms and recombine them, so I assumed (wrongly) that text extraction was
possible.
In fact, reading the mailing lists is quite informative - clearly PDF
is not designed for this.
Try this
http:
ep="")
funentries <- paste("\\indexentry ", "{", entrymat[,1],"}{",
entrymat[,2], "}",sep="")
write(funentries, fdxfile)
system(paste("makeindex -o", fndfile, fdxfile))
}
Jo
If you can use an R <-> Java interface, you could use iText to do this
as long as the PDF is fairly sane.
See http://itextpdf.com/
It is what pdftk uses.
b/w
Mark
2010/1/9 David Kane:
> I have a pdf file that I would like to parse into R:
>
> http://www.williams.edu/Registrar/geninfo/faculty.p
David Kane wrote:
I have a pdf file that I would like to parse into R:
http://www.williams.edu/Registrar/geninfo/faculty.pdf
For now, I open the file in Acrobat by hand, then save it "as text"
and then use readLines(). That works fine but a) I am concerned that
some information may be lost
t-Jan
~~
In the face of ambiguity, refuse the temptation to guess.
~~
--- On Sat, 1/9/10, Barry Rowlingson wrote:
From: Barry Rowlingson
Subject: Re: [R] pa
On Sat, Jan 9, 2010 at 1:11 PM, David Kane wrote:
> I have a pdf file that I would like to parse into R:
>
> http://www.williams.edu/Registrar/geninfo/faculty.pdf
>
> For now, I open the file in Acrobat by hand, then save it "as text"
> and then use readLines(). That works fine but a) I am concern
I have a pdf file that I would like to parse into R:
http://www.williams.edu/Registrar/geninfo/faculty.pdf
For now, I open the file in Acrobat by hand, then save it "as text"
and then use readLines(). That works fine but a) I am concerned that
some information may be lost and b) I may be doing th