Hi Uwe, that is what I have now done, following the suggestion from Joris, but unfortunately that hasn't worked.
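In case it helps anyone following along, here is a minimal sketch of how the PATH change could be checked from inside R. It assumes xpdf is installed in C:/Program Files/xpdf (the directory quoted further down in this thread); the names xpdf_dir, tools, found and not_found are only illustrative. Sys.which() returns the full path of each executable it can find and an empty string otherwise, and Sys.setenv() changes PATH for the current R session only, so this is a check and a temporary workaround rather than a confirmed fix.

```r
# Assumed install directory, taken from the paths quoted in this thread.
xpdf_dir <- "C:/Program Files/xpdf"

# The two tools that ?readPDF says must be accessible on the system.
tools <- c("pdftotext", "pdfinfo")

# Full path of each tool as R sees it, or "" when it is not on the PATH.
found <- Sys.which(tools)
print(found)

# Treat empty (or NA) results as "not found".
not_found <- is.na(found) | !nzchar(found)

if (any(not_found)) {
  # Append the xpdf directory to PATH for this R session only.
  old_path <- Sys.getenv("PATH")
  Sys.setenv(PATH = paste(old_path, xpdf_dir, sep = .Platform$path.sep))
  # Look again; both entries should be non-empty if xpdf really lives there.
  print(Sys.which(tools))
}
```

If Sys.which() still returns empty strings after the system PATH has been edited, it may be that R was started before the change: a running process keeps the environment it started with, so restarting R is worth trying.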
Cheers,
Tony Breyal

PS. I would like to apologise: I am replying to these posts through google.groups.co.uk and there is a delay between me posting a reply and that post appearing in the thread.

On 16 Nov, 20:34, Uwe Ligges <[EMAIL PROTECTED]> wrote:
> Tony Breyal wrote:
> > Hi,
> >
> > Uwe -- ahh, thank you kindly, I was able to do a web search after
> > reading your post above in order to find a guide on how to set the
> > path in Windows (I wasn't aware that this is how a file is made
> > available to the system). I haven't got it to work yet, but at least
> > I'm on the right track! Also, just after reading your post, I've
> > discovered the system() function in R, what a wonderful thing that is!
> >
> > Clair -- I'm still working on getting the files to be accessible to
> > the system, but in the meantime I have just discovered the system()
> > function in R, which is a workaround for the moment... so using your
> > example, you could do:
> >
> > ## R code
> > system(paste('"C:/Program Files/xpdf/pdftotext.exe"',
> >              '"C:/Documents and Settings/clair/Desktop/test/r-intro.pdf"'),
> >        wait=FALSE)
> >
> > The above will create a new text document in your c:/../test folder.
> >
> > Now obviously, we want to use the readPDF() function in package tm,
> > so on my uni laptop, running Windows XP, this is what I have done:
> >
> > 1. Click through: start >> control panel >> system
> > 2. Click the Advanced tab.
> > 3. Click Environment variables.
> > 4. Click New (under 'system') to add a new variable name and value.
> > 4a. name: pdftotext
> > 4b. value: C:\Program Files\xpdf\pdftotext.exe
> > 5. Click New (under 'system') to add a new variable name and value.
> > 5a. name: pdfinfo
> > 5b. value: C:\Program Files\xpdf\pdfinfo.exe
>
> No, instead of 4 and 5, change the environment variable PATH to
>
> PATH
> ...[all what is already in there]...;C:\Program Files\xpdf
>
> Uwe Ligges
>
> > In theory, I think, that should work. However, so far it hasn't, so not
> > quite sure what to do.
> > But at least in the meantime we have the system() function as a
> > workaround. If you can figure out what I'm doing wrong (probably
> > something obvious knowing me!) please do let me know.
> >
> > Cheers,
> > Tony Breyal
> >
> > On 16 Nov, 18:14, Uwe Ligges <[EMAIL PROTECTED]> wrote:
> >> [EMAIL PROTECTED] wrote:
> >>> I never said it *should* work.
> >>> I was simply trying something out that works on other types of files
> >>> I've needed in the past (e.g. html, csv, dat, etc.). I don't know the
> >>> details of the pdf format, but I thought it was worth a try, certainly
> >>> no harm in experimenting, and hence I learned that pdfs aren't stored
> >>> in the same way that other files I've used in the past are. That's
> >>> fine, good to learn new things.
> >>> As for trying the readPDF() function, yes, I have downloaded and used
> >>> xpdf to convert pdfs into plain text since reading the OP email.
> >>> However, how can you make xpdf available to the system so that
> >>> readPDF() works in R? I don't know, hence why I posted in this thread.
> >>> You clearly seem to have a solution, fancy sharing?
> >> Sure, I thought that could not be a real question:
> >> Set your environment variable PATH so that it additionally points to the
> >> directory where these tools are installed. As you would do for any other
> >> software that is to be called without knowledge where it is installed.
> >>
> >> Uwe Ligges
> >>
> >>> Clair Crossupton xx
> >>> On 16 Nov, 12:34, Uwe Ligges <[EMAIL PROTECTED]> wrote:
> >>>> [EMAIL PROTECTED] wrote:
> >>>>> Hello, I was just wondering if you had found a solution? I am having
> >>>>> the same difficulty of converting pdf's into plain text documents in
> >>>>> R. I originally thought I could use the readLines() function, but as
> >>>>> you can see below that did not work.
> >>>> Why the hell should it? It is designed to read *text* files.
> >>>> And what
> >>>> you get below is exactly what your PDF file looks like if you read it as
> >>>> text, which it is NOT. Why do you not also go the readPDF() way (and yes,
> >>>> it is not always possible nor reliable to go that way).
> >>>> Uwe Ligges
> >>>>> R> my.destfile <- "C:\\Documents and Settings\\clair\\Desktop\\test\\r-intro.pdf"
> >>>>> R> my.url <- "http://cran.r-project.org/doc/manuals/R-intro.pdf"
> >>>>> R> download.file(url = my.url, destfile=my.destfile, mode='wb')
> >>>>> R> txt <- readLines(my.destfile)
> >>>>> R> txt
> >>>>> [1] "%PDF-1.4"
> >>>>> [2] "%ÐÔÅØ"
> >>>>> [3] "1 0 obj <<"
> >>>>> [4] "/Length 587 "
> >>>>> [5] "/Filter /FlateDecode"
> >>>>> [6] ">>"
> >>>>> [7] "stream"
> >>>>> [8] "xÚmTM [EMAIL PROTECTED]&ÎÁ±?\024tBL\020$ñ°ãd4›½*´.‰\002\001<øï·_•èÌf
> >>>>> \017’W¯_wÕ«îrðãc;Šòê`GæUŠOÛV×&³£øç¾ö\006ƒ¤Ê®\027[vïÖæ6ïWÛ7ñÑTÙÖvb
> >>>>> \030¯“uYt/N¼.³ó5·½êÿ¢¥=\025åS‚<b¸³¿G› "
> >>>>> Warm Regards,
> >>>>> Clair
> >>>>> On 13 Nov, 15:10, Tony Breyal <[EMAIL PROTECTED]> wrote:
> >>>>>> Dear R-Help,
> >>>>>> I need to convert a set of '.pdf' files into an equivalent set of
> >>>>>> '.txt' files. This is so that I can do some text mining on the
> >>>>>> content.
> >>>>>> In the latest R-News letter
> >>>>>> (http://cran.r-project.org/doc/Rnews/Rnews_2008-2.pdf), the package
> >>>>>> 'tm' for text mining is mentioned. In that lovely package, there is
> >>>>>> a function called 'readPDF()'. In order to use this, ?readPDF says
> >>>>>> "Note that this PDF reader needs both the tools pdftotext and
> >>>>>> pdfinfo installed and accessable on your system."
> >>>>>> These tools are available from http://www.foolabs.com/xpdf/download.html
> >>>>>> I am able to download this and use it easily from a DOS window to
> >>>>>> convert a pdf file into a txt file.
> >>>>>> Question: how do I make these tools available to R, so that I can use
> >>>>>> the readPDF() function?
> >>>>>> Thank you in advance for any help, and I hope the above made sense.
> >>>>>> Tony Breyal
> >>>>>> ### OS = Windows Vista Ultimate
> >>>>>> > sessionInfo()
> >>>>>> R version 2.8.0 (2008-10-20)
> >>>>>> i386-pc-mingw32
> >>>>>> locale:
> >>>>>> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> >>>>>> attached base packages:
> >>>>>> [1] grid stats graphics grDevices utils datasets methods base
> >>>>>> other attached packages:
> >>>>>> [1] tm_0.3-1 XML_1.98-1 Snowball_0.0-3 RWeka_0.3-14 rJava_0.6-0 Matrix_0.999375-16 lattice_0.17-15 filehash_2.0
> >>>>>> loaded via a namespace (and not attached):
> >>>>>> [1] proxy_0.4-1
> >>>>>> ______________________________________________
> >>>>>> [EMAIL PROTECTED] mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>> and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.