PDF files are actually "programs" that place graphic symbols on pages, and the 
order in which those symbols are placed (the order in which most pdf-to-text 
conversions return characters) may have nothing to do with how they appear 
visually. There is not even a guarantee that those symbols are represented as 
characters in the file... they could be part of embedded bitmaps.

In summary, you need to review what your "pdf_text" function is able to extract 
from your files without filtering... it may or may not be consistent enough to 
allow you to do what you want... and we certainly have no idea what it is able 
to extract from your files.
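
For what it is worth, the loop in your message throws away the result of pdf_text() and then hands the file name, not the page text, to str_nth_currency(). Below is a rough sketch of one way to structure the loop, assuming the first page really does contain the amount as ordinary text. The names first_currency() and collect_amounts() are made up for illustration, the regular expression is a base-R stand-in for str_nth_currency() and only understands a leading "$", and the folder in the usage comment is a placeholder.

```r
# Pull the first dollar amount (e.g. "$1,234,567" or "$12.50") out of one string.
# Returns NA when the string is missing or holds no such amount.
first_currency <- function(text) {
  if (length(text) != 1 || is.na(text)) return(NA_real_)
  m <- regmatches(text, regexpr("\\$[0-9][0-9,]*(\\.[0-9]+)?", text))
  if (length(m) == 0) return(NA_real_)
  as.numeric(gsub("[$,]", "", m))
}

# Read the first page of each file and collect one amount per file.
# The default reader assumes the pdftools package is installed; it is only
# evaluated when no other reader is supplied.
collect_amounts <- function(files,
                            read_first_page = function(f) pdftools::pdf_text(f)[1]) {
  amounts <- vapply(files,
                    function(f) first_currency(read_first_page(f)),
                    numeric(1), USE.NAMES = FALSE)
  data.frame(file = files, amount = amounts, stringsAsFactors = FALSE)
}

# Hypothetical usage (the folder name is a placeholder):
# files <- list.files("path/to/pdfs", pattern = "\\.pdf$", full.names = TRUE)
# df <- collect_amounts(files)
```

Any file that comes back with NA in the amount column is one to inspect by hand with cat(pdf_text(file)[1]) before trusting the rest.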

On May 13, 2020 6:33:03 AM PDT, Manish Mukherjee <manishmukher...@hotmail.com> 
wrote:
>Hi All,
>
>I need some help with the following code. I have a number of PDF files,
>and the first page of each file gives a currency value $xxx,xxx,xxx.
>How can I extract this value from a number of PDF files and put it in a
>data frame? I am able to do it for a single file
>with the code below, where opinions is the text data and 1 selects the
>first currency value:
>```
>d = str_nth_currency(opinions, 1)
>df = subset(d, select = c(amount))
>df
>```
>
>I want this to loop over multiple PDF files.
>
>I have tried something like this, but it is not working:
>for (i in 1:length(files)){
>  print(i)
>  pdf_text(paste("filepath ", files[i],sep = ""))
>  str_nth_currency(files[i], 1)
>}
>
>
>Please help.
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

