joe1985 wrote:
> Hello
>
> I have around 200 PDF documents containing data I want organized in R as a
> data frame. The PDF documents look like this:
>
> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver.jpeg
>
> or like this:
>
> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver%2B2.jpeg
>
> So I want to pull out the data in the coloured boxes so that it becomes
> organized like this (just in R instead of Excel):
>
> http://www.nabble.com/file/p21667074/PRRS-billede%2Bexcel.jpeg
>
> The 0s and 1s record which status occurs on a particular date.
> "PRRS-neg" is represented by a 0 in both the PRRS-VAC and PRRS-DK columns.
> "PRRS-pos VAC" or "Vac" is represented by a 1 in the PRRS-VAC column, and
> "PRRS-pos DK" or "DK" by a 1 in the PRRS-DK column. Likewise, "sanVAC"
> should give a 1 in the VACsan column, and "sanDK" a 1 in the DKsan column.
> The first date for each "CHR-nr" should be either the earliest date near
> the red box (as in the first picture) or the date preceded by the word
> "før" (as in the second picture). All 200 PDF documents look like the ones
> in the pictures, each representing a different "CHR-nr".
>
> Hope you can help me
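The coding rules in the question amount to a lookup from status text to column values. Below is a minimal sketch in Python (the same table-lookup idea carries over to R with a named list), assuming each record has already been reduced to a CHR-nr, a date and a status token spelled exactly as in the question. The names `RULES` and `code_status`, the placeholder records, and leaving columns a rule does not mention as None (NA) are all assumptions, not anything stated in the thread.

```python
# Hypothetical sketch: turn one (CHR-nr, date, status) record into a row
# with the 0/1 columns described in the question. Columns that a rule
# does not mention are left as None (NA); that default is an assumption.
COLUMNS = ("PRRS-VAC", "PRRS-DK", "VACsan", "DKsan")

# Status token -> column values, transcribed from the rules in the question.
RULES = {
    "PRRS-neg": {"PRRS-VAC": 0, "PRRS-DK": 0},
    "PRRS-pos VAC": {"PRRS-VAC": 1},
    "Vac": {"PRRS-VAC": 1},
    "PRRS-pos DK": {"PRRS-DK": 1},
    "DK": {"PRRS-DK": 1},
    "sanVAC": {"VACsan": 1},
    "sanDK": {"DKsan": 1},
}


def code_status(chr_nr, date, status):
    """Return one data-frame row (as a dict) for a single record."""
    row = {"CHR-nr": chr_nr, "date": date}
    row.update({col: None for col in COLUMNS})
    # Exact match on the stripped token, so "DK" is not confused with "sanDK".
    rule = RULES.get(status.strip())
    if rule is None:
        raise ValueError(f"unrecognised status: {status!r}")
    row.update(rule)
    return row


if __name__ == "__main__":
    # Made-up placeholder records, only to show the shape of the output.
    records = [
        ("CHR-A", "2000-01-01", "PRRS-neg"),
        ("CHR-A", "2000-02-01", "PRRS-pos DK"),
        ("CHR-A", "2000-03-01", "sanDK"),
    ]
    for rec in records:
        print(code_status(*rec))
```

Exact matching on the whole token is deliberate: a substring test for "DK" would also fire on "sanDK" and "PRRS-pos DK", and unknown tokens raise an error instead of being silently miscoded.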
Not on the basis of .jpeg files, I think. We'd need some indication of what
the PDF looks like inside. There's a tool called pdftotext, which converts a
PDF to plain text and might do something for you, IF you can figure out
reliably where your data begin and end.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
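Peter's pdftotext suggestion can be sketched as two steps: convert one PDF to text, then keep only the lines between two marker strings. This is a minimal sketch in Python (in R the same steps map onto system2() and grep()), assuming the pdftotext binary from poppler or xpdf is on the PATH. The function names `pdf_to_lines` and `between_markers` are hypothetical, and so are the marker strings: the real markers have to be found by inspecting the text output of an actual file, which is exactly the "IF" in the reply.

```python
import subprocess
import sys


def pdf_to_lines(pdf_path):
    """Run pdftotext on one PDF and return its text as a list of lines.

    -layout keeps the physical column layout, -enc UTF-8 keeps characters
    such as the "ø" in "før", and "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout.splitlines()


def between_markers(lines, start_marker, end_marker):
    """Return the lines strictly between the first line containing
    start_marker and the next line containing end_marker.

    Raises ValueError if either marker is missing, so a file whose layout
    differs is reported instead of silently yielding the wrong block.
    """
    block = []
    inside = False
    for line in lines:
        if not inside:
            if start_marker in line:
                inside = True
            continue
        if end_marker in line:
            return block
        block.append(line)
    raise ValueError("start or end marker not found")


if __name__ == "__main__":
    # Usage: python extract.py file.pdf START_TEXT END_TEXT
    if len(sys.argv) == 4:
        text = pdf_to_lines(sys.argv[1])
        for line in between_markers(text, sys.argv[2], sys.argv[3]):
            print(line)
```

Running it on one file and eyeballing the output is the way to choose the markers; only once they reliably bracket the data in a few files is it worth looping over all 200.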