On Mon, Jan 26, 2009 at 9:40 AM, Peter Dalgaard <p.dalga...@biostat.ku.dk> wrote:
> joe1985 wrote:
>> Hello
>>
>> I have around 200 PDF documents containing data I want organized in R as a
>> data frame. The PDF documents look like this:
>>
>> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver.jpeg
>>
>> or like this:
>>
>> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver%2B2.jpeg
>>
>> I want to pull out the data in the coloured boxes so that it becomes
>> organized like this (just in R instead of Excel):
>>
>> http://www.nabble.com/file/p21667074/PRRS-billede%2Bexcel.jpeg
>>
>> The 0s and 1s encode the status on a particular date. "PRRS-neg" is
>> represented by a 0 in both the PRRS-VAC and PRRS-DK columns. "PRRS-pos VAC"
>> or "Vac" is represented by a 1 in the PRRS-VAC column, and "PRRS-pos DK" or
>> "DK" by a 1 in the PRRS-DK column. Likewise, "sanVAC" should give a 1 in the
>> VACsan column, and "sanDK" a 1 in the DKsan column. The first date for each
>> "CHR-nr" should be either the earliest date in the red box (as in the first
>> picture) or the date preceded by the word "før" (as in the second picture).
>> All 200 PDF documents look like the ones in the pictures, each representing
>> a different "CHR-nr".
>>
>> Hope you can help me
>
> Not on the basis of .jpeg files, I think. We'd need some indication of
> what the PDF looks like inside. There's a tool called pdftotext, which
> might do something for you, IF you can figure out reliably where your
> data begin and end.
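
If pdftotext does produce usable text, the coding rules described above could be applied roughly as in the sketch below. It is only an illustration, written in Python, and it rests on assumptions that nobody in this thread has been able to check: that each record comes out as a single line holding a date and a status label, and that dates look like dd-mm-yyyy (with "-", "." or "/" as separator). The function names, the ISO date output and the use of None for unset columns are my own choices; the "før" rule for the first date is not handled.

```python
import re
import subprocess

# ASSUMPTION: dates appear as dd-mm-yyyy, dd.mm.yyyy or dd/mm/yyyy.
DATE_RE = re.compile(r"\b(\d{1,2})[-./](\d{1,2})[-./](\d{4})\b")

# Label patterns and the column values they imply, as described in the question.
# Word boundaries keep "DK" from matching inside "sanDK".
RULES = [
    (re.compile(r"\bPRRS-neg\b"), {"PRRS-VAC": 0, "PRRS-DK": 0}),
    (re.compile(r"\b(?:PRRS-pos\s+VAC|Vac)\b"), {"PRRS-VAC": 1}),
    (re.compile(r"\b(?:PRRS-pos\s+DK|DK)\b"), {"PRRS-DK": 1}),
    (re.compile(r"\bsanVAC\b"), {"VACsan": 1}),
    (re.compile(r"\bsanDK\b"), {"DKsan": 1}),
]


def pdf_to_text(pdf_path):
    """Run pdftotext (must be installed) and return the extracted text."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" writes to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_status_lines(text, chr_nr):
    """Turn extracted text into one dict per dated line that carries a known label."""
    rows = []
    for line in text.splitlines():
        m = DATE_RE.search(line)
        if not m:
            continue  # ASSUMPTION: lines without a date are not records
        day, month, year = m.groups()
        row = {
            "CHR-nr": chr_nr,
            "date": f"{year}-{int(month):02d}-{int(day):02d}",
            "PRRS-VAC": None, "PRRS-DK": None, "VACsan": None, "DKsan": None,
        }
        matched = False
        for pattern, values in RULES:
            if pattern.search(line):
                row.update(values)
                matched = True
        if matched:
            rows.append(row)
    return rows
```

Calling parse_status_lines(pdf_to_text(path), chr_nr) for each of the 200 files and stacking the results would give a table that can be written to CSV and read into R with read.csv. Whether the line-per-record assumption holds depends entirely on what the PDFs look like inside.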

An alternative is to outsource the problem. You can get very reasonable data entry quotes from sites like http://www.elance.com/, and depending on how much you value your time this might end up being a much cheaper option than figuring out how to do it programmatically (but not as intellectually satisfying).

Hadley

--
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.