Dear R community, I have the following problem that I hope you can help me with.
My data is saved in thousands of files with an unusual extension consisting of four digits and a z, for example *.1405z. With list.files I managed to load this data into R. It looks like this (the row numbers are not in the original file):

    35 :LATEST STAGE 3.60 FT AT 730 AM CST ON 0102
    36 .ER ARCT2 0102 C DC200001020813/DH12/HGIFF/DIH6
    37 :QPF FORECAST 6AM NOON 6PM MDNT
    38 .E1 :0102: / 3.5/ 3.4/ 3.5
    39 .E2 :0103: / 3.5/ 3.0/ 2.5/ 2.1
    40 .E3 :0104: / 1.8/ 1.5/ 1.3/ 1.2
    41 .E4 :0105: / 1.2/ 1.8/ 2.3/ 2.7
    42 .E5 :0106: / 3.0/ 3.0/ 3.1/ 3.3
    43 .E6 :0107: / 3.4

I need the table in rows 37 to 43 as a matrix, for example:

    0102   NA  3.5  3.4  3.5
    0103  3.5  3.0  2.5  2.1
    0104  1.8  1.5  1.3  1.2
    0105  1.2  1.8  2.3  2.7
    0106  3.0  3.0  3.1  3.3
    0107  3.4   NA   NA   NA

Unfortunately the row numbers vary from file to file. I can call up each line individually, for example file[40,1] for line 40. It returns:

    [1] .E3 :0104: / 1.8/ 1.5/ 1.3/ 1.2
    38 Levels: .E1 :0102: / 3.5/ 3.4/ 3.5 ...

So I really have two problems:

1. How do I detect the table in the file (or rather, the line where the table starts)?
2. How do I break up each line to write the values into a matrix?

Feel free to suggest an entirely different approach if you think that would be helpful.

Thanks a lot!
Frauke
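
One possible way to approach both questions, as a minimal sketch rather than a definitive solution: read each file as plain character lines (readLines returns character strings, which avoids the factor "Levels" output shown above), pick out the table rows by their pattern instead of their line number, and split each row on "/". The sample lines are copied from the excerpt above. The names row_pattern, parse_row and n_cols are made up for illustration. The sketch assumes that a table row always looks like ".E<number> :<4-digit date>:" followed by at most four values, and that a short first row is missing its early columns while a short later row is missing its late columns, as the desired matrix suggests.

```r
# Sample lines standing in for one file. In practice they would come from
# readLines() on one of the *.NNNNz files (the file path is up to you).
lines <- c(
  ":LATEST STAGE 3.60 FT AT 730 AM CST ON 0102",
  ".ER ARCT2 0102 C DC200001020813/DH12/HGIFF/DIH6",
  ":QPF FORECAST 6AM NOON 6PM MDNT",
  ".E1 :0102: / 3.5/ 3.4/ 3.5",
  ".E2 :0103: / 3.5/ 3.0/ 2.5/ 2.1",
  ".E3 :0104: / 1.8/ 1.5/ 1.3/ 1.2",
  ".E4 :0105: / 1.2/ 1.8/ 2.3/ 2.7",
  ".E5 :0106: / 3.0/ 3.0/ 3.1/ 3.3",
  ".E6 :0107: / 3.4"
)

# 1. Detect table rows by pattern, not by line number:
#    ".E" + digits, optional spaces, then a 4-digit date between colons.
#    (Assumed pattern, inferred from the excerpt.)
row_pattern <- "^\\.E[0-9]+[[:space:]]*:([0-9]{4}):"
table_lines <- grep(row_pattern, lines, value = TRUE)

# 2. Split one row into its date and its "/"-separated numeric values.
parse_row <- function(x) {
  date <- sub(paste0(row_pattern, ".*$"), "\\1", x)  # keep only the date
  rest <- sub(row_pattern, "", x)                    # drop the row prefix
  vals <- trimws(strsplit(rest, "/", fixed = TRUE)[[1]])
  list(date = date, values = as.numeric(vals[nzchar(vals)]))
}
parsed <- lapply(table_lines, parse_row)

# Build the matrix, padding short rows with NA.
# Assumption: the first row is padded on the left, all others on the right.
n_cols <- 4
m <- t(vapply(seq_along(parsed), function(i) {
  v <- parsed[[i]]$values
  pad <- rep(NA_real_, n_cols - length(v))
  if (i == 1) c(pad, v) else c(v, pad)
}, numeric(n_cols)))
rownames(m) <- vapply(parsed, function(p) p$date, character(1))
colnames(m) <- c("6AM", "NOON", "6PM", "MDNT")

print(m)
```

To process many files, the same steps could be wrapped in a function and applied with lapply over the vector returned by list.files. If the real files keep their column alignment with spaces, reading the values by position (for example with read.fwf) might be a more reliable way to decide where the NA values belong than the padding rule assumed here.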