Hi all, I'm trying to read a data set into R, but the file is messy, so I have to read it in pieces. All the data is in a .txt file, with the values separated by a single space. So far, OK. The problem is that not all lines in the file have the same number of elements, so the read stops with an error, and I lose the lines that were already read.
Example of the data set:

    11 12 13
    21 22 23
    31 32 33 34
    41 42 43 44
    51 52 53
    61 62 63 64
    71 72 73 74 75
    81 82
    (...)

If I use the following:

    aux <- read.table(file="data.txt", sep=" ", header=FALSE,
                      colClasses="numeric")

it stops reading with the error message:

    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
      line 3 did not have 3 elements
    Calls: read.table -> scan

and I lose the lines that were already read. Since I'm running this on a cluster (it's actually a big data set), the error halts my execution.

What I tried at first was:

    aux1 <- read.table(file="data.txt", sep=" ", header=FALSE,
                       colClasses="numeric", nrows=2)
    aux2 <- read.table(file="data.txt", sep=" ", header=FALSE,
                       colClasses="numeric", skip=2, nrows=2)
    aux3 <- read.table(file="data.txt", sep=" ", header=FALSE,
                       colClasses="numeric", skip=4, nrows=1)
    (...)

This procedure works. However, I have about 5000 lines to read, and I don't know in advance which ones are messy. So to keep using the procedure above, I have to:

1. try to read the data set
2. check the error message to find out which line has a different size
3. read the data set for the block of same-sized lines (aux1)
4. read the data set skipping the lines read in aux1; check the error message to find out which line has a different size
5. read the data set for the second block of same-sized lines (aux2)
6. read the data set skipping the lines read in aux1 and aux2; check the error message to find out which line has a different size
(and so on)

If I had only a hundred lines, this would be OK, but I have a few thousand, and it will take me forever to finish if I need to read block by block and check manually where the problem is.

My question is: is there any way I can control read.table with some "if"s or "while"s? What I'd like to do is something like:

1. read the data set while all lines have the same length
2. if a line has a different length from the previous ones, store what was read in a variable and abort reading
3. start reading the data set from the line where it stopped, and read it while all lines have the same length
4. if a line has a different length from the previous ones, store what was read in a variable and abort reading
5. start reading the data set from the line where it stopped, and read it while all lines have the same length
6. if a line has a different length from the previous ones, store what was read in a variable and abort reading
(and so on until the whole data set has been read)

This would let the program run by itself and solve my problem. It's OK if it returns several variables; I can just bind them and assemble my data set as I need, since I know what it should look like in the end.

Thanks in advance for any suggestions!

Beatriz
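
The block-by-block reading described above could perhaps be automated along the following lines. This is only a sketch under assumptions, not a tested solution for the real file: it assumes the file has no blank lines, comment lines, or trailing spaces, and `read_blocks` is a hypothetical helper name, not an existing R function. It uses `count.fields()` to get the number of fields on every line up front, `rle()` to group consecutive lines of equal length, and then one `read.table()` call per group with the same `skip`/`nrows` pattern as in aux1, aux2, aux3.

```r
# Hypothetical helper (not part of base R): read a space-separated file as a
# list of data frames, one per run of consecutive lines that have the same
# number of fields.
# Assumes: no blank lines, no comment lines, no trailing spaces.
read_blocks <- function(path, sep = " ") {
  # Number of fields on each line of the file
  n_fields <- count.fields(path, sep = sep)

  # Runs of consecutive lines sharing the same field count
  runs <- rle(n_fields)

  # Number of lines to skip before each run starts
  starts <- cumsum(c(0, head(runs$lengths, -1)))

  # One read.table() call per run, mirroring the manual aux1/aux2/... calls
  lapply(seq_along(runs$lengths), function(i) {
    read.table(file = path, sep = sep, header = FALSE,
               colClasses = "numeric",
               skip = starts[i], nrows = runs$lengths[i])
  })
}
```

Calling `blocks <- read_blocks("data.txt")` would then give a list whose elements play the role of aux1, aux2, aux3, and so on, which can be bound together afterwards as needed. If padding the short lines with NA is acceptable instead, `read.table(..., fill = TRUE)` is another documented option, though it decides the number of columns from the first five lines unless `col.names` is supplied.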