On Sat, Oct 2, 2010 at 11:31 PM, Nilza BARROS <nilzabar...@gmail.com> wrote:
> Dear R-users,
>
> I would like to know how I could read a file whose lines have different lengths.
> I need to read this file and create an output to feed my database.
> So after reading it I'll need to create an output like this
>
> "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460, 39,390)"
>
Read the data, filling the short lines (i.e. the date and station lines) with NAs. Replace each * with a space and convert every column to numeric, so that the starred fields become NA, then compute how many non-NAs are in each row (cnt). Append group, which is 1 for lines pertaining to the 1st station, 2 for the 2nd, etc. Then merge it all together into one big data frame, All, and generate a vector of SQL strings:

DF <- read.table("d2010100100.txt", fill = TRUE)
# blank out the "*" fields so that as.numeric() turns them into NA
DF[] <- lapply(DF, function(x) as.numeric(chartr("*", " ", x)))
# number of non-missing fields on each line
cnt <- rowSums(!is.na(DF))
# each line with exactly 4 non-NA fields starts a new group
DF$group <- cumsum(cnt == 4)
# split the lines by type (cnt) and merge the pieces back together on group
Merge <- function(x, y) merge(x, y, by = "group")
All <- Reduce(Merge, split(DF, cnt))
with(All, sprintf("INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%04d%02d%02d, %d, %d, %d)",
  V1.x, V2.x, V3.x, V1.y, V1, V2))

The result looks like this:

[1] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 82599, 1008, -9999)"
[2] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 1011, -9999)"
[3] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 1000, 96)"
[4] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 925, 782)"
[5] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 850, 1520)"
[6] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 700, 3171)"
[7] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 500, 5890)"
[8] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 400, 7600)"

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
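Since d2010100100.txt itself is not shown, here is a self-contained sketch of the same approach run on a small made-up sample. The layout (a 4-field date line, a 3-field station line, then 2-value data lines with "*****" marking a missing field) is an assumption, as are the column names year, month, day, station, var1 and var2. One deliberate change: instead of relying on the .x/.y suffixes that merge() produces, each piece is given explicit column names before merging, so the result does not depend on the order in which split() returns the pieces.

```r
# Hypothetical sample input (assumed layout, not the real file):
# a 4-field date line, a 3-field station line, then 2-value data
# lines in which "*****" marks a missing field.
txt <- c(
  "2010 10 01 00",
  "82599 -5.91 -35.25",
  "1008 -9999 *****",
  "2010 10 01 00",
  "83649 -20.47 -40.20",
  "1011 -9999 *****",
  "1000 96 *****"
)

# fill = TRUE pads the short lines; colClasses = "character" keeps the
# "*****" fields as text so chartr() can blank them out.
DF <- read.table(text = txt, header = FALSE, fill = TRUE,
                 colClasses = "character")

# Blank strings convert to NA, so starred and padded fields become NA.
DF[] <- lapply(DF, function(x) as.numeric(chartr("*", " ", x)))

# Number of non-missing fields on each line identifies the line type.
cnt <- rowSums(!is.na(DF))

# Each 4-field (date) line starts a new group.
DF$group <- cumsum(cnt == 4)

# Give each line type its own, explicitly named columns.
dates    <- setNames(DF[cnt == 4, c("group", "V1", "V2", "V3")],
                     c("group", "year", "month", "day"))
stations <- setNames(DF[cnt == 3, c("group", "V1")],
                     c("group", "station"))
values   <- setNames(DF[cnt == 2, c("group", "V1", "V2")],
                     c("group", "var1", "var2"))

# Merge the pieces back together on group.
All <- Reduce(function(x, y) merge(x, y, by = "group"),
              list(dates, stations, values))

# %d accepts doubles as long as they hold whole numbers.
sql <- with(All, sprintf(
  "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%04d%02d%02d, %d, %d, %d)",
  year, month, day, station, var1, var2))
print(sql)
```

With this sample, sql should hold one INSERT statement per data line (three here), each carrying the date and station of its group.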