Hello,

jeff6868 wrote
>
> Hi Sarah,
>
> Thank you for your answer.
> Yes, I know that my proposal is not necessarily the best way to do it.
> But my problem concerns only big gaps, of course (more than half a day of
> missing data, up to several months of missing data).
> I've already filled small gaps with the interpolation that you mentioned
> in your message (with the function na.approx of the package zoo).
> For the study, it's not important to have perfectly identical values
> between the 2 correlated stations, because after the reconstruction I'll
> calculate the daily mean of each station. For my boss, it's enough to
> work on daily means. But before that, I need to rebuild the big missing
> data gaps of my stations (in the way I explained in the first message of
> my topic).
> Do you have any idea how to do it in R according to my first post?
> I forgot to mention that my examples are completely made up! I chose these
> numbers so that you could understand what I want to do (I chose easy and
> readable numbers). I tested in Excel with 2 stations, and it was not too
> bad when I filled the gaps (between the data of the 2 well correlated
> stations).
>

I remember this data set from some time ago. (Weeks?)
First of all, please use dput() (see ?dput) to post your data; it makes it
much easier for everyone to just copy and paste into an R session.
The output you should post looks like this:

> dput(s1)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15",
"01/01/2008 00:30", "01/01/2008 00:45"), data = c(1L, 2L, NA,
4L)), .Names = c("time", "data"), row.names = c(NA, -4L),
class = "data.frame")
> dput(s2)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15",
"01/01/2008 00:30", "01/01/2008 00:45"), data = 8:11),
.Names = c("time", "data"), row.names = c(NA, -4L),
class = "data.frame")
> dput(s3)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15",
"01/01/2008 00:30", "01/01/2008 00:45"), data = c(123L, NA, NA,
NA)), .Names = c("time", "data"), row.names = c(NA, -4L),
class = "data.frame")
> dput(m)
structure(c(1, 0.9, 0.8, 0.9, 1, 0.7, 0.8, 0.7, 1), .Dim = c(3L, 3L),
.Dimnames = list(c("Station1", "Station2", "Station3"),
c("Station1", "Station2", "Station3")))

I've named your data.frames 's1' and 's2', and made up an 's3'; 'm' is the
correlation matrix.

Now the problem. Sarah's comment seems sensible: just filling in missing
values from some other dataset isn't very canonical, but here it goes.
The code assumes the data frames are in a list.

lst <- list(s1, s2, s3)
names(lst) <- paste("Station", seq.int(length(lst)), sep="")
lst

# station - list number or name, not the data.frame
# mat - correlation matrix
get.max.cor <- function(station, mat){
    # exclude the diagonal (each station's correlation with itself)
    mat[row(mat) == col(mat)] <- -Inf
    which( mat[station, ] == max(mat[station, ]) )
}

# x - data.frame to be transformed
# y - data.frame with greater correlation
na.fill <- function(x, y){
    i <- is.na(x$data)
    x$data[i] <- y$data[i]
    x
}

mx.cor <- get.max.cor(1, m)
mx.cor
na.fill(lst[[1]], lst[[mx.cor]])

As the comment before the first function says, 'station' can also be a
name, so the call could be

get.max.cor("Station1", m)

The two functions above solve the problem; all that's left to do is to
automate their calls.

Note that there might be a need for two passes through 'na.fill' if the
data.frame with the greater correlation also has NAs.
This is the case of Station1 filling in values for Station3: Station1 has
an NA of its own, which is only filled (from Station2) in the first pass.
Try commenting out the second pass in the function below.

process.all <- function(df.list, mat){
    f <- function(station)
        na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
    #
    n <- length(df.list)
    nms <- names(df.list)
    # First the max on each row
    # (use the 'mat' argument, not the global 'm')
    max.cor <- sapply(seq.int(n), get.max.cor, mat)
    # Note the two passes
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), f)
    # Makes nicer output
    names(df.list) <- nms
    df.list
}

process.all(lst, m)

Hope this helps,

Rui Barradas
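
P.S. With more stations, two passes may not be enough. A possible
variation, given only as a sketch, repeats the fill until a pass no longer
reduces the number of NAs. The names 'fill.until.stable' and 'max.iter'
are made up for illustration, and the sketch assumes that all stations
share the same time stamps row by row, as in the example data. Unlike
'get.max.cor', it uses which.max(), which keeps only the first station
when there is a tie.

```r
# Small example data, same values as in the post (time column omitted).
lst <- list(Station1 = data.frame(data = c(1L, 2L, NA, 4L)),
            Station2 = data.frame(data = 8:11),
            Station3 = data.frame(data = c(123L, NA, NA, NA)))
m <- matrix(c(1, 0.9, 0.8, 0.9, 1, 0.7, 0.8, 0.7, 1), nrow = 3,
            dimnames = list(names(lst), names(lst)))

# Hypothetical helper: repeat the fill until the NA count stops changing.
# max.iter is an assumed safety limit, not something from the original post.
fill.until.stable <- function(df.list, mat, max.iter = 10L){
    diag(mat) <- -Inf                      # ignore self-correlation
    # index of the best correlated station for each row (first one on ties)
    best <- unname(apply(mat, 1, which.max))
    count.na <- function(l) sum(sapply(l, function(d) sum(is.na(d$data))))
    for(iter in seq_len(max.iter)){
        before <- count.na(df.list)
        filled <- lapply(seq_along(df.list), function(k){
            x <- df.list[[k]]
            y <- df.list[[ best[k] ]]
            i <- is.na(x$data)
            x$data[i] <- y$data[i]         # positional fill, as in na.fill
            x
        })
        names(filled) <- names(df.list)
        df.list <- filled
        if(count.na(df.list) == before) break   # nothing left to gain
    }
    df.list
}

res <- fill.until.stable(lst, m)
res
```

If two stations are each other's best match and both are missing the same
rows, those NAs remain; the loop simply stops once a pass changes nothing.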