On 04-05-2012 11:00, jeff6868 <geoffrey_kl...@etu.u-bourgogne.fr> wrote:
Date: Thu, 3 May 2012 06:45:59 -0700 (PDT)
From: jeff6868 <geoffrey_kl...@etu.u-bourgogne.fr>
To: r-help@r-project.org
Subject: [R] add an automatized linear regression in a function
Message-ID:<1336052759474-4606047.p...@n4.nabble.com>

Dear R users,

For the moment, I have a script and a function that calculate correlation
matrices between all my data files. Then, for each file, it chooses the best
correlated file and uses it to fill the missing data (NAs) in the analysed
file: the values from the best correlated file are copied automatically into
the gaps of the first file. If the best correlated file has no data at a
given position, it takes the value from the second best correlated file,
and so on.
The problem is that, for the moment, it takes the raw data from the best
correlated file.
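The raw-value fallback described above (best correlated file first, then the next one) can be illustrated on plain vectors. This is a minimal sketch, assuming the candidate stations are already sorted by decreasing correlation; `target` and `cands` are hypothetical toy data, not the poster's files.

```r
# Hypothetical toy data: 'target' has gaps; 'cands' holds the other stations,
# already sorted from best to worst correlated with 'target'.
target <- c(1, NA, 3, NA)
cands <- list(best   = c(1.1, NA, 2.9, 4.2),
              second = c(0.9, 2.1, 3.1, 3.8))

# For each gap, copy the raw value of the first candidate that has data there.
for (i in which(is.na(target))) {
    for (s in cands) {
        if (!is.na(s[i])) {
            target[i] <- s[i]
            break
        }
    }
}
target
```

Gap 2 is filled from `second` because `best` is also missing there; gap 4 comes from `best`. The copied values are raw, not rescaled to the target station, which is the limitation raised next.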

So I need to adapt this raw data to the file that is going to be filled. To
do so, I'd like to automate the calculation of a linear regression between
the two files (after the best or second best correlated file has been
selected). Instead of taking the raw data from the best correlated file to
fill the first one, it should take the values estimated by the regression
(in order to get more precise filled data).
The idea is to fit lm() between these two files, extract the coefficients of
the fitted straight line, calculate the estimated data for the whole file
(NAs included), and finally fill the gaps with these estimated values. I hope
my problem is clear.
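The regression idea in the paragraph above (fit lm(), take the intercept and slope, estimate every time step, fill only the gaps) can be sketched on two vectors. This is an illustration under assumptions: `target` and `ref` are hypothetical toy series standing in for the file to fill and its best correlated file.

```r
# Hypothetical toy series: 'target' has gaps, 'ref' is the best correlated station.
target <- c(2.1, 4.0, NA, 8.1, 9.9, NA, 14.2)
ref    <- c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0)

# Fit target ~ ref; lm() drops the rows where target is NA (na.omit by default).
fit <- lm(target ~ ref)
cf  <- unname(coef(fit))
a   <- cf[1]   # intercept
b   <- cf[2]   # slope

# Estimated values for every time step, NA positions included.
est <- a + b * ref

# Fill only the gaps; observed values are left untouched.
filled <- target
gaps <- is.na(filled)
filled[gaps] <- est[gaps]
filled
```

Computing `a + b * ref` by hand gives the same numbers as `predict(fit, data.frame(ref = ref))`; predict() is usually the safer choice once the model has more than one predictor.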
Here's the function:

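process.all <- function(df.list, mat){
    # f: fill the NAs of each station from its best correlated station
    # (na.fill() and get.max.cor() are defined elsewhere in the script)
    f <- function(station)
        na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])

    # g: fill the NAs that remain, trying the other stations in
    # decreasing order of correlation
    g <- function(station){
        x <- df.list[[station]]
        if(any(is.na(x$data))){
            mat[row(mat) == col(mat)] <- -Inf   # the station itself sorts last
            nas <- which(is.na(x$data))
            # drop the first (best correlated, already used by f) and the
            # last (the station itself)
            ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
            for(i in nas){
                for(y in ord){
                    if(!is.na(df.list[[y]]$data[i])){
                        x$data[i] <- df.list[[y]]$data[i]
                        break
                    }
                }
            }
        }
        x
    }

    n <- length(df.list)
    nms <- names(df.list)
    # use the 'mat' argument rather than the global corhiver2008capt1
    max.cor <- sapply(seq.int(n), get.max.cor, mat)
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), g)
    names(df.list) <- nms
    df.list
}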
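The function relies on get.max.cor(), which is not shown in the post. The sketch below is a hypothetical version consistent with how it is called (`sapply(seq.int(n), get.max.cor, mat)`), together with one possible way to build `mat` from `df.list` using cor() with `use = "pairwise.complete.obs"`. The toy stations A, B and C are made up for illustration.

```r
# Hypothetical helper (not shown in the original post): return the index of
# the station most correlated with 'station', ignoring its self-correlation.
get.max.cor <- function(station, mat){
    r <- mat[station, ]
    r[station] <- -Inf          # the diagonal is always 1, so exclude it
    unname(which.max(r))
}

# Hypothetical toy data: A and B follow the same signal, C is noise.
set.seed(1)
base <- cumsum(rnorm(50))
df.list <- list(A = data.frame(data = base + rnorm(50, sd = 0.1)),
                B = data.frame(data = 2 * base + 1),
                C = data.frame(data = rnorm(50)))
df.list$A$data[c(5, 10)] <- NA

# Correlation matrix between stations, using the rows where both are observed.
mat <- cor(sapply(df.list, function(d) d$data), use = "pairwise.complete.obs")
max.cor <- sapply(seq_along(df.list), get.max.cor, mat)
max.cor
```

Station A is matched with B and B with A; C is matched with whichever station happens to be more correlated with the noise.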

I succeeded in doing this for a small data.frame I created, but I don't know
how to do it in this particular case.
Thanks a lot for your help!

Statistically speaking, I'm not convinced by what you want to do, but a solution could be:

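na.fill <- function(x, y){
    i <- is.na(x$data)                  # positions of the gaps in x
    xx <- y$data                        # predictor: the best correlated station
    new <- data.frame(xx = xx)
    # regress x on y (rows with NAs are left out of the fit), predict x at
    # every time step, and copy the predictions into the gaps only
    x$data[i] <- predict(lm(x$data ~ xx, na.action = na.exclude), new)[i]
    x
}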
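A quick check of na.fill() on made-up data, assuming (as in the thread) that each station is a data.frame with a numeric column `data`. The function body is repeated so the snippet runs on its own; `x` and `y` are hypothetical toy stations.

```r
na.fill <- function(x, y){
    i <- is.na(x$data)
    xx <- y$data
    new <- data.frame(xx = xx)
    x$data[i] <- predict(lm(x$data ~ xx, na.action = na.exclude), new)[i]
    x
}

# Hypothetical toy stations: x has two gaps, y is complete.
x <- data.frame(data = c(2.1, 4.0, NA, 8.1, 9.9, NA, 14.2))
y <- data.frame(data = c(1, 2, 3, 4, 5, 6, 7))

res <- na.fill(x, y)
res$data
```

Only positions 3 and 6 change, and they receive the fitted values of the straight line through the observed (y, x) pairs. If y is also NA at a gap, the prediction there is NA and the gap stays open for g() to handle.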

and, in process.all, change the function g() to:

    g <- function(station){
        x <- df.list[[station]]
        if(any(is.na(x$data))){
            mat[row(mat) == col(mat)] <- -Inf
            nas <- which(is.na(x$data))
            ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
            # take the first station, in decreasing order of correlation,
            # that has data at every remaining gap of x
            for(y in ord){
                if(all(!is.na(df.list[[y]]$data[nas]))){
                    xx <- df.list[[y]]$data
                    new <- data.frame(xx = xx)
                    # fill the gaps with the values predicted by lm(x ~ y)
                    x$data[nas] <- predict(lm(x$data ~ xx, na.action = na.exclude), new)[nas]
                    break
                }
            }
        }
        x
    }
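The line that builds `ord` in g() is easy to misread, so here it is on a hypothetical 4-station correlation matrix. Setting the diagonal to -Inf sends the station itself to the end of the decreasing order, and `[-c(1, ncol(mat))]` then drops both the best station (already used by f) and the station itself.

```r
# Hypothetical correlation matrix between 4 stations.
mat <- matrix(c(1.0, 0.9, 0.7, 0.4,
                0.9, 1.0, 0.6, 0.5,
                0.7, 0.6, 1.0, 0.3,
                0.4, 0.5, 0.3, 1.0), nrow = 4, byrow = TRUE)
mat[row(mat) == col(mat)] <- -Inf

station <- 1
# Decreasing order of row 1 is 2, 3, 4, 1; drop the first and the last.
ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
ord
```

For station 1 this leaves stations 3 and 4, i.e. the second best onwards. If the diagonal were left at 1 instead, the station itself would sort first and be dropped by the `1`, while the genuinely least correlated station would be dropped by `ncol(mat)`.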


Hope this helps,

Rui Barradas

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
