Re: [R] index instead of loop?

Rui Barradas Thu, 08 Mar 2012 22:18:19 -0800

Hello,

> Humm.... If I understand what you are saying, you are correct. I get
> 144.138 for 2009-03-20 for column C. Maybe I posted the wrong code?  If
> so,
> sorry.


I think I have the fastest so far solution, and it checks with your
corrected,last one.

I've made just a change: to transform it into a function I renamed the
parameters
(only for use inside the function) 'zdates', without the period, 'rddata'
and 'uadata'.

'fun1' is yours, 'fun2', mine. Here it goes.


fun1 <- function(zdates, rddata, uadata){
    fix = function(x)
    {
      year = substring(x, 1, 4)
      mo = substring(x, 5, 6)
      day = substring(x, 7, 8)
      ifelse(year=="--", "--", paste(year, mo, day, sep = "-"))

    }
    rd = apply(rddata, 2, fix)
    dimnames(rd) = dimnames(rd)

    wd1 <- seq(from =as.Date(min(zdates)), to = Sys.Date(), by = "day")
    #wd1 = wd1[weekdays(wd1) == "Friday"] # uncomment to go weekly
    wd = sapply(wd1, as.character)
    mat = matrix(NA,nrow=length(wd),ncol=ncol(uadata))
    rownames(mat) = wd
    nms = as.Date(rownames(uadata))

    for(i in 1:length(wd)){
      d = as.Date(wd[i])
      diff = abs(nms - d)
      rd_row_idx = max(which(diff == min(diff)))
      rd_col_idx = which(as.Date(rd[rd_row_idx,], format="%Y-%m-%d")  < d)
      rd_col_idx_lag = which(as.Date(rd[rd_row_idx - 1,], format="%Y-%m-%d") 
< d)
      rd_col_idx_lag2 = which(as.Date(rd[rd_row_idx - 2,],
format="%Y-%m-%d")  < d)

      if(length(rd_col_idx_lag2) && (rd_row_idx - 2) > 0){
        mat[i,rd_col_idx_lag2] = uadata[rd_row_idx - 2,rd_col_idx_lag2]
      }
      if(length(rd_col_idx_lag)){
        mat[i,rd_col_idx_lag] = uadata[rd_row_idx - 1,rd_col_idx_lag]
      }
      if( length(rd_col_idx)){
        mat[i,rd_col_idx] = uadata[rd_row_idx,rd_col_idx]
      }
    }
    colnames(mat)=colnames(uadata)
    mat
}

fun2 <- function(zdates, rddata, uadata){

        fdate <- function(x, format="%Y%m%d"){
                DF <- data.frame(x)
                for(i in colnames(DF)){
                        DF[, i] <- as.Date(DF[, i], format=format)
                        class(DF[, i]) <- "Date"
                }
                DF
        }

        rddata <- fdate(rddata)
        wd1 <- seq(from = as.Date(zdates[1]), to = Sys.Date(), by = "day")
        nwd1 <- length(wd1)

        fin1 <- matrix(NA, nrow=length(wd1), ncol=ncol(uadata))
        nr <- nrow(rddata)
        xstart <- c(integer(nr), nwd1)
        for(j in 1:ncol(uadata)){
                x <- xstart
                for(i in 1:nr)
                        x[i] <- if(!is.na(rddata[i, j]) & !is.nan(rddata[i, j]))
                                        which(wd1 == rddata[i, j])
                                        else NA
                ix <- which(!is.na(x))
                for(i in seq_len(length(ix) - 1)){
                        from <- x[ ix[i] ] + 1
                        to   <- x[ ix[i + 1] ]
                        fin1[ from:to, j ] <- uadata[ ix[i], j ]
                }
        }
        colnames(fin1) <- colnames(uadata)
        rownames(fin1) <- as.character(wd1)
        fin1
}

t1 <- system.time(m1 <- fun1(z.dates, rd1, ua))
t2 <- system.time(m2 <- fun2(z.dates, rd1, ua))

all.equal(m1, m2)
[1] TRUE

rbind(t1, t2)
   user.self sys.self elapsed user.child sys.child
t1      1.50        0    1.50         NA        NA
t2      0.02        0    0.01         NA        NA

And the better news is that I believe it scales up without degrading
performance,
like my first did.

See if it works.

Rui Barradas



--
View this message in context: 
http://r.789695.n4.nabble.com/index-instead-of-loop-tp4447672p4457290.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] index instead of loop?

Reply via email to