[Rd] improvements to plm fitting

Kyle Matoba Thu, 16 Sep 2010 01:15:56 -0700

In the course of some work I have been doing for Revolution Analytics, I
needed to modify the plm() function so that it would not die halfway through
fitting.  With three small modifications I was able to more than halve the
runtime (for my particular run) and reduce its memory usage:


1.) Replacing apply(X, 2, mean) with colMeans() throughout, and likewise
apply(X, 2, sum) with colSums().
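A quick check that these replacements are drop-in on a numeric matrix:
colMeans() and colSums() return the same values as the apply() calls, but
loop over the columns in compiled code rather than calling an R function once
per column.  The matrix X below is made-up data for illustration only.

```r
# Made-up numeric matrix for illustration
set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)

m_apply <- apply(X, 2, mean)   # one call to mean() per column
m_col   <- colMeans(X)         # single vectorised pass
s_apply <- apply(X, 2, sum)
s_col   <- colSums(X)

# Same results up to floating-point rounding
all.equal(m_apply, m_col)
all.equal(s_apply, s_col)
```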

2.) In pdata.frame(), replacing

        n <- length(Ti)
        time <- c()
        for (i in 1:n){
            time <- c(time, 1:Ti[i])
        }

with

        time <- sequence(Ti)
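For reference, both versions build the same vector; the loop grows 'time'
with c(), copying it on every iteration, while sequence() concatenates
1:Ti[1], 1:Ti[2], ... in a single call.  The Ti values below are made-up
counts of time periods per individual.

```r
# Made-up number of periods observed for each of four individuals
Ti <- c(3L, 1L, 4L, 2L)

# Loop as in the original pdata.frame(): grows 'time' one individual at a time
time_loop <- c()
for (i in seq_along(Ti)) {
    time_loop <- c(time_loop, 1:Ti[i])
}

# Replacement: one call, no repeated copying
time_seq <- sequence(Ti)

time_seq
# [1] 1 2 3 1 1 2 3 4 1 2
identical(time_loop, time_seq)
```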

3.) To remove the bottleneck I was hitting in Tapply() (the fitting would
die halfway through one massive tapply() call), I modified the function to
process the effects in chunks.  Because each chunk is still handled by
tapply(), we give up little efficiency and gain the ability to fit much
larger models.  Here is the quick-and-dirty code.  As written it processes
everything in one go; the chunking is controlled by 'block_size', from which
'num_blocks' is derived.  A nice way to handle this would be to expose the
block size as a parameter that, by default, processes the entire data set at
once.

Tapply.default <- function (x, effect, func, ...)
{
    na.x <- is.na(x)
    effect_unique <- unique(effect)
    n_effects <- length(effect_unique)
    # one slot per distinct effect, labeled with that effect's value
    uniqval <- rep(NA, n_effects)
    names(uniqval) <- as.character(effect_unique)

    # block_size <- n_effects processes all effects in one go; a smaller
    # value limits how much data each tapply() call has to handle
    block_size <- n_effects
    num_blocks <- ceiling(n_effects / block_size)
    for (i in seq_len(num_blocks)) {
        these_ind     <- ((i - 1) * block_size + 1):min(n_effects, i * block_size)
        these_effects <- effect_unique[these_ind]
        in_block      <- effect %in% these_effects
        this_x        <- x[in_block]
        this_effect   <- factor(effect[in_block])
        block_res     <- tapply(this_x, this_effect, func, ...)
        # store by name: tapply() orders its result by factor level, which
        # is not necessarily the order of effect_unique
        uniqval[names(block_res)] <- block_res
    }

    # expand back to one value per observation; NAs in x stay NA
    result <- uniqval[as.character(effect)]
    result[na.x] <- NA
    result
}
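To illustrate the parameter idea above, here is a minimal sketch with the
block size exposed as an argument that defaults to a single block.  The name
tapply_blocked() is hypothetical (it is not part of plm), and the data are
made up; the check compares one-block and chunked results against base R's
ave() on effects that appear in scrambled order.

```r
# Hypothetical standalone variant of Tapply.default (not part of plm):
# block_size is an argument; the default processes all effects at once
tapply_blocked <- function(x, effect, func, ...,
                           block_size = length(unique(effect))) {
    effect_unique <- unique(effect)
    n_effects <- length(effect_unique)
    uniqval <- rep(NA, n_effects)
    names(uniqval) <- as.character(effect_unique)
    # first index of each block of effects
    for (s in seq(1, n_effects, by = block_size)) {
        these_effects <- effect_unique[s:min(n_effects, s + block_size - 1)]
        in_block <- effect %in% these_effects
        res <- tapply(x[in_block], factor(effect[in_block]), func, ...)
        uniqval[names(res)] <- res   # match by name, not by position
    }
    # drop the effect labels so the result is a plain vector like ave()'s
    result <- unname(uniqval[as.character(effect)])
    result[is.na(x)] <- NA
    result
}

# Made-up panel: 1000 observations on up to 37 individuals, unsorted
set.seed(42)
id <- sample(1:37, 1000, replace = TRUE)
x  <- rnorm(1000)

one_go  <- tapply_blocked(x, id, mean)
chunked <- tapply_blocked(x, id, mean, block_size = 5)
ref     <- ave(x, id, FUN = mean)

all.equal(one_go, ref)
all.equal(chunked, ref)
```

Each block rescans the full effect vector with %in%, so smaller blocks trade
some extra time for a smaller tapply() call per block, while the default
keeps the one-pass behavior.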


Again, Revolution Analytics is to thank for these improvements, should they
make it into the package.  I am happy to work with the authors to see that
this is incorporated.

Thanks, as always, to Yves and everyone else volunteering their time and
expertise.

Kyle

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
