Hi, I wrote a function (below) that centers variables I use in regression and scales them by their standard deviation within certain groupings (much like the aggregate function can apply a function to groups). This runs fast enough when I have about 50 groups and 50k records, but sometimes I end up with 1000 groups or so and it slows down considerably. The problem is probably the 'for' loops at the group level, but I am having a hard time seeing whether there is a good way to vectorize that step. Alternatively, is there a fast function already implemented for this sort of thing?
If you want to run the function on a test data frame (from package MASS), here's the syntax:

library(MASS)
zscore(data = UScereal,
       columns = c("calories","protein","sugars"),
       by = list(mfr = UScereal$mfr, vitamins = UScereal$vitamins))

It returns a data frame with new columns appended.

------------------
zscore <- function(data, columns, by) {
  means <- aggregate(x = data[,columns], by = by, FUN = mean, na.rm=T)
  sdevs <- aggregate(x = data[,columns], by = by, FUN = sd, na.rm=T)
  # Efficient (?) index for 'na' in any 'by' column. NA => FALSE
  noNA <- (rowSums(is.na(as.data.frame(by))) == 0)
  for (col in columns) {
    # Final name for the new column.
    column <- paste(col,"CMS",sep="")
    for (i in 1:nrow(means)) {
      # Allocate objects for indexing on 'by' terms.
      byTFmean <- by
      byTFsd <- by
      for (j in names(by)) {
        # Construct index for each 'by' term
        byTFmean[[j]] <- !(data[[j]] == means[[j]][[i]])
        byTFsd[[j]] <- !(data[[j]] == sdevs[[j]][[i]])
      }
      # collapse indexes for 'by' using '&'
      byTFmean <- (rowSums(as.data.frame(byTFmean)) == 0)
      byTFsd <- (rowSums(as.data.frame(byTFsd)) == 0)
      data[[column]][noNA & byTFmean & byTFsd] <- (
        data[[col]][noNA & byTFmean & byTFsd] - means[[col]][i]
      ) / sdevs[[col]][i]
    }
  }
  return(data)
}
------------------------

Any suggestions are welcome and I'm happy to post back the final code.

Best,

Krzysztof

-----------------------------------------------
Krzysztof Sakrejda-Leavitt
Organismic and Evolutionary Biology
University of Massachusetts, Amherst
319 Morrill Science Center South
611 N. Pleasant Street
Amherst, MA 01003
work #: 413-325-6555
email: sakre...@nsm.umass.edu
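P.S. One possible direction for vectorizing the group-level step, as a minimal sketch rather than a tested replacement: base R's ave() applies a function within each group and returns the result aligned with the original rows, so the loops over groups and over 'by' terms might be dropped altogether. The name zscore_ave is made up for illustration, and the sketch assumes each entry of 'by' has one value per row of the data, as in the test call on UScereal. I have not timed it at 1000 groups.

```r
library(MASS)  # only needed for the UScereal test data

# Hypothetical vectorized variant of zscore(); the name is an assumption.
zscore_ave <- function(data, columns, by) {
  # TRUE for rows where no 'by' term is NA; other rows keep NA z-scores.
  noNA <- rowSums(is.na(as.data.frame(by))) == 0
  # One combined group label per row, built from all 'by' terms.
  grp <- interaction(by, drop = TRUE)
  for (col in columns) {
    x <- data[[col]]
    z <- rep(NA_real_, length(x))
    # ave() returns the per-group statistic repeated for every row of the group.
    mu <- ave(x[noNA], grp[noNA], FUN = function(v) mean(v, na.rm = TRUE))
    s  <- ave(x[noNA], grp[noNA], FUN = function(v) sd(v, na.rm = TRUE))
    z[noNA] <- (x[noNA] - mu) / s
    # Same naming scheme as zscore(): original column name plus "CMS".
    data[[paste(col, "CMS", sep = "")]] <- z
  }
  data
}

out <- zscore_ave(data = UScereal,
                  columns = c("calories", "protein", "sugars"),
                  by = list(mfr = UScereal$mfr, vitamins = UScereal$vitamins))
```

The only remaining loop is over the columns, which is usually short. Groups with a single record get NA here because sd() of one value is NA.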