Dear R-ers,

There are no statistics involved in my question; it is all about data manipulation in R. I am trying to write code that should replace what is currently being done in SAS and SPSS. Or, at least, I am trying to show my colleagues that R is not much worse than SAS/SPSS for the task at hand.

I have written code that works, but it is too slow, probably because it loops through a lot of things, and I do not see how to improve it. I have already written a different version, but it is 5 times slower than this one. The code below takes me slightly above 5 sec for the tiny data set. I tried it on a real one, and it was not done after hours.

I need the help of the list! Maybe someone will have an idea on how to increase the efficiency of my code (just one block of it, in the "DATA TRANSFORMATION" section below)?
Below, I create a data set whose structure is similar to the data sets the code should be applied to. I have also described what is actually being done in the comments.

Thanks a lot to anyone for any suggestion!
Dimitri

###### CREATING THE TEST DATA SET ################################

set.seed(123)
# stringsAsFactors=TRUE so that levels() below also works in R >= 4.0
data<-data.frame(group=c(rep("first",10),rep("second",10)),week=c(1:10,1:10),a=abs(round(rnorm(20)*10,0)), b=abs(round(rnorm(20)*100,0)),stringsAsFactors=TRUE)
data
dim(data)[1]  # !!! In real life I might have up to 150 (!) rows (weeks) within each subgroup

### Specifying parameters used in the code below:
vars<-names(data)[3:4] # names of variables to be transformed
nr.vars<-length(vars) # number of variables to be transformed; !!! in real life I'll have to deal with up to 50-60 variables, not 2.
group.var<-names(data)[1] # name of the grouping variable
subgroups<-levels(data[[group.var]]) # names of subgroups; !!! in real life I'll have up to 20-25 subgroups, not 2.

# For EACH subgroup: indexing variables a and b to their maximum in that subgroup;
# Further, I'll have to use these indexed variables to build the new ones:
for(i in vars){
  new.name<-paste(i,".ind.to.max",sep="")
  data[[new.name]]<-NA
}

indexed.vars<-names(data)[grep("ind.to.max$", names(data))] # variables indexed to subgroup max
for(subgroup in subgroups){
  data[data[[group.var]] %in% subgroup,indexed.vars]<-lapply(data[data[[group.var]] %in% subgroup,vars],function(x){
    y<-x/max(x)
    return(y)
  })
}
data

############# DATA TRANSFORMATION #########################################

# Objective: Create new variables based on the old ones (a and b ind.to.max)
# For each new variable, the value in a given row is a function of (a) 2 constants (that have several levels each),
# (b) the corresponding value of the original variable (e.g., "a.ind.to.max"), and (c) the value in the previous row on the same new variable
# PLUS: it has to be done by subgroup (variable "group")

constant1<-c(1:3)              # constant 1 used for transformation - has 3 levels; !!! in real life it will have up to 7 levels
constant2<-seq(.15,.45,.15)    # constant 2 used for transformation - has 3 levels; !!! in real life it will have up to 7 levels

# CODE THAT IS TOO SLOW (it uses parameters specified in the previous code section):
start1<-Sys.time()
for(var in indexed.vars){     # looping through variables
  for(c1 in 1:length(constant1)){     # looping through levels of constant1
    for(c2 in 1:length(constant2)){      # looping through levels of constant2
      d=log(0.5)/constant1[c1]
      l=-log(1-constant2[c2])
      name<-paste(strsplit(var,".ind.to.max"),constant1[c1],constant2[c2]*100,"..transf",sep=".")
      data[[name]]<-NA
      for(subgroup in subgroups){     # looping through subgroups
        data[data[[group.var]] %in% subgroup, name][1] = 1-((1-0*exp(1)^d)/(exp(1)^(data[data[[group.var]] %in% subgroup, var][1]*l*10)))   # this is just the very first row of each subgroup
        for(case in 2:nrow(data[data[[group.var]] %in% subgroup, ])){     # looping through the remaining rows of the subgroup
          data[data[[group.var]] %in% subgroup, name][case]= 1-((1-data[data[[group.var]] %in% subgroup, name][case-1]*exp(1)^d)/(exp(1)^(data[data[[group.var]] %in% subgroup, var][case]*l*10)))
        }
      }
    }
  }
}
end1<-Sys.time()
print(end1-start1) # Takes me ~0.53 secs
names(data)
data

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com
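
One possible direction for the slow block, offered as a sketch under assumptions rather than a tested drop-in replacement: the inner loop re-subsets the data frame with `data[[group.var]] %in% subgroup` on every single row, and that repeated subsetting is probably where much of the time goes. The row-by-row recurrence itself can be run on a plain numeric vector and applied per subgroup with `ave()`. The helper name `transf`, the toy data frame `d`, and the column name `a.transf` are made up for this illustration and do not come from the code above.

```r
# Hypothetical helper (not from the original post): runs the same
# recurrence as the slow loop, but on a plain numeric vector x that
# holds the indexed values of ONE subgroup.
transf <- function(x, c1, c2) {
  r <- exp(log(0.5) / c1)          # exp(1)^d in the original loop
  k <- exp(log(1 - c2) * 10 * x)   # 1/exp(1)^(x*l*10), because l = -log(1 - c2)
  y <- numeric(length(x))
  prev <- 0                        # the first row uses 0 as the "previous" value
  for (t in seq_along(x)) {
    y[t] <- 1 - (1 - prev * r) * k[t]
    prev <- y[t]
  }
  y
}

# Toy data shaped like the test set (one variable only, for brevity)
set.seed(123)
d <- data.frame(group = rep(c("first", "second"), each = 10),
                week  = rep(1:10, 2),
                a     = abs(round(rnorm(20) * 10, 0)))

# Index to the subgroup maximum, then transform within each subgroup;
# ave() splits by group, applies FUN, and keeps the original row order.
d$a.ind.to.max <- ave(d$a, d$group, FUN = function(x) x / max(x))
d$a.transf     <- ave(d$a.ind.to.max, d$group,
                      FUN = function(x) transf(x, c1 = 1, c2 = 0.15))
```

The loops over variables and over the levels of the two constants would still be needed around the last call, but each pass then touches the data frame once per new column instead of once per row.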