My sincere apologies if the example looked large. Let me try again with less code; it is hard to cut it down further. In fact, there is nothing in this code but one formula and many nested loops, and the loops are the problem I am not sure how to solve. I have also tried to make the comments as clear as possible.

Dimitri

## START OF THE CODE TO PRODUCE SMALL DATA EXAMPLE
set.seed(123)
# stringsAsFactors=TRUE keeps "group" a factor, so that levels() below also works in R >= 4.0
data<-data.frame(group=c(rep("first",10),rep("second",10)),
                 a=abs(round(rnorm(20,mean=0, sd=.55),2)),
                 b=abs(round(rnorm(20,mean=0, sd=.55),2)),
                 stringsAsFactors=TRUE)
data  # "data" is the data frame to work with
## END OF THE CODE TO PRODUCE SMALL DATA EXAMPLE.
# In real life "data" would contain up to 150-200 rows PER SUBGROUP

### Specifying useful parameters used in the slow code below:
vars<-names(data)[2:3]                # names of variables used in transformation; in real life - up to 50-60 variables
group.var<-names(data)[1]             # name of the grouping variable
subgroups<-levels(data[[group.var]])  # names of subgroups; in real life - up to 30 subgroups

# OBJECTIVE:
# Need to create new variables based on the old ones (a & b)
# For each new variable, the value in a given row is a function of
# (a) 2 constants (that have several levels each),
# (b) the value of the original variable (e.g., "a") in that row, and
# (c) the value in the previous row of the same new variable
# Plus - it has to be done by subgroup (variable "group")

# Defining 2 constants:
constant1<-c(1:3)            # constant 1 used in transformation - has 3 levels, in real life - up to 7 levels
constant2<-seq(.15,.45,.15)  # constant 2 used in transformation - has 3 levels, in real life - up to 7 levels

### CODE THAT IS SLOW.
### Reason - too many loops, with the innermost loop being very slow because it loops through rows:
for(var in vars){ # looping through variables
  for(c1 in 1:length(constant1)){ # looping through values of constant1
    for(c2 in 1:length(constant2)){ # looping through values of constant2
      d=log(0.5)/constant1[c1]
      l=-log(1-constant2[c2])
      name<-paste(var,constant1[c1],constant2[c2]*100,".transf",sep=".")
      data[[name]]<-NA
      for(subgroup in subgroups){ # looping through subgroups
        # first row of the subgroup: the "previous" value is taken to be 0
        data[data[[group.var]] %in% subgroup, name][1] = 1-((1-0*exp(1)^d)/(exp(1)^(data[data[[group.var]] %in% subgroup, var][1]*l*10)))
        ### THIS SECTION IS THE SLOWEST - BECAUSE I AM LOOPING THROUGH ROWS:
        for(case in 2:nrow(data[data[[group.var]] %in% subgroup, ])){ # looping through rows
          data[data[[group.var]] %in% subgroup, name][case]= 1-((1-data[data[[group.var]] %in% subgroup, name][case-1]*exp(1)^d)/(exp(1)^(data[data[[group.var]] %in% subgroup, var][case]*l*10)))
        }
        ### END OF THE SLOWEST SECTION (INNERMOST LOOP)
      }
    }
  }
}
### END OF THE CODE

On Fri, Mar 26, 2010 at 5:25 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
> Dmitri:
>
> If you follow the R posting guide you're more likely to get useful replies.
> In particular it asks for **small** reproducible examples -- your example is
> far more code than I care to spend time on anyway (others may be more
> willing or more able to do so of course). I suggest you try (if you haven't
> already):
>
> 1. Profiling the code using Rprof to isolate where the time is spent. And
> then...
>
> 2. Writing a **small** reproducible example to exercise that portion of the
> code and post it with your question to the list. If you need to...
> Typically, if you do these things you'll figure out how to fix the
> situation on your own.
>
> Cheers,
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Dimitri Liakhovitski
> Sent: Friday, March 26, 2010 2:06 PM
> To: r-help
> Subject: [R] Competing with SPSS and SAS: improving code that loops
> through rows (data manipulation)
>
> Dear R-ers,
>
> In my question there are no statistics involved - it's all about data
> manipulation in R.
> I am trying to write code that should replace what's currently being
> done in SAS and SPSS. Or, at least, I am trying to show my
> colleagues that R is not much worse than SAS/SPSS for the task at hand.
> I've written code that works but it's too slow. Probably because
> it's looping through a lot of things. But I am not seeing how to
> improve it. I've already written a different version but it's 5 times
> slower than this one. The code below takes me slightly above 5 sec for
> the tiny data set. I've tried using it with a real one - it was not done
> after hours.
> Need help of the list! Maybe someone will have an idea on how to
> increase the efficiency of my code (just one block of it - in the
> "DATA TRANSFORMATION" section below)?
>
> Below - I am creating the data set whose structure is similar to the
> data sets the code should be applied to. Also - I have described what's
> actually being done - in comments.
> Thanks a lot to anyone for any suggestion!
>
> Dimitri
>
> ###### CREATING THE TEST DATA SET ################################
>
> set.seed(123)
> data<-data.frame(group=c(rep("first",10),rep("second",10)),week=c(1:10,1:10),
>                  a=abs(round(rnorm(20)*10,0)),
>                  b=abs(round(rnorm(20)*100,0)))
> data
> dim(data)[1] # !!! In real life I might have up to 150 (!) rows (weeks) within each subgroup
>
> ### Specifying parameters used in the code below:
> vars<-names(data)[3:4]  # names of variables to be transformed
> nr.vars<-length(vars)   # number of variables to be transformed; !!! in real life I'll have to deal with up to 50-60 variables, not 2.
> group.var<-names(data)[1]  # name of the grouping variable
> subgroups<-levels(data[[group.var]])  # names of subgroups; !!! in real life I'll have up to 20-25 subgroups, not 2.
>
> # For EACH subgroup: indexing variables a and b to their maximum in that subgroup;
> # Further, I'll have to use these indexed variables to build the new ones:
> for(i in vars){
>   new.name<-paste(i,".ind.to.max",sep="")
>   data[[new.name]]<-NA
> }
>
> indexed.vars<-names(data)[grep("ind.to.max$", names(data))]  # variables indexed to subgroup max
> for(subgroup in subgroups){
>   data[data[[group.var]] %in% subgroup,indexed.vars]<-lapply(data[data[[group.var]] %in% subgroup,vars],function(x){
>     y<-x/max(x)
>     return(y)
>   })
> }
> data
>
> ############# DATA TRANSFORMATION #########################################
>
> # Objective: Create new variables based on the old ones (a and b ind.to.max)
> # For each new variable, the value in a given row is a function of (a) 2 constants (that have several levels each),
> # (b) the corresponding value of the original variable (e.g., "a.ind.to.max"), and the value in the previous row on the same new variable
> # PLUS: - it has to be done by subgroup (variable "group")
>
> constant1<-c(1:3)            # constant 1 used for transformation - has 3 levels; !!! in real life it will have up to 7 levels
> constant2<-seq(.15,.45,.15)  # constant 2 used for transformation - has 3 levels; !!! in real life it will have up to 7 levels
>
> # CODE THAT IS TOO SLOW (it uses parameters specified in the previous code section):
> start1<-Sys.time()
> for(var in indexed.vars){ # looping through variables
>   for(c1 in 1:length(constant1)){ # looping through levels of constant1
>     for(c2 in 1:length(constant2)){ # looping through levels of constant2
>       d=log(0.5)/constant1[c1]
>       l=-log(1-constant2[c2])
>       name<-paste(strsplit(var,".ind.to.max"),constant1[c1],constant2[c2]*100,"..transf",sep=".")
>       data[[name]]<-NA
>       for(subgroup in subgroups){ # looping through subgroups
>         data[data[[group.var]] %in% subgroup, name][1] = 1-((1-0*exp(1)^d)/(exp(1)^(data[data[[group.var]] %in% subgroup, var][1]*l*10))) # this is just the very first row of each subgroup
>         for(case in 2:nrow(data[data[[group.var]] %in% subgroup, ])){ # looping through the remaining rows of the subgroup
>           data[data[[group.var]] %in% subgroup, name][case]= 1-((1-data[data[[group.var]] %in% subgroup, name][case-1]*exp(1)^d)/(exp(1)^(data[data[[group.var]] %in% subgroup, var][case]*l*10)))
>         }
>       }
>     }
>   }
> }
> end1<-Sys.time()
> print(end1-start1) # Takes me ~0.53 secs
> names(data)
> data
>
> --
> Dimitri Liakhovitski
> Ninah.com
> dimitri.liakhovit...@ninah.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com
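
For reference, the formula in the innermost loop is a first-order recurrence: within each subgroup, new[i] = 1 - (1 - new[i-1]*exp(d)) / exp(10*l*x[i]), with the "previous" value taken as 0 for the first row. One possible restructuring, offered only as a sketch and not as a tested replacement for the code above, is to run that recurrence on a plain numeric vector and let ave() handle the split by subgroup, so the data frame is no longer subset several times per row. The helper name apply_recurrence is hypothetical and does not come from the original post; the data set and constants mirror the small example above.

```r
# Sketch under stated assumptions: same small example data as in the post.
set.seed(123)
data <- data.frame(group = c(rep("first", 10), rep("second", 10)),
                   a = abs(round(rnorm(20, mean = 0, sd = .55), 2)),
                   b = abs(round(rnorm(20, mean = 0, sd = .55), 2)))
vars      <- names(data)[2:3]
group.var <- names(data)[1]
constant1 <- 1:3
constant2 <- seq(.15, .45, .15)

# Hypothetical helper (not in the original post): applies the recurrence
# to one numeric vector x, i.e. to one variable within one subgroup.
apply_recurrence <- function(x, d, l) {
  decay <- exp(d)           # same as exp(1)^d
  denom <- exp(x * l * 10)  # vectorised: computed once for all rows
  out   <- numeric(length(x))
  prev  <- 0                # the first row uses 0 as the previous value
  for (i in seq_along(x)) {
    prev   <- 1 - (1 - prev * decay) / denom[i]
    out[i] <- prev
  }
  out
}

for (var in vars) {
  for (c1 in constant1) {
    for (c2 in constant2) {
      d <- log(0.5) / c1
      l <- -log(1 - c2)
      name <- paste(var, c1, c2 * 100, ".transf", sep = ".")
      # ave() splits data[[var]] by subgroup, applies FUN to each piece,
      # and returns the results in the original row order
      data[[name]] <- ave(data[[var]], data[[group.var]],
                          FUN = function(x) apply_recurrence(x, d, l))
    }
  }
}
```

The row loop is still there, because each value depends on the previous one, but it now runs over a short numeric vector instead of repeatedly indexing the data frame. Whether this is fast enough for the real data (up to 50-60 variables, 7 x 7 constant levels, 30 subgroups) would need to be checked with system.time() or Rprof.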