I meant - even if 0 = 0.004
D.

On Tue, Mar 30, 2010 at 12:47 PM, Dimitri Liakhovitski <ld7...@gmail.com> wrote:
> Dear Charles, thank you so much!
> On my example data frame your code takes 0 sec and mine - 0.05 sec - a
> huge difference even if 0 = 0.04 sec.
> Dimitri
>
> On Tue, Mar 30, 2010 at 12:30 PM, Dimitri Liakhovitski <ld7...@gmail.com>
> wrote:
>> Thanks a lot, Charles - I'll try your approach.
>> Yes - don't worry about dividing by negative means - in real data all
>> values are positive.
>> Dimitri
>>
>> On Tue, Mar 30, 2010 at 12:24 PM, Charles C. Berry <cbe...@tajo.ucsd.edu>
>> wrote:
>>> On Tue, 30 Mar 2010, Dimitri Liakhovitski wrote:
>>>
>>>> Dear R-ers,
>>>>
>>>> I have a large data frame (several thousand rows and about 2.5
>>>> thousand columns). One variable ("group") is a grouping variable with
>>>> over 30 levels. And I have a lot of NAs.
>>>> For each variable, I need to divide each value by the variable mean - by
>>>> subgroup. I have the code but it's way too slow - it takes me about 1.5
>>>> hours.
>>>> Below are example data and my code that is too slow. Is there a
>>>> different, faster way of doing the same thing?
>>>> Thanks a lot for your advice!
>>>>
>>>> Dimitri
>>>>
>>>>
>>>> # Building an example frame - with groups and a lot of NAs:
>>>> set.seed(1234)
>>>>
>>>> frame<-data.frame(group=rep(paste("group",1:10),10),a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))
>>>
>>>
>>> Use model.matrix and crossprod to do this in a vectorized fashion:
>>>
>>>> mat <- as.matrix(frame[,-1])
>>>> mm <- model.matrix(~0+group,frame)
>>>> col.grp.N <- crossprod( !is.na(mat), mm )
>>>> mat[is.na(mat)] <- 0.0
>>>> col.grp.sum <- crossprod( mat, mm )
>>>> mat <- mat / ( t(col.grp.sum/col.grp.N)[ frame$group,] )
>>>> is.na(mat) <- is.na(frame[,-1])
>>>>
>>>
>>> mat is now a matrix whose columns each correspond to the columns in 'frame'
>>> as you have it after do.call(...)
>>>
>>>
>>> Are you sure you want to divide the values by their (possibly negative)
>>> means??
>>>
>>> HTH,
>>>
>>> Chuck
>>>
>>>
>>>
>>>> frame<-frame[order(frame$group),]
>>>> names.used<-names(frame)[2:length(frame)]
>>>> set.seed(1234)
>>>> for(i in names.used){
>>>>   i.for.NA<-sample(1:100,60)
>>>>   frame[[i]][i.for.NA]<-NA
>>>> }
>>>> frame
>>>>
>>>> ### Code that does what's needed but is too slow:
>>>> Start<-Sys.time()
>>>> frame <- do.call(cbind, lapply(names.used, function(x){
>>>>   unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=T)))
>>>> }))
>>>> Finish<-Sys.time()
>>>> print(Finish-Start) # Takes too long
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>>> Ninah.com
>>>> dimitri.liakhovit...@ninah.com
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> Charles C. Berry                            (858) 534-2098
>>>                                             Dept of Family/Preventive Medicine
>>> E mailto:cbe...@tajo.ucsd.edu               UC San Diego
>>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
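The vectorized reply above can be checked end to end with the minimal sketch below. It rebuilds a smaller version of the example data, using three value columns instead of seven. It assumes R >= 4.0, where `data.frame()` no longer turns strings into factors by default. With a character `group` column, indexing the means matrix by `frame$group` would look rows up by name, and those names are the model-matrix column names such as `groupgroup 1`, so the lookup would typically fail with a subscript error. To avoid this, the sketch converts `group` with `factor()` and indexes by the factor's integer codes. The names `grp`, `mat0`, `grp.means` and `normalized` are illustrative and do not come from the thread.

```r
# Sketch of the model.matrix()/crossprod() approach from the reply above.
# Assumption: R >= 4.0, so 'group' is character and is converted to a factor.
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(100), b = rnorm(100), c = rnorm(100))
# Knock out 60 of the 100 values in each numeric column, as in the example
for (v in c("a", "b", "c")) frame[[v]][sample(1:100, 60)] <- NA

grp <- factor(frame$group)
mat <- as.matrix(frame[, -1])                  # numeric columns only
mm  <- model.matrix(~ 0 + grp)                 # n x G indicator matrix
col.grp.N <- crossprod(!is.na(mat), mm)        # non-NA counts: columns x groups
mat0 <- mat
mat0[is.na(mat0)] <- 0                         # NAs add nothing to the sums
col.grp.sum <- crossprod(mat0, mm)             # sums: columns x groups
grp.means   <- t(col.grp.sum / col.grp.N)      # group means: groups x columns
# Expand to one row per observation via the factor's integer codes
normalized <- mat / grp.means[as.integer(grp), , drop = FALSE]
```

In this variant the untouched `mat` is divided rather than the zero-filled copy. The original NAs therefore stay in place, and the final `is.na(mat) <- ...` step from the reply is not needed.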
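Another base-R route, which was not suggested in the thread, uses `rowsum()`. With `na.rm = TRUE` it returns per-group column sums directly, so it does not need to build the n x G indicator matrix. Applying it to an indicator of non-missing values gives the per-group counts. This is a hedged sketch on the same small example. It has not been timed here, so whether it beats the crossprod version on the real 2.5-thousand-column data is untested. The names `vals`, `sums`, `cnts` and `normalized2` are illustrative.

```r
# Alternative sketch (not from the thread): per-group means via rowsum().
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(100), b = rnorm(100), c = rnorm(100))
for (v in c("a", "b", "c")) frame[[v]][sample(1:100, 60)] <- NA

vals <- as.matrix(frame[, -1])
sums <- rowsum(vals, frame$group, na.rm = TRUE)    # sums: groups x columns
cnts <- rowsum((!is.na(vals)) * 1, frame$group)    # non-NA counts per group
grp.means <- sums / cnts                           # NaN where a group is all NA
# Look up each row's group mean by name, so the group order does not matter
normalized2 <- vals / grp.means[as.character(frame$group), , drop = FALSE]
```

The means are looked up by group name rather than by position, so this version works whether `group` is a character vector or a factor.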