On Thu, 2013-12-19 at 20:37 -0500, Duncan Murdoch wrote: > On 13-12-19 6:37 PM, Ross Boylan wrote: > > My code seems to be spending most of its time in assignment statements, > > in some cases simple assignment of a model frame or model matrix. > > > > Can anyone provide any insights into what's going on, or how to speed > > things up? > > You are seeing a lot of time being spent on complex assignments. For > example, line 158 is > > data(sims.c1[[k]]) <- sp > > That makes a function call to `data<-` to do the assignment, and that > could be slow. Since it's an S4 method there's a bunch of machinery > involved in dispatching it; most of that would not have line number > information, so it'll be charged to that line. > > I can't really suggest how to speed it up. > > Duncan Murdoch Simply reexpressing the same computations without assigning to S4 slots or dispatching using S4 cut the execution time to under 50% of what it had been.
I attempted to recreate this with a small bit of code, but got nothing like the effect in the original. Here's the example; v1 is like my original code and v2 is like the restructured code: slow <- setClass("slow", representation(form="formula", data="data.frame", mm="matrix")) slowb <- setClass("slowb", representation(form="formula", data="data.frame", mm="matrix")) slowc <- setClass("slowc", representation(form="formula", data="data.frame", mm="matrix")) # original had only 3 classes # fake data. Using my real data didn't seem to affect relative times much. # real data had ~ 100 columns in model matrix mydata <- data.frame(a=rnorm(1500), b=rnorm(1500), c=rnorm(1500),d=as.factor(rep(c("x", "y", "z"), 500))) myformula <- ~a*d + b*d + c*d mymm <- model.matrix(myformula, mydata) if (!isGeneric("putData<-")) setGeneric("putData<-", function(obj, value) standardGeneric("putData<-")) setMethod("putData<-", c("slow", "data.frame"), function(obj, value){ myf <- model.frame(obj@form, value) obj@data <- myf obj@mm <- model.matrix(obj@form, myf) obj } ) setMethod("putData<-", c("slowb", "data.frame"), function(obj, value){ myf <- model.frame(obj@form, value) obj@data <- myf obj@mm <- model.matrix(obj@form, myf) obj } ) setMethod("putData<-", c("slowc", "data.frame"), function(obj, value){ myf <- model.frame(obj@form, value) obj@data <- myf obj@mm <- model.matrix(obj@form, myf) obj } ) v1 <- function(n) { s <- list(slow(form=myformula, mm=mymm), 1:5) for (i in 1:n) { mydata$b <- rnorm(nrow(mydata)) putData(s[[1]]) <- mydata } mm <- s[[1]]@mm } # v2 eliminates the dispatch on putData and the assignment # to the S4 slot v2 <- function(n) { s <- slow(form=myformula, mm=mymm) for (i in 1:n) { mydata$b <- rnorm(nrow(mydata)) myf <- model.frame(s@form, mydata) mm <- model.matrix(s@form, myf) } mm } > system.time(r <- v1(100)) > > > user system elapsed 0.304 0.000 0.307 > system.time(r <- v2(100)) > > > user system elapsed 0.26 0.00 0.26 Ross Boylan > > > > > For starters, is it possible that the reports are not accurate, or that > > I am misreading them. In R 3.0.1 (running under ESS): > > > Rprof(line.profiling=TRUE) > > > system.time(r <- totalEffect(dodata[[1]], dodata[[2]], 1:3, 4)) > > user system elapsed > > 21.629 0.756 22.469 > > !> Rprof(NULL) > > > summaryRprof(lines="both") > > $by.self > > self.time self.pct total.time total.pct > > box.R#158 6.74 29.56 13.06 57.28 > > simulator.multinomial.R#64 2.92 12.81 2.96 12.98 > > simulator.multinomial.R#63 2.76 12.11 2.76 12.11 > > box.R#171 2.54 11.14 5.08 22.28 > > simulator.d1.R#70 0.98 4.30 0.98 4.30 > > simulator.d1.R#71 0.98 4.30 0.98 4.30 > > densMap.R#42 0.72 3.16 0.86 3.77 > > "standardGeneric" 0.52 2.28 11.30 49.56 > > ...... > > > > Here's some of the code, with comments at the line numbers > > box.R: > > sp <- merge(sexpartner, data, by="studyidx") > > sp$y <- numFactor(sp$pEthnic) #I think y is not used but > > must be present > > data(sims.c1[[k]]) <- sp ###<<<<< line 158 > > sp0 <- sp > > sp <- sim(sims.c1[[k]], i) > > ctable[[k]] <- update.c1(ctable[[k]], sp) > > if (is.null(i.c1.in)) { > > i.c1.in <- match("pEthnic", colnames(sp0)) > > i.c1.out <- match(c("studyidx", "n", "pEthnic"), > > colnames(sp)) > > } > > sp0 <- merge(sp0[,-i.c1.in], sp[,i.c1.out], > > by=c("studyidx", "n")) > > # d1 > > sp0 <- sp0[sp0$pIsMale == 1,] > > # avoid lots of conversion warnings > > sp0$pEthnic <- factor(sp0$pEthnic, levels=partRaceLevels) > > data(sims.d1[[k]]) <- sp0 ###<<<<< line 171 > > sp <- sim(sims.d1[[k]], i) > > dtable[[k]] <- update.d1(dtable[[k]], sp) > > rngstate[[k]] <- .Random.seed > > The timing seems odd since it doesn't appear there's anything to do at > > the 2 lines except invoke data<-, but if that's slow I would expect the > > time to go to the data<- function (in a different file) and not to the > > call. > > > > In fact the other big time items are inside the data<- functions. > > simulator.multinomial.R: > > > > setMethod("data<-", c("simulator.multinomial", "data.frame"), > > function(obj, value) { > > mf <- model.frame(obj@dataFormula, data=value) > > mf$iCluster <- fromOrig(obj@idmap, as.character(mf$studyidx)) > > if (any(is.na(mf$iCluster))) > > stop("New studyidx--need to draw from meta distn") > > mm <- model.matrix(obj@modelFormula, data=mf) > > obj@data <- mf ##<<< line 63 > > obj@mm <- mm ##<<< line 64 > > return(obj) > > }) > > > > The mm and data slots have type restrictions, but no other validation > > tests. > > setClass("simulator.multinomial", > > representation(fit="stanfit", idmap="sIDMap", > > modelFormula="formula", > > categories="ANY", # could be factor or character > > # categories should be in the > > order of their numeric codes in y > > # cached results > > coef="list", > > data="data.frame", > > dataFormula="formula", > > mm="matrix")) > > Does it matter that, e.g., a model frame is more than a vanilla data frame? > > > > I thought assignment, given R's lazy copying behavior, was essentially > > resetting a pointer, and so should be fast. > > > > Or maybe the time is going to garbage collecting the previous contents > > of the slots? > > > > Ross Boylan > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.