On 13-12-19 6:37 PM, Ross Boylan wrote:
My code seems to be spending most of its time in assignment statements,
in some cases simple assignment of a model frame or model matrix.
Can anyone provide any insights into what's going on, or how to speed
things up?
You are seeing a lot of time being spent on complex assignments. For
example, line 158 is
data(sims.c1[[k]]) <- sp
That makes a function call to `data<-` to do the assignment, and that
could be slow. Since it's an S4 method there's a bunch of machinery
involved in dispatching it; most of that would not have line number
information, so it'll be charged to that line.
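To make the "complex assignment" point concrete, here is a minimal sketch of roughly how R evaluates a line of that shape. The `holder` class and its `data<-` method are hypothetical stand-ins, not the classes from box.R, and R itself uses an internal `*tmp*` variable where this sketch uses `tmp`.

```r
library(methods)

# Hypothetical stand-in class; not the poster's simulator classes.
setClass("holder", representation(data = "data.frame"))

setGeneric("data<-", function(obj, value) standardGeneric("data<-"))
setMethod("data<-", c("holder", "data.frame"), function(obj, value) {
  obj@data <- value
  obj
})

sp <- data.frame(studyidx = 1:3)
k <- 1

# The one-line form, as in the thread:
sims <- list(new("holder"), new("holder"))
data(sims[[k]]) <- sp

# Roughly what R evaluates for that one line:
sims2 <- list(new("holder"), new("holder"))
tmp <- sims2
sims2 <- `[[<-`(tmp, k, value = `data<-`(tmp[[k]], value = sp))

# Both routes leave the new data frame in element k.
nrow(sims[[k]]@data)
nrow(sims2[[k]]@data)
```

Everything in the expanded form (the extraction, the S4 dispatch, the method body, and the store back into the list) originates from the single source line, which is consistent with the profiler charging that line.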
I can't really suggest how to speed it up.
Duncan Murdoch
For starters, is it possible that the reports are not accurate, or that
I am misreading them? In R 3.0.1 (running under ESS):
> Rprof(line.profiling=TRUE)
> system.time(r <- totalEffect(dodata[[1]], dodata[[2]], 1:3, 4))
   user  system elapsed
 21.629   0.756  22.469
> Rprof(NULL)
> summaryRprof(lines="both")
$by.self
                           self.time self.pct total.time total.pct
box.R#158                       6.74    29.56      13.06     57.28
simulator.multinomial.R#64      2.92    12.81       2.96     12.98
simulator.multinomial.R#63      2.76    12.11       2.76     12.11
box.R#171                       2.54    11.14       5.08     22.28
simulator.d1.R#70               0.98     4.30       0.98      4.30
simulator.d1.R#71               0.98     4.30       0.98      4.30
densMap.R#42                    0.72     3.16       0.86      3.77
"standardGeneric"               0.52     2.28      11.30     49.56
......
Here's some of the code, with comments at the line numbers
box.R:
sp <- merge(sexpartner, data, by="studyidx")
sp$y <- numFactor(sp$pEthnic) # I think y is not used but must be present
data(sims.c1[[k]]) <- sp ###<<<<< line 158
sp0 <- sp
sp <- sim(sims.c1[[k]], i)
ctable[[k]] <- update.c1(ctable[[k]], sp)
if (is.null(i.c1.in)) {
    i.c1.in <- match("pEthnic", colnames(sp0))
    i.c1.out <- match(c("studyidx", "n", "pEthnic"), colnames(sp))
}
sp0 <- merge(sp0[,-i.c1.in], sp[,i.c1.out], by=c("studyidx", "n"))
# d1
sp0 <- sp0[sp0$pIsMale == 1,]
# avoid lots of conversion warnings
sp0$pEthnic <- factor(sp0$pEthnic, levels=partRaceLevels)
data(sims.d1[[k]]) <- sp0 ###<<<<< line 171
sp <- sim(sims.d1[[k]], i)
dtable[[k]] <- update.d1(dtable[[k]], sp)
rngstate[[k]] <- .Random.seed
The timing seems odd, since there doesn't appear to be anything to do on
those two lines except invoke data<-. But if that is slow, I would expect
the time to be charged to the data<- function (in a different file) and
not to the call.
In fact the other big time items are inside the data<- functions.
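One way to see where the time on such a line goes is to split the complex assignment into explicit steps, so that each step sits on its own source line and can be timed or profiled separately. This is only a sketch under stated assumptions: the `holder2` class and the `payload<-` generic are hypothetical stand-ins for the simulator classes and `data<-`.

```r
library(methods)

# Hypothetical stand-ins for the simulator classes and data<- in the thread.
setClass("holder2", representation(data = "data.frame"))
setGeneric("payload<-", function(obj, value) standardGeneric("payload<-"))
setMethod("payload<-", c("holder2", "data.frame"), function(obj, value) {
  obj@data <- value
  obj
})

sims <- list(new("holder2"), new("holder2"))
sp <- data.frame(studyidx = seq_len(1e5), y = rnorm(1e5))
k <- 1

# Step 1: S4 dispatch plus the method body, on its own line.
t_method <- system.time(updated <- `payload<-`(sims[[k]], value = sp))

# Step 2: storing the result back into the list, on its own line.
t_store <- system.time(sims[[k]] <- updated)

print(t_method)
print(t_store)
```

With the steps separated, line profiling can attribute time to the method call and to the list update individually instead of charging both to one line.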
simulator.multinomial.R:
setMethod("data<-", c("simulator.multinomial", "data.frame"),
          function(obj, value) {
              mf <- model.frame(obj@dataFormula, data=value)
              mf$iCluster <- fromOrig(obj@idmap, as.character(mf$studyidx))
              if (any(is.na(mf$iCluster)))
                  stop("New studyidx--need to draw from meta distn")
              mm <- model.matrix(obj@modelFormula, data=mf)
              obj@data <- mf ##<<< line 63
              obj@mm <- mm ##<<< line 64
              return(obj)
          })
The mm and data slots have type restrictions, but no other validation
tests.
setClass("simulator.multinomial",
         representation(fit="stanfit", idmap="sIDMap",
                        modelFormula="formula",
                        categories="ANY", # could be factor or character
                        # categories should be in the order of their
                        # numeric codes in y
                        # cached results
                        coef="list",
                        data="data.frame",
                        dataFormula="formula",
                        mm="matrix"))
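As a small illustration of what a slot "type restriction" involves: assignment with `@<-` checks the class of the new value against the declared slot class, while `slot<-` with `check = FALSE` skips that check. The `cache` class below is a hypothetical stand-in, and whether this check accounts for any of the time reported above is not established here.

```r
library(methods)

# Hypothetical class with one typed slot, mirroring mm="matrix" above.
setClass("cache", representation(mm = "matrix"))

obj <- new("cache")
mm <- matrix(0, nrow = 2, ncol = 2)

# Checked assignment: the class of the value is validated first.
obj@mm <- mm

# Unchecked assignment: the caller takes responsibility for the type.
slot(obj, "mm", check = FALSE) <- mm

# The checked form rejects a value of the wrong class.
rejected <- tryCatch({
  obj@mm <- "not a matrix"
  FALSE
}, error = function(e) TRUE)
rejected
```

Skipping the check trades safety for speed, so it only makes sense where the value is already known to have the right class.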
Does it matter that, e.g., a model frame is more than a vanilla data frame?
I thought assignment, given R's lazy copying behavior, was essentially
resetting a pointer, and so should be fast.
Or maybe the time is going to garbage collecting the previous contents
of the slots?
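Both guesses can be checked empirically. The sketch below uses `tracemem()` to report duplications of an object and `gc.time()` to measure time spent in the garbage collector. The `store` class is hypothetical, `tracemem()` only works if R was built with memory profiling (hence the `capabilities("profmem")` guard), and whether a copy is reported may depend on the R version.

```r
library(methods)

# Hypothetical class with a matrix slot, standing in for the mm slot.
setClass("store", representation(mm = "matrix"))

obj <- new("store")
mm <- matrix(rnorm(1e6), ncol = 100)

if (capabilities("profmem")) {
  tracemem(mm)    # a message is printed whenever mm is duplicated
  obj@mm <- mm    # watch for a tracemem message on this line
  untracemem(mm)
} else {
  obj@mm <- mm
}

# Time spent in the garbage collector while slots are repeatedly replaced.
gc_before <- gc.time()
for (i in 1:20) obj@mm <- matrix(rnorm(1e6), ncol = 100)
gc_after <- gc.time()
gc_delta <- gc_after - gc_before
gc_delta
```

If no duplication is reported and the garbage collection time is small, the cost is more likely in dispatch or validation than in copying or collection.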
Ross Boylan
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.