On 13-12-19 6:37 PM, Ross Boylan wrote:
My code seems to be spending most of its time in assignment statements,
in some cases simple assignment of a model frame or model matrix.

Can anyone provide any insights into what's going on, or how to speed
things up?

You are seeing a lot of time being spent on complex assignments. For example, line 158 is

data(sims.c1[[k]]) <- sp

That makes a function call to `data<-` to do the assignment, and that could be slow. Since it's an S4 method there's a bunch of machinery involved in dispatching it; most of that would not have line number information, so it'll be charged to that line.
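
As a rough sketch of what that involves (using a hypothetical class `Sim` and a hypothetical generic `payload<-` as stand-ins, not the code from the original post), a nested replacement call is evaluated approximately as a call to the replacement function on the extracted element, followed by a `[[<-` call to put the result back:

```r
library(methods)

# Hypothetical stand-ins for the simulator class and its replacement method.
setClass("Sim", representation(data = "data.frame"))
setGeneric("payload<-", function(obj, value) standardGeneric("payload<-"))
setMethod("payload<-", c("Sim", "data.frame"), function(obj, value) {
    obj@data <- value
    obj
})

sp <- data.frame(studyidx = 1:3)
k <- 1

# The one-line form, as written in user code.
sims <- list(new("Sim"), new("Sim"))
payload(sims[[k]]) <- sp

# Approximately what R evaluates for that line. R uses a hidden
# variable named `*tmp*`; a plain `tmp` is used here for readability.
sims2 <- list(new("Sim"), new("Sim"))
tmp <- sims2
sims2 <- `[[<-`(tmp, k, value = `payload<-`(tmp[[k]], value = sp))
rm(tmp)

# Compare the two results.
identical(sims, sims2)
```

Both the S4 dispatch on `payload<-` and the outer `[[<-` happen as part of evaluating the single source line, which is consistent with the profiler charging time to that line.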

I can't really suggest how to speed it up.

Duncan Murdoch


For starters, is it possible that the reports are not accurate, or that
I am misreading them?  In R 3.0.1 (running under ESS):
  > Rprof(line.profiling=TRUE)
  > system.time(r <- totalEffect(dodata[[1]], dodata[[2]], 1:3, 4))
     user  system elapsed
   21.629   0.756  22.469
  > Rprof(NULL)
  > summaryRprof(lines="both")
  $by.self
                             self.time self.pct total.time total.pct
  box.R#158                       6.74    29.56      13.06     57.28
  simulator.multinomial.R#64      2.92    12.81       2.96     12.98
  simulator.multinomial.R#63      2.76    12.11       2.76     12.11
  box.R#171                       2.54    11.14       5.08     22.28
  simulator.d1.R#70               0.98     4.30       0.98      4.30
  simulator.d1.R#71               0.98     4.30       0.98      4.30
  densMap.R#42                    0.72     3.16       0.86      3.77
  "standardGeneric"               0.52     2.28      11.30     49.56
......

Here's some of the code, with comments at the line numbers
box.R:
                 sp <- merge(sexpartner, data, by="studyidx")
                 sp$y <- numFactor(sp$pEthnic)  # I think y is not used but must be present
                 data(sims.c1[[k]]) <- sp    ###<<<<< line 158
                 sp0 <- sp
                 sp <- sim(sims.c1[[k]], i)
                 ctable[[k]] <- update.c1(ctable[[k]], sp)
                 if (is.null(i.c1.in)) {
                     i.c1.in <- match("pEthnic", colnames(sp0))
                     i.c1.out <- match(c("studyidx", "n", "pEthnic"), colnames(sp))
                 }
                 sp0 <- merge(sp0[,-i.c1.in], sp[,i.c1.out], by=c("studyidx", "n"))
                 # d1
                 sp0 <- sp0[sp0$pIsMale == 1,]
                 # avoid lots of conversion warnings
                 sp0$pEthnic <- factor(sp0$pEthnic, levels=partRaceLevels)
                 data(sims.d1[[k]]) <- sp0    ###<<<<< line 171
                 sp <- sim(sims.d1[[k]], i)
                 dtable[[k]] <- update.d1(dtable[[k]], sp)
                 rngstate[[k]] <- .Random.seed
The timing seems odd since it doesn't appear there's anything to do at
the 2 lines except invoke data<-, but if that's slow I would expect the
time to go to the data<- function (in a different file) and not to the
call.

In fact the other big time items are inside the data<- functions.
simulator.multinomial.R:

    setMethod("data<-", c("simulator.multinomial", "data.frame"),
           function(obj, value) {
     mf <- model.frame(obj@dataFormula, data=value)
     mf$iCluster <- fromOrig(obj@idmap, as.character(mf$studyidx))
     if (any(is.na(mf$iCluster)))
         stop("New studyidx--need to draw from meta distn")
     mm <- model.matrix(obj@modelFormula, data=mf)
     obj@data <- mf  ##<<< line 63
     obj@mm <- mm    ##<<< line 64
     return(obj)
})

The mm and data slots have type restrictions, but no other validation
tests.
setClass("simulator.multinomial",
          representation(fit="stanfit", idmap="sIDMap",
                         modelFormula="formula",
                         categories="ANY",  # could be factor or character
                                         # categories should be in the order of their numeric codes in y
                         # cached results
                         coef="list",
                         data="data.frame",
                         dataFormula="formula",
                         mm="matrix"))
Does it matter that, e.g., a model frame is more than a vanilla data frame?

I thought assignment, given R's lazy copying behavior, was essentially
resetting a pointer, and so should be fast.
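
One way to check that assumption on a small case is `tracemem()`, which reports when an object is duplicated. This is only a sketch on a toy data frame (the names `mf` and `alias` are illustrative), and it assumes an R build with memory profiling enabled, which `capabilities("profmem")` reports:

```r
mf <- data.frame(studyidx = 1:5, y = as.numeric(1:5))

# tracemem() is only available when R was built with memory profiling.
traced <- capabilities("profmem")
if (traced) tracemem(mf)

alias <- mf          # plain assignment: the two names share one object
alias$y[1] <- 0      # modifying one of them forces a duplication

if (traced) untracemem(mf)

# The original is untouched; only the alias changed.
mf$y[1]
alias$y[1]
```

When tracing is available, any "tracemem" messages printed show where duplications occurred. Whether a slot assignment inside a method triggers a similar duplication of the whole object may depend on the R version, so it is worth tracing the real objects rather than relying on this toy case.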

Or maybe the time is going to garbage collecting the previous contents
of the slots?
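
A quick way to test the garbage-collection guess is `gc.time()`, which reports the cumulative time R has spent in the collector. The sketch below wraps a stand-in workload (the `replicate()` call is a placeholder, not the code from this post); in practice the `totalEffect()` call would go in its place:

```r
gc_before <- gc.time()

# Placeholder workload that allocates and discards many data frames.
x <- replicate(50, data.frame(a = runif(1e4)), simplify = FALSE)
rm(x)

# Difference in collector time across the workload.
gc_used <- gc.time() - gc_before
gc_used[1:3]   # user, system and elapsed seconds spent in garbage collection
```

If these numbers are small relative to the `system.time()` result, garbage collection is unlikely to be the main cost.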

Ross Boylan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

