Hello,

(This is follow up from this thread: http://www.nabble.com/execution-time-of-.packages-td22304833.html but with a different focus)

I am often confused by the result of the profiler, when a loop is involved. Consider these two scripts:

script1:

Rprof( )
x <- numeric( )
  for( i in 1:10000){
    x <- c( x, rnorm(10) )
  }
Rprof( NULL )
print( summaryRprof( ) )


script2:

Rprof( )
ffor <- function(){
  x <- numeric( )
  for( i in 1:10000){
    x <- c( x, rnorm(10) )
  }
}
ffor()
Rprof( NULL )
print( summaryRprof( ) )


[]$ time Rscript --vanilla script1.R
$by.self
      self.time self.pct total.time total.pct
"rnorm"      0.22      100       0.22       100

$by.total
      total.time total.pct self.time self.pct
"rnorm"       0.22       100      0.22      100

$sampling.time
[1] 0.22

real    0m7.786s
user    0m5.192s
sys     0m0.735s

[]$$ time Rscript --vanilla script2.R
$by.self
      self.time self.pct total.time total.pct
"ffor"       4.94     92.5       5.34     100.0
"rnorm"      0.40      7.5       0.40       7.5

$by.total
      total.time total.pct self.time self.pct
"ffor"        5.34     100.0      4.94     92.5
"rnorm"       0.40       7.5      0.40      7.5

$sampling.time
[1] 5.34


real    0m7.841s
user    0m5.152s
sys     0m0.712s



In the first one, I call a for loop from the top level and in the second one, the loop is wrapped in a function call. This shows the inability of the profiler to point loops as responsible for bottlenecks. The coder of script1 would not know what to do to improve on the script.

I have had a quick look in the code, and here are a few thoughts:

in the function "doprof" in eval.c, this loop write the call stack on the profiler file:

for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
  if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
      && TYPEOF(cptr->call) == LANGSXP) {
      SEXP fun = CAR(cptr->call);
      if (!newline) newline = 1;
      fprintf(R_ProfileOutfile, "\"%s\" ",
          TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
          "<Anonymous>");
  }
  }
so we can see it only cares about context CTXT_FUNCTION and CTXT_BUILTIN, when for loops play with CTXT_LOOP (this is again in eval.c within the do_for function)

begincontext(&cntxt, CTXT_LOOP, R_NilValue, rho, R_BaseEnv, R_NilValue,
       R_NilValue);

which as the name implies, begins the context of the for loop. The begincontext function looks like this :

void begincontext(RCNTXT * cptr, int flags,
        SEXP syscall, SEXP env, SEXP sysp,
        SEXP promargs, SEXP callfun)
{
  cptr->nextcontext = R_GlobalContext;
  cptr->cstacktop = R_PPStackTop;
  cptr->evaldepth = R_EvalDepth;
  cptr->callflag = flags;
  cptr->call = syscall;
  cptr->cloenv = env;
  cptr->sysparent = sysp;
  cptr->conexit = R_NilValue;
  cptr->cend = NULL;
  cptr->promargs = promargs;
  cptr->callfun = callfun;
  cptr->vmax = vmaxget();
  cptr->intsusp = R_interrupts_suspended;
  cptr->handlerstack = R_HandlerStack;
  cptr->restartstack = R_RestartStack;
  cptr->prstack = R_PendingPromises;
#ifdef BYTECODE
  cptr->nodestack = R_BCNodeStackTop;
# ifdef BC_INT_STACK
  cptr->intstack = R_BCIntStackTop;
# endif
#endif
  R_GlobalContext = cptr;
}


So it could be possible to set the last argument of the begincontext function to "for" and use this code in the doprof function:


for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
if ( ( cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN ) )
      && TYPEOF(cptr->call) == LANGSXP) {
      SEXP fun = CAR(cptr->call);
      if (!newline) newline = 1;
      fprintf(R_ProfileOutfile, "\"%s\" ",
          TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
          "<Anonymous>");
  } else if( cptr->callflag & CTXT_LOOP){
    SEXP fun = CAR(cptr->syscall);
    if (!newline) newline = 1;
    fprintf(R_ProfileOutfile, "\"%s\" ", CHAR(PRINTNAME(fun)) );
  }
}

so that we see for in the list of "functions" that appear in the profiler file.

Obviously I am taking some shortcuts here, because of the other loops, but I would like to make a formal patch with this. Before I do that, I'd like to know : - is this has a chance of breaking something else (does the CTXT_LOOP being R_NilValue is used elsewhere)
- would this feature be welcome.
- Should I differentiate real functions with loops in the output file, maybe I can write "[for]" instead of for to emphacize this is not a function.

Romain

--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr


--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to