[Rd] execution time of .packages

2009-03-03 Thread Romain Francois

Hello,

The first time in a session I call .packages( all.available = T ), it 
takes a long time (I have many packages installed from CRAN):


> system.time( packs <- .packages( all = T ) )
 user  system elapsed
0.738   0.276  43.787

When I call it again, the time is much reduced, so there must be some 
caching somewhere. I would like to try to reduce the time it takes the 
first time, but I have not been able to identify where the caching takes 
place, and so how I can remove it in order to test improvements to the 
running time without the caching. Without this, I would have to restart 
my computer each time I test a new version of the function, to clear the 
caching (and that is not going to happen).


Here is the .packages function. I am suspicious about this part: "ans 
<- c(ans, nam)", which grows the ans vector each time a suitable package 
is found; this does not sound right.


> .packages
function (all.available = FALSE, lib.loc = NULL)
{
    if (is.null(lib.loc))
        lib.loc <- .libPaths()
    if (all.available) {
        ans <- character(0L)
        lib.loc <- lib.loc[file.exists(lib.loc)]
        valid_package_version_regexp <-
            .standard_regexps()$valid_package_version
        for (lib in lib.loc) {
            a <- list.files(lib, all.files = FALSE, full.names = FALSE)
            for (nam in a) {
                pfile <- file.path(lib, nam, "Meta", "package.rds")
                if (file.exists(pfile))
                    info <- .readRDS(pfile)$DESCRIPTION[c("Package",
                        "Version")]
                else next
                if ((length(info) != 2L) || any(is.na(info)))
                    next
                if (!grepl(valid_package_version_regexp, info["Version"]))
                    next
                ans <- c(ans, nam)   ## suspicious about this
            }
        }
        return(unique(ans))
    }
    s <- search()
    return(invisible(substring(s[substr(s, 1L, 8L) == "package:"],
        9)))
}
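
[For scale, a minimal sketch of the growing-vector concern (not part of
the original mail; timings indicative only): c() copies the whole vector
on every iteration, so the cost grows quadratically, while filling a
preallocated vector is linear:]

n <- 20000
system.time({ ans <- character(0); for (i in 1:n) ans <- c(ans, "pkg") })  # grows on each step
system.time({ ans <- character(n); for (i in 1:n) ans[i] <- "pkg" })       # preallocated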


> version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status         Under development (unstable)
major          2
minor          9.0
year           2009
month          02
day            08
svn rev        47879
language       R
version.string R version 2.9.0 Under development (unstable) (2009-02-08 r47879)



--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] execution time of .packages

2009-03-03 Thread Prof Brian Ripley
The caching is in the disc system: you need to find and read the 
package metadata for every package.  AFAIK it is not easy to flush the 
disc cache, but quite easy to overwrite it with later reads.  (Google 
for more info.)
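
[Aside, not from the original mail: on Linux a cold cache can in fact be
recreated without a reboot via the kernel's drop_caches control
(Linux >= 2.6.16, needs root). A sketch only:]

## outside R, as root:  sync; echo 3 > /proc/sys/vm/drop_caches
## or from R, assuming sudo is available:
system("sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'")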


If you are not concerned about validity of the installed packages you 
could skip the tests and hence the reads.
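
[A minimal sketch of that suggestion; the name fastPackages is made up
here. It only tests that each candidate directory has a Meta/package.rds
file and never reads it, so no validity checking is done:]

fastPackages <- function(lib.loc = .libPaths()) {
    lib.loc <- lib.loc[file.exists(lib.loc)]
    ans <- unlist(lapply(lib.loc, function(lib) {
        pkgs <- list.files(lib, all.files = FALSE, full.names = FALSE)
        ## keep directories whose metadata file exists, without reading it
        pkgs[file.exists(file.path(lib, pkgs, "Meta", "package.rds"))]
    }))
    unique(ans)
}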


Your times are quite a bit slower than mine, so a faster disc system 
might help.  Since my server has just been rebooted (for a new 
kernel), with all of CRAN and most of BioC I get



system.time( packs <- .packages( all = T ) )

   user  system elapsed
  0.518   0.262  25.042

system.time( packs <- .packages( all = T ) )

   user  system elapsed
  0.442   0.080   0.522

length(packs)

[1] 2096

There's a similar issue when installing packages: the Perl code reads 
the indices from every visible package to resolve links, and that can 
be slow the first time.



On Tue, 3 Mar 2009, Romain Francois wrote:


[...]


Here is the .packages function. I am suspicious about this part: "ans <- 
c(ans, nam)", which grows the ans vector each time a suitable package is 
found; this does not sound right.


It's OK as there are only going to be ca 2000 packages.  Try 
profiling this: .readRDS and grepl take most of the time.



[...]



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] execution time of .packages

2009-03-03 Thread Romain Francois

Prof Brian Ripley wrote:
The caching is in the disc system: you need to find and read the 
package metadata for every package.  AFAIK it is not easy to flush the 
disc cache, but quite easy to overwrite it with later reads.  (Google 
for more info.)

Thanks for the info, I'll try to find my way with these directions.

[...]


It's OK as there are only going to be ca 2000 packages.  Try profiling 
this: .readRDS and grepl take most of the time.
I usually do not trust the result of the profiler when a for loop is 
involved, as it tends to miss the point (or maybe I am missing it).


Consider the script below: the profiler reports 0.22 seconds when the 
actual time spent is about 6 seconds, and it would blame rnorm as the 
bottleneck when the inefficiency is in growing the data structure.


Rprof( )
x <- numeric( )
for( i in 1:10000 ){
  x <- c( x, rnorm(10) )
}
Rprof( NULL )
print( summaryRprof( ) )

$ time Rscript --vanilla profexample.R
$by.self
        self.time self.pct total.time total.pct
"rnorm"      0.22      100       0.22       100

$by.total
        total.time total.pct self.time self.pct
"rnorm"       0.22       100      0.22      100

$sampling.time
[1] 0.22

real    0m6.164s
user    0m5.156s
sys     0m0.737s

$ time Rscript --vanilla -e "rnorm(10)"
[1]  0.836411851  1.762081444  1.076305644  2.063515383  0.643254750
[6]  1.698620443 -1.774479062 -0.432886214 -0.007949533  0.284089832

real    0m0.224s
user    0m0.187s
sys     0m0.024s


Now, if I replace the for loop with a similar silly lapply construct, 
the profiler tells me a rather different story:


Rprof( )
x <- numeric( )
y <- lapply( 1:10000, function(i){
   x <<- c( x, rnorm(10) )
   NULL
} )
Rprof( NULL )
print( summaryRprof( ) )

$ time Rscript --vanilla prof2.R
$by.self
         self.time self.pct total.time total.pct
"FUN"         6.48     96.1       6.68      99.1
"rnorm"       0.20      3.0       0.20       3.0
"lapply"      0.06      0.9       6.74     100.0

$by.total
         total.time total.pct self.time self.pct
"lapply"       6.74     100.0      0.06      0.9
"FUN"          6.68      99.1      6.48     96.1
"rnorm"        0.20       3.0      0.20      3.0

$sampling.time
[1] 6.74

real    0m8.352s
user    0m4.762s
sys     0m2.574s

Or let us wrap the for loop of the first example in a function:

Rprof( )
x <- numeric( )
ffor <- function(){
   for( i in 1:10000 ){
     x <- c( x, rnorm(10) )
   }
}
ffor()
Rprof( NULL )
print( summaryRprof( ) )


$ time Rscript --vanilla prof3.R
$by.self
        self.time self.pct total.time total.pct
"ffor"        5.4     96.4        5.6     100.0
"rnorm"       0.2      3.6        0.2       3.6

$by.total
        total.time total.pct self.time self.pct
"ffor"         5.6     100.0       5.4     96.4
"rnorm"        0.2       3.6       0.2      3.6

$sampling.time
[1] 5.6

real    0m6.379s
user    0m5.408s
sys     0m0.717s



Maybe I am getting this all wrong; maybe the global assignment operator 
is responsible for some of the time in the second example. But how can I 
analyse the result of the profiler in the first example, when it only 
seems interested in the 0.22 seconds and I want to know what is going on 
during the rest of the time?


Is it possible to treat "for" as a function when writing the profiler 
data, so that I can trust it more?


Romain

--
Romain Francois
Independent R Consultant

Re: [Rd] execution time of .packages

2009-03-03 Thread Prof Brian Ripley

On Tue, 3 Mar 2009, Romain Francois wrote:


[...]


I usually do not trust the result of the profiler when a for loop is 
involved, as it tends to miss the point (or maybe I am missing it).


Here are the data for the actual example (repeated for this message):


Rprof()
system.time( packs <- .packages( all = T ) )

   user  system elapsed
  0.447   0.078   0.525

Rprof(NULL)
summaryRprof()

$by.self
                   self.time self.pct total.time total.pct
"grepl"                 0.18     34.6       0.18      34.6
".readRDS"              0.12     23.1       0.20      38.5
".packages"             0.08     15.4       0.50      96.2
"close.connection"      0.04      7.7       0.04       7.7
"close"                 0.02      3.8       0.06      11.5
"file.exists"           0.02      3.8       0.02       3.8
"gc"                    0.02      3.8       0.02       3.8
"gzfile"                0.02      3.8       0.02       3.8
"list"                  0.02      3.8       0.02       3.8
"system.time"           0.00      0.0       0.52     100.0
"file.path"             0.00      0.0       0.02       3.8

$by.total
                   total.time total.pct self.time self.pct
"system.time"            0.52     100.0      0.00      0.0
".packages"              0.50      96.2      0.08     15.4
".readRDS"               0.20      38.5      0.12     23.1
"grepl"                  0.18      34.6      0.18     34.6
"close"                  0.06      11.5      0.02      3.8
"close.connection"       0.04       7.7      0.04      7.7
"file.exists"            0.02       3.8      0.02      3.8
"gc"                     0.02       3.8      0.02      3.8
"gzfile"                 0.02       3.8      0.02      3.8
"list"                   0.02       3.8      0.02      3.8
"file.path"              0.02       3.8      0.00      0.0

$sampling.time
[1] 0.52

There is little time unaccounted for, and 0.38 sec is going in 
.readRDS and grepl.  Whereas


system.time({
ans <- character(0)
for(i in 1:2096) ans <- c(ans, "foo")
})

takes 0.024 secs, negligible here (one profiler tick).


Consider this script below,


Whether profiling works in other examples is beside the point here.

[...]

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] execution time of .packages

2009-03-03 Thread Romain Francois

Prof Brian Ripley wrote:

[...]

There is little time unaccounted for, and 0.38 sec is going in 
.readRDS and grepl.  Whereas


system.time({
ans <- character(0)
for(i in 1:2096) ans <- c(ans, "foo")
})

takes 0.024 secs, negligible here (one profiler tick).

Here is what happens to me if I restart the computer:

> Rprof( )
> system.time( packs <- .packages( all = T ) )
  user  system elapsed
 0.888   0.342  35.589
> Rprof(NULL)
> summaryRprof()
$by.self
                   self.time self.pct total.time total.pct
".readRDS"              0.34     28.8       0.64      54.2
".packages"             0.14     11.9       1.16      98.3
"file.exists"           0.14     11.9       0.14      11.9
"gzfile"                0.12     10.2       0.16      13.6
"close"                 0.10      8.5       0.14      11.9
"grepl"                 0.08      6.8       0.10       8.5
"$"                     0.08      6.8       0.08       6.8
"file.path"             0.06      5.1       0.06       5.1
"close.connection"      0.04      3.4       0.04       3.4
"getOption"             0.02      1.7       0.04       3.4
"as.cha

[Rd] profiler and loops

2009-03-03 Thread Romain Francois

Hello,

(This is a follow-up from this thread: 
http://www.nabble.com/execution-time-of-.packages-td22304833.html but 
with a different focus.)


I am often confused by the result of the profiler, when a loop is 
involved. Consider these two scripts:


script1:

Rprof( )
x <- numeric( )
for( i in 1:10000 ){
  x <- c( x, rnorm(10) )
}
Rprof( NULL )
print( summaryRprof( ) )


script2:

Rprof( )
ffor <- function(){
  x <- numeric( )
  for( i in 1:10000 ){
    x <- c( x, rnorm(10) )
  }
}
ffor()
Rprof( NULL )
print( summaryRprof( ) )


[]$ time Rscript --vanilla script1.R
$by.self
  self.time self.pct total.time total.pct
"rnorm"  0.22  100   0.22   100

$by.total
  total.time total.pct self.time self.pct
"rnorm"   0.22   100  0.22  100

$sampling.time
[1] 0.22

real    0m7.786s
user    0m5.192s
sys     0m0.735s

[]$ time Rscript --vanilla script2.R
$by.self
  self.time self.pct total.time total.pct
"ffor"   4.94 92.5   5.34 100.0
"rnorm"  0.40  7.5   0.40   7.5

$by.total
  total.time total.pct self.time self.pct
"ffor"5.34 100.0  4.94 92.5
"rnorm"   0.40   7.5  0.40  7.5

$sampling.time
[1] 5.34


real    0m7.841s
user    0m5.152s
sys     0m0.712s



In the first one, I call a for loop from the top level, and in the second 
one the loop is wrapped in a function call. This shows the inability of 
the profiler to point to loops as responsible for bottlenecks. The coder 
of script1 would not know what to do to improve the script.


I have had a quick look at the code, and here are a few thoughts:

In the function "doprof" in eval.c, this loop writes the call stack to 
the profiler file:


for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
    if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
        && TYPEOF(cptr->call) == LANGSXP) {
        SEXP fun = CAR(cptr->call);
        if (!newline) newline = 1;
        fprintf(R_ProfileOutfile, "\"%s\" ",
                TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
                "<Anonymous>");
    }
}
So we can see it only cares about the contexts CTXT_FUNCTION and 
CTXT_BUILTIN, while for loops use CTXT_LOOP (this is again in eval.c, 
within the do_for function):


begincontext(&cntxt, CTXT_LOOP, R_NilValue, rho, R_BaseEnv, R_NilValue,
             R_NilValue);

which, as the name implies, begins the context of the for loop. The 
begincontext function looks like this:


void begincontext(RCNTXT * cptr, int flags,
SEXP syscall, SEXP env, SEXP sysp,
SEXP promargs, SEXP callfun)
{
  cptr->nextcontext = R_GlobalContext;
  cptr->cstacktop = R_PPStackTop;
  cptr->evaldepth = R_EvalDepth;
  cptr->callflag = flags;
  cptr->call = syscall;
  cptr->cloenv = env;
  cptr->sysparent = sysp;
  cptr->conexit = R_NilValue;
  cptr->cend = NULL;
  cptr->promargs = promargs;
  cptr->callfun = callfun;
  cptr->vmax = vmaxget();
  cptr->intsusp = R_interrupts_suspended;
  cptr->handlerstack = R_HandlerStack;
  cptr->restartstack = R_RestartStack;
  cptr->prstack = R_PendingPromises;
#ifdef BYTECODE
  cptr->nodestack = R_BCNodeStackTop;
# ifdef BC_INT_STACK
  cptr->intstack = R_BCIntStackTop;
# endif
#endif
  R_GlobalContext = cptr;
}


So it could be possible to set the syscall argument of this begincontext 
call to the "for" call and use this code in the doprof function:



for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
    if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
        && TYPEOF(cptr->call) == LANGSXP) {
        SEXP fun = CAR(cptr->call);
        if (!newline) newline = 1;
        fprintf(R_ProfileOutfile, "\"%s\" ",
                TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
                "<Anonymous>");
    } else if (cptr->callflag & CTXT_LOOP) {
        SEXP fun = CAR(cptr->syscall);
        if (!newline) newline = 1;
        fprintf(R_ProfileOutfile, "\"%s\" ", CHAR(PRINTNAME(fun)));
    }
}

so that we see "for" in the list of "functions" that appear in the 
profiler file.


Obviously I am taking some shortcuts here, because of the other loops, 
but I would like to make a formal patch with this. Before I do that, I'd 
like to know:

- does this have a chance of breaking something else (is the R_NilValue 
syscall of CTXT_LOOP contexts used elsewhere)?
- would this feature be welcome?
- should I differentiate real functions from loops in the output file? 
Maybe I can write "[for]" instead of for to emphasize this is not a 
function.


Romain

--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] execution time of .packages

2009-03-03 Thread Prof Brian Ripley
Let me repeat: what is happening for me in the equivalent of your 
35.589 - 1.18 seconds is that R is waiting for my OS to read its discs 
(and they can be heard chuntering away).  As the R process is not 
running at those times, the profiler is not running either (on a 
Unix-alike: on Windows the profiler does measure elapsed time).  I 
expect it will be the same explanation for you.


What I have already suggested is that if you want to save time, do not 
read and check the package.rds files.   As far as I can see they were 
checked at installation in any recent version of R.  Just check their 
existence.


On Tue, 3 Mar 2009, Romain Francois wrote:


[...]

Here is what happens to me if I restart the computer:


Rprof( )
system.time( packs <- .packages( all = T ) )

 user  system elapsed
0.888   0.342  35.589

Rprof(NULL)
summaryRprof()

[...]

Re: [Rd] execution time of .packages

2009-03-03 Thread Romain Francois

Prof Brian Ripley wrote:
Let me repeat: what is happening for me in the equivalent of your 
35.589 - 1.18 seconds is that R is waiting for my OS to read its discs 
(and they can be heard chuntering away).  As the R process is not 
running at those times, the profiler is not running either (on a 
Unix-alike: on Windows the profiler does measure elapsed time).  I 
expect it will be the same explanation for you.

Thank you. I get it this time.

[...]

[Rd] R 2.9.0 devel: package installation with configure-args option

2009-03-03 Thread ml-it-r-devel

Hi,

Trying to install a package containing C code and requiring non-default 
configure argument settings, the incantation (which has worked for 
R <= 2.8.1 on the same architectures)
R CMD INSTALL --configure-args="--with-opt1 --with-opt2" packname

always results in a warning:

Warning: unknown option '--with-opt2'

and consequently the option is ignored. Reversing the order of the 
options results in the now-last option being ignored. Alternative 
quoting has not provided a solution.

Using

R CMD INSTALL --configure-args=--with-opt1 --configure-args=--with-opt2 packname

does provide a workaround, though. Is this the (new to me) and only 
intended way to provide more than one configure argument? I checked 
?INSTALL and the referenced R-admin section 'Configuration variables', 
but am still not clear on this.

Regards, Matthias


R version 2.9.0 Under development (unstable) (2009-03-02 r48041)
on Ubuntu 8.04, 8.10

-- 
Matthias Burger          Project Manager/ Biostatistician
Epigenomics AG           Kleine Praesidentenstr. 1    10178 Berlin, Germany
phone: +49-30-24345-0    fax: +49-30-24345-555
http://www.epigenomics.com    matthias.bur...@epigenomics.com
--
Epigenomics AG Berlin   Amtsgericht Charlottenburg HRB 75861
Vorstand:   Geert Nygaard (CEO/Vorsitzender)
Oliver Schacht PhD (CFO)
Aufsichtsrat:   Prof. Dr. Dr. hc. Rolf Krebs (Chairman/Vorsitzender)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] profiler and loops

2009-03-03 Thread Romain Francois


Hello,

Please find attached a patch against svn implementing this proposal.

The part I don't fully understand is the part involving the function 
loopWithContext, so I've put "[loop]" in there instead of "[for]", 
"[while]" or "[repeat]", because I don't really know how to extract the 
information.


With the script1 from my previous post, summaryRprof produces this:

[]$ /home/romain/workspace/R-trunk/bin/Rscript script1.R
$by.self
   self.time self.pct total.time total.pct
"[for]"  5.32 98.9   5.38 100.0
"rnorm"  0.06  1.1   0.06   1.1

$by.total
   total.time total.pct self.time self.pct
"[for]"   5.38 100.0  5.32 98.9
"rnorm"   0.06   1.1  0.06  1.1

$sampling.time
[1] 5.38

Romain


Romain Francois wrote:

[...]

Re: [Rd] S4 data dump or?

2009-03-03 Thread Paul Gilbert


Prof Brian Ripley wrote:

On Mon, 2 Mar 2009, Paul Gilbert wrote:

I am trying to dump some data in a file that I will add to a package.  
The data has an attribute which is a S4 object, and this seems to 
cause problems. What is the preferred way to write a file with a 
dataset that has some S4 parts, so that it can be included in a package?



Using save() seems almost always preferable to dump(): usually a smaller 
result, avoids representation error changes for numeric types and 
encoding issues for some character vectors, works for almost all objects.
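
[A minimal sketch of the save()/load() route for data with an S4 part;
the class Meta is invented here. dump()ing x and source()ing the result
is what fails, per the note in ?dump:]

setClass("Meta", representation(note = "character"))
x <- 1:10
attr(x, "meta") <- new("Meta", note = "an S4 attribute")
save(x, file = "x.rda")    # serialises the S4 attribute without trouble
rm(x); load("x.rda")
attr(x, "meta")            # the S4 part survives the round trip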


Ok.  I thought I was having a problem with save()/load() too, but it 
seems that problem was something else. I have this working now.


I am guessing that the note on ?dump about objects of type S4 is the 
issue here.


Yes, the S4 object causes source() of the dump()ed file to fail.

Thanks,
Paul



Paul Gilbert






__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] make check reg-tests-1.R error on solaris

2009-03-03 Thread Karen Noel
R 2.5.1 compiled, passed the make check and has been successfully running 
for a couple years on a Sun Fire V490 running Solaris 9. I need a newer 
version of R, but can't get a newer version of R to pass the make check. 
I've tried 2.8.1, 2.7.2, 2.6.2 and 2.6.0. (2.5.1 still passes on this 
server.) At this point I thought I'd try to compile it on another Sun 
server (Solaris 10), but it had the same problem. Configuring with no 
options didn't help. I commented out the failed test from the Makefile to 
see if it would pass the rest of the tests. It passes all the rest of the 
tests. Here is the failure error from make check.

make[2]: Entering directory `/usr/local/src/R-2.8.1/tests'
running regression tests
make[3]: Entering directory `/usr/local/src/R-2.8.1/tests'
running code in 'reg-tests-1.R' ...make[3]: *** [reg-tests-1.Rout] Error 1
make[3]: Leaving directory `/usr/local/src/R-2.8.1/tests'
make[2]: *** [test-Reg] Error 2
make[2]: Leaving directory `/usr/local/src/R-2.8.1/tests'
make[1]: *** [test-all-basics] Error 1
make[1]: Leaving directory `/usr/local/src/R-2.8.1/tests'
make: *** [check] Error 2
bash-2.05#

Here is output from reg-tests-1.Rout.fail.

[1] "41c6167e" "dir1" "dir2" "dirs" "file275c23f2"
[6] "file33f963f2" "moredirs"
> file.create(file.path(dd, "somefile"))
[1] TRUE TRUE TRUE TRUE
> dir(".", recursive=TRUE)
[1] "41c6167e" "dir1/somefile" "dir2/somefile"
[4] "dirs/somefile" "file275c23f2" "file33f963f2"
[7] "moredirs/somefile"
> stopifnot(unlink("dir?") == 1) # not an error
Error: unlink("dir?") == 1 is not TRUE
Execution halted
rm: Cannot remove any directory in the path of the current working directory
/tmp/RtmprBjF6W

Looking through the archives I did find a couple other people with this 
error, both running Solaris 10. PR#10501 and PR#11738 have quite a lot of 
information about this error, but I don't see any resolution for them.

This looks like it could possibly be enough of a problem that I haven't 
put 2.8.1 in production. Can you help me with a resolution, or let me 
know if it is safe to ignore? I'd appreciate it.

Thank you!
Karen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] profiler and loops

2009-03-03 Thread Romain Francois
Please ignore the previous patch, which did not take into account the 
conditional compilation of doprof on Windows. This one does, but was not 
tested on Windows.


Romain

Romain Francois wrote:


[...]

Re: [Rd] R 2.9.0 devel: package installation with configure-args option

2009-03-03 Thread Prof Brian Ripley

That version of R is 'under development' and the INSTALL file says

## FIXME: this loses quotes, so filepaths with spaces in get broken up

so it is, I think, the same as a known issue.

The whole package installation process has been completely 
reconstructed for R-devel, and the process is not quite finished.

And this is a low priority as there are effective workarounds.

On Tue, 3 Mar 2009, ml-it-r-de...@epigenomics.com wrote:



[...]



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] X11.Rd has a dead link

2009-03-03 Thread Stephen Weigand
In X11.Rd the Resources section has the following dead link:

  http://web.mit.edu/answers/xwindows/xwindows_resources.html

I never saw the target document, but is this its new URL:

  http://kb.mit.edu/confluence/pages/viewpage.action?pageId=3907291

Thank you,

Stephen

-- 
Rochester, Minn. USA

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] 'anova.gls' in 'nlme' (PR#13567)

2009-03-03 Thread richard_raubertas
There is a bug in 'anova.gls' in the 'nlme' package (3.1-90).  The 
bug is triggered by calling the function with a single 'gls' object 
and specifying the 'Terms' argument but not the 'L' argument:

> library(nlme)
> fm1Orth.gls <- gls(distance ~ Sex * I(age - 11), Orthodont,
+                    correlation = corSymm(form = ~ 1 | Subject),
+                    weights = varIdent(form = ~ 1 | age))
> anova(fm1Orth.gls)
Denom. DF: 104
                numDF  F-value p-value
(Intercept)         1 4246.041  <.0001
Sex                 1    7.718  0.0065
I(age - 11)         1  116.806  <.0001
Sex:I(age - 11)     1    7.402  0.0076
> anova(fm1Orth.gls, Terms="Sex")
Error in anova.gls(fm1Orth.gls, Terms = "Sex") :
  object "noZeroColL" not found
>

The bug is in the following lines near the end:

 if (!missing(L)) {
     if (nrow(L) > 1)
         attr(aod, "L") <- L[, noZeroColL, drop = FALSE]
     else attr(aod, "L") <- L[, noZeroColL]
 }

where the problem is that when 'Terms' is provided, earlier code 
sets 'L' (so it is no longer missing) but does not set 'noZeroColL'.

In the similar function 'anova.lme' the problem is avoided by the 
first line

 Lmiss <- missing(L)

and then testing whether 'Lmiss' is TRUE in the rest of the 
function, rather than 'missing(L)'.
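
[A standalone illustration of why the flag is needed; the function f is
invented, not the nlme code. Once a function assigns to an argument,
missing() reports FALSE for it, so the test has to be captured before
any assignment:]

f <- function(L) {
    Lmiss <- missing(L)          # capture up front, as anova.lme does
    if (Lmiss) L <- diag(2)      # give L an internal default
    c(naive = missing(L), captured = Lmiss)
}
f()              # naive = FALSE (misleading), captured = TRUE
f(matrix(1))     # both FALSE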

Rich Raubertas
Merck & Co.

====================================

> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] nlme_3.1-90

loaded via a namespace (and not attached):
[1] grid_2.8.1      lattice_0.17-20 tools_2.8.1
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] S4 helper functions: regular or generic?

2009-03-03 Thread Gopi Goswami
Dear Martin,


Thanks a lot for your help, apologies for this very late reply. I
decided to go with your suggestion, write a regular function. I guess
this avoids doing

obj <- as(foo(as(obj, 'Base')), 'Derived')

and then repopulating the extra slots of the 'Derived' class.
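
[A minimal sketch of the regular-function route; class and slot names
are invented. The helper accepts anything extending the base class and
hands it back with its class, extra slots included, intact:]

setClass("Base", representation(count = "numeric"))
setClass("Derived", contains = "Base", representation(tag = "character"))

bumpCount <- function(obj, by = 1) {   # a plain function, no dispatch
    obj@count <- obj@count + by        # update the inherited slot
    obj                                # a Derived stays a Derived
}

d <- new("Derived", count = 0, tag = "t")
bumpCount(d)@count    # 1, and the extra 'tag' slot is untouched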


Regards,
gopi.


On Wed, Feb 25, 2009 at 9:36 AM, Martin Morgan  wrote:
> Hi Gopi --
>
> Gopi Goswami  writes:
>
>> Hi there,
>>
>>
>> I want to write helper functions for a base class, which will be used
>> by its subclasses in the S4 world. This function ___will___ update
>> certain slots of its argument object. Please help me decide which one
>> of the following is a better approach with respect to coding style,
>> memory usage and speed:
>>
>
> My opinion:
>
>> o   Write a regular function.
>
> memory and speed
>
>> o   Declare a generic and implement it just for the base class.
>
> coding 'style', but style is subjective.
>
> There are other aspects of S4, e.g., type checking, method dispatch,
> programmatically defined and discoverable API, ... (positives),
> cumbersome documentation (negative).
>
> My usual pattern of development is to be seduced by the siren of
> speed, only to regret boxing myself in.
>
> I find that my S4 objects typically serve as containers for
> coordinating other entities.  The important methods typically extract
> R 'base' objects from the S4 class, manipulate them, and repackage the
> result as S4. The time and speed issues are in the manipulation, not
> in the extraction / repackaging. This is contrast to, say, an
> implementation of a tree-like data structure with a collection of
> 'Node' objects, where tree operations would require access to each
> object and would be horribly slow in S4 (and perhaps R when nodes were
> represented as a list, say, at least compared to a C-level
> representation, or an alternative representation that took advantage
> of R's language characteristics).
>
> Martin
>
>>
>> Thanks for sharing your insight and time,
>> gopi.
>> http://gopi-goswami.net/
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] callNextMethod() doesn't pass down arguments

2009-03-03 Thread hpages

Hi,

According to its man page, callNextMethod() (called with
no argument) should call the next method "with the arguments
to the current method passed down to the next method".
But, for the "[[" and "[" generics (primitives), the argument
after the dots doesn't seem to be passed down:

setClass("A", representation(thing="ANY"))
setClass("B", contains="A")

setMethod("[[", "A",
function(x, i, j, ..., exact=TRUE) return(exact))
setMethod("[[", "B",
function(x, i, j, ..., exact=TRUE) callNextMethod())


> b <- new("B")
> b[[3]]
[1] TRUE
> b[[3, exact=FALSE]]
[1] TRUE

setMethod("[", "A",
function(x, i, j, ..., drop=TRUE) return(drop))
setMethod("[", "B",
function(x, i, j, ..., drop=TRUE) callNextMethod())


> b[3]
[1] TRUE
> b[3, drop=FALSE]
[1] TRUE
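
[A possible workaround sketch, not from the original mail and not
verified on 2.8.0/2.9.0: pass the trailing argument to callNextMethod()
explicitly instead of relying on the automatic pass-down:]

setMethod("[[", "B", function(x, i, j, ..., exact=TRUE)
    callNextMethod(x, i, ..., exact=exact))

b[[3, exact=FALSE]]   # should now return FALSE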

I tried this with R 2.8.0 and 2.9.0 (r47727).

Cheers,
H.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel