Re: [Rd] .Call and to reclaim the memory by allocVector

2007-08-29 Thread Yongchao Ge
Hi Seth,

Thank you for the suggestion. Because I use .Call (which does not copy 
the value) for both parts of my program, tracemem() shows no extra 
copies. Anyway, the information shown by gc() is very misleading, as 
stated by Prof. Ripley, especially after creating and removing a 
couple of large R datasets and calling gc() a couple of times.
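
A minimal sketch of the kind of copy check discussed here, assuming an R
build with memory profiling enabled (the guard falls back to a plain call
otherwise). The object x and the function bump are illustrative only and
are not taken from the program described in this thread.

```r
# Illustrative object: large enough that an unwanted copy would matter.
x <- numeric(1e6)

# Hypothetical function that modifies its argument. Because of R's
# pass-by-value semantics the modification is made on a copy.
bump <- function(v) {
  v[1] <- 1
  v
}

if (capabilities("profmem")) {
  tracemem(x)      # a "tracemem[...]" line is printed when x is duplicated
  y <- bump(x)
  untracemem(x)
} else {
  y <- bump(x)     # same computation, without the copy report
}

# x is unchanged; only the copy returned by bump() was modified.
cat("x[1] =", x[1], " y[1] =", y[1], "\n")
```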

As shown by "ps aux", there is no "memory leak" from .Call, which is a big 
relief to me. Mysteriously, my program now works when storing the 
intermediate results as a 660M R object, and I can run the same function as 
often as I want. The maximum space taken by the program has never 
exceeded 1.8G, as I expected. The excessive memory use I saw with .Call 
may have disappeared because of a recompile of my C code, a restart of 
Linux, or a fresh mind after the weekend.

Thank you and Prof. Ripley for the suggestions. They help me to stay 
focused.


Yongchao




On Sat, 25 Aug 2007, Seth Falcon wrote:

> Hi Yongchao,
>
> Yongchao Ge <[EMAIL PROTECTED]> writes:
>> Why am I storing a large dataset in R? My program consists of two
>> parts. The first part is to get the intermediate results, the computation
>> of which takes a lot of time. The second part contains many
>> different functions to manipulate the intermediate
>> results.
>>
>> My current solution is to save the intermediate results in a temporary file,
>> but my final goal is to save them as an R object. The "memory leak" in
>> .Call stops me from doing this and I'd like to know if I can have a clean
>> solution for the R package I am writing.
>
> There are many examples of packages that use .Call to create large
> objects.  I don't think there is a "memory leak".
>
> One thing that may be catching you up is that because of R's
> pass-by-value semantics, you may be ending up with multiple copies of
> the object on the R side during some of your operations.  I would
> recommend recompiling with --enable-memory-profiling and using
> tracemem() to see if you can identify places where copies of your
> large object are occurring.  You can also take a look at
> Rprof(memory.profile=TRUE).
>
> + seth
>
>
> -- 
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> BioC: http://bioconductor.org/
> Blog: http://userprimary.net/user/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] NA and NaN in function identical

2007-08-29 Thread Petr Savicky
The help page for function identical says:
 'identical' sees 'NaN' as different from 'as.double(NA)', but all
 'NaN's are equal (and all 'NA' of the same type are equal).
However, we have
  x <- NaN
  y <- as.double(NA)
  x # [1] NaN
  y # [1] NA
  identical(x,y) # [1] TRUE

In my opinion, NaN and as.double(NA) should be distinguished as the 
help page suggests.

Tested under R version 2.5.1 Patched (2007-08-19 r42638) on Linux (CPU Xeon).

Petr Savicky.



Re: [Rd] NA and NaN in function identical

2007-08-29 Thread Prof Brian Ripley
On Wed, 29 Aug 2007, Petr Savicky wrote:

> The help page for function identical says:
> 'identical' sees 'NaN' as different from 'as.double(NA)', but all
> 'NaN's are equal (and all 'NA' of the same type are equal).
> However, we have
>  x <- NaN
>  y <- as.double(NA)
>  x # [1] NaN
>  y # [1] NA
>  identical(x,y) # [1] TRUE
>
> In my opinion, NaN and as.double(NA) should be distinguished as the
> help page suggests.

And sometimes they are:

> identical(y,x)
[1] FALSE

so it is a bug.  A quicker version:

identical(NaN, NA_real_) == identical(NA_real_, NaN)

was false, fixed now, thanks for spotting it.
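
Independently of identical(), the two values can be told apart with
is.nan(). A small sketch using only base R functions (the variable names
are illustrative):

```r
x <- NaN
y <- NA_real_          # the same value as as.double(NA)

# is.na() is TRUE for both values, so it cannot separate them ...
both_missing <- c(is.na(x), is.na(y))

# ... but is.nan() is TRUE only for NaN.
only_nan <- c(is.nan(x), is.nan(y))

# Whatever identical() answers, it should give the same answer in both
# argument orders.
symmetric <- identical(x, y) == identical(y, x)

print(both_missing)
print(only_nan)
print(symmetric)
```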


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] R CMD check: Error in function (env) : could not find function "finalize"

2007-08-29 Thread Henrik Bengtsson
Hi, thanks Seth and others (I've got some offline replies); all
feedback has been useful indeed.

The short story is that as the author of R.oo I am actually the bad
guy here (but not for long since I'm soon committing a fix for R.oo).

REPRODUCIBLE EXAMPLE #1:
% R --vanilla
> library(R.oo)
R.oo v1.2.8 (2006-06-09) successfully loaded. See ?R.oo for help.
> detach("package:R.oo")
> gc()
Error in function (env)  : could not find function "finalize"
Error in function (env)  : could not find function "finalize"
Error in function (env)  : could not find function "finalize"
Error in function (env)  : could not find function "finalize"
Error in function (env)  : could not find function "finalize"
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 142543  3.9         35  9.4       35  9.4
Vcells  82660  0.7     786432  6.0   478255  3.7

REPRODUCIBLE EXAMPLE #2:
Here is another example without R.oo illustrating what is going on.
% R --vanilla
> e <- new.env()
> e$foo <- "foo"
> e$foo <- 1
> e <- new.env()
> e$foo <- 1
> reg.finalizer(e, function(env) { print(ls.str(envir=env)) })
> detach("package:utils")
> rm(e)
> gc()
Error in print(ls.str(envir = env)) : could not find function "ls.str"
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 158213  4.3         35  9.4       35  9.4
Vcells  86800  0.7     786432  6.0   478255  3.7


WHY ONLY WHEN RUNNING R CMD CHECK?
So, the problem I had was with 'affxparser' examples failing in R CMD
check, but not when I tested them manually.  Same thing was happening
with the 'pcaMethods' package.  The common denominator was that both
'affxparser' and 'pcaMethods' had an R.oo-dependent package in
DESCRIPTION/Suggests; 'affxparser' used Suggests: R.utils (which
depends on R.oo), and 'pcaMethods' used Suggests: aroma.light (which
in turn *suggests* R.utils).  To the best of my understanding, when R
CMD check runs examples, it will load *all* suggested packages, and
when done, detach them.  When the garbage collector later runs and
cleans out objects, the generic function finalize() in R.oo called by
the registered finalize hook is not around anymore.  FYI, if you move
the R.oo-dependent package from Suggests: to Depends:, there is no
longer a problem because then the package is never detached.  It all
makes sense.


CONCLUSION:
When registering finalizers for objects using reg.finalizer(), there is
always the risk that the finalizer code breaks because a package it
depends on has been detached.


SOLUTION:
At least make the finalizer hook function robust against these things.
For instance, check whether the required packages are loaded, or just add
a tryCatch() statement.  However, since finalizers are typically used
to deallocate resources, much more effort is needed to make sure
that they still work, which is not easy.  For instance, one could make
sure to require() the necessary packages in the finalizer, but that
has side effects and is not necessarily sufficient, e.g. you might
only load a generic function, while the method for a specific subclass
might be in a package out of your control.  The same problem applies to
explicit namespace calls to generic functions, e.g. R.oo::finalize().
If you have more clever suggestions, please let me know.
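
A minimal sketch of the tryCatch() idea, not the actual R.oo fix. The
helper name safe_finalizer and the objects log and e are hypothetical and
exist only for this illustration.

```r
# Hypothetical helper: wrap a finalizer so that an error raised inside it
# is caught and reported as a message instead of surfacing from gc().
safe_finalizer <- function(fun) {
  force(fun)
  function(env) {
    tryCatch(
      fun(env),
      error = function(cond) {
        message("finalizer skipped: ", conditionMessage(cond))
        invisible(NULL)
      }
    )
  }
}

# Illustration: record which environments have been cleaned up.
log <- new.env()
log$cleaned <- character(0)

e <- new.env()
e$name <- "resource-1"
reg.finalizer(e, safe_finalizer(function(env) {
  log$cleaned <- c(log$cleaned, env$name)
}))

rm(e)
invisible(gc())   # the finalizer may run during this collection
```

This only keeps the error quiet. As noted above, it does not by itself
guarantee that the resource is actually released.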


SOME MORE DETAILS ON R.OO:
This is what R.oo looks like now:

Object <- function (core = NA) {
  this <- core
  attr(this, ".env") <- new.env()
  class(this) <- "Object"
  attr(this, "...instanciationTime") <- Sys.time()
  reg.finalizer(attr(this, ".env"), function(env) finalize(this))
  this
}

finalize.Object <- function(this, ...) {}

finalize <- function(...) UseMethod("finalize")

As you see, when detaching R.oo, finalize() is no longer around.

Lesson learned!

Cheers

Henrik

On 8/28/07, Seth Falcon <[EMAIL PROTECTED]> wrote:
> Hi Henrik,
>
> "Henrik Bengtsson" <[EMAIL PROTECTED]> writes:
>
> > Hi,
> >
> > does someone else get this error message:
> >
> > Error in function (env)  : could not find function "finalize"?
> >
> > I get an error when running examples in R CMD check (v2.6.0; session
> > info below):
> [snip]
> > The error occurs in R CMD check but also when I start a fresh R session
> > and run, in this case, affxparser.Rcheck/affxparser-Ex.R.  It always
> > occurs on the same line.
>
> So does options(error=recover) help in determining where the error is
> coming from?
>
> If you can narrow it down, gctorture may help or running the examples
> under valgrind.
>
> + seth
>
> --
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> BioC: http://bioconductor.org/
> Blog: http://userprimary.net/user/
>



[Rd] Modifying R_CheckStack for a speed increase

2007-08-29 Thread Stephen Milborrow
Greetings R developers,

R will run a little faster when executing "pure R" code if the function
R_CheckStack() is modified.

With the modification, the following code for example runs 15% faster
(compared to a virgin R-2.5.1 on my Windows XP machine):

  N = 1e7
  foo <- function(x)
  {
    for (i in 1:N)
      x <- x + 1
    x
  }
  foo(0)
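
One way to time such a loop, as a rough sketch: the iteration count is
passed as an argument instead of being read from a global, and it is
reduced to keep the run short. Both changes are made only for this
illustration, and a single system.time() call is a coarse measurement.

```r
# Same loop as above, with the iteration count as an argument.
foo <- function(x, n) {
  for (i in seq_len(n))
    x <- x + 1
  x
}

n <- 1e6   # smaller than 1e7 to keep the run short
elapsed <- system.time(result <- foo(0, n))[["elapsed"]]
cat("elapsed seconds:", elapsed, "\n")
```

Comparing two R builds would mean running the same script under each and
repeating the measurement several times.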

The crux of the modification is to change the following line in 
R_CheckStack()

  if(R_CStackLimit != -1 && usage > 0.95 * R_CStackLimit) {...

to

  if(usage > R_CStackLen) { ...

Details and modified sources can be found at
ftp://ftp.sonic.net/pub/users/milbo.

Regards,
Stephen

http://milbo.users.sonic.net



Re: [Rd] Modifying R_CheckStack for a speed increase

2007-08-29 Thread Byron Ellis
Alternatively, if you actually wanted to keep the 0.95 you could use

usage > R_CStackLimit - (R_CStackLimit >> 4)

and get about 94% (1 - 1/16), probably close enough to 0.95 that it makes
no difference, or shift by 5 and get something more like 97%. At any rate,
you'd avoid floating point.
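
The arithmetic behind the two shifts, worked through in R rather than C
for convenience. The limit value is a made-up example and is not the
actual R_CStackLimit of any platform.

```r
limit <- 8388608L   # hypothetical stack limit in bytes (8 MB)

# Shifting right by k divides by 2^k, so the threshold is
# limit * (1 - 1/2^k), computed without floating point.
threshold4 <- limit - bitwShiftR(limit, 4L)   # limit - limit/16
threshold5 <- limit - bitwShiftR(limit, 5L)   # limit - limit/32

frac4 <- threshold4 / limit   # 0.9375
frac5 <- threshold5 / limit   # 0.96875

cat("shift 4:", frac4, " shift 5:", frac5, "\n")
```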


-- 
Byron Ellis ([EMAIL PROTECTED])
"Oook" -- The Librarian



Re: [Rd] R CMD check: Error in function (env) : could not find function "finalize"

2007-08-29 Thread Prof Brian Ripley
> To the best of my understanding, when R
> CMD check runs examples, it will load *all* suggested packages, and
> when done, detach them.  When the garbage collector later runs and

Not so.  R CMD check just runs the examples in a normal session after 
loading the package being tested.  Examples may themselves attach/load 
suggested packages (and if they are suggested it is likely that either 
examples or vignettes will do so).  After each group of examples 
(\examples from a single help file) any packages which have been attached 
in the course of that group are detached.
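
A rough sketch of that attach-then-detach pattern in plain R. This is not
the code R CMD check uses; the 'tools' package merely stands in for a
suggested package that an example attaches.

```r
before <- search()   # search path before the group of examples runs

# Stand-in for an example that attaches a suggested package.
library(tools)

# Anything attached in the meantime is detached again afterwards.
newly_attached <- setdiff(search(), before)
for (pkg in newly_attached)
  detach(pkg, character.only = TRUE)

print(newly_attached)
```

Finalizers registered while the package was attached are still pending
after the detach, which is how the error reported in this thread arises.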

Looking at pcaMethods, function robustSvd require()s aroma.light, so this 
will happen in example(robustSvd).

I think a package that sets finalizers probably ought to ensure that they 
are run in its .Last.lib or similar hook.  There is no guarantee that 
packages will be detached in a particular order, though.  (R CMD check does detach 
them in an ordering determined from the search path, but users may do 
something different.)  If namespaces are involved, similar considerations 
apply to unloading namespaces (although there are some guarantees on order 
since you cannot unload a namespace which is imported from).  Beyond that, 
you may be able to ensure by setting the environment for the 
finalizer function that what it needs will still be present at 
finalization.
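
A minimal sketch of that last point, assuming the cleanup code can be
captured in a closure. The names make_finalizer and cleanup_impl are
hypothetical.

```r
# The finalizer is a closure: 'cleanup' is looked up in the environment
# created by this call, not on the search path.
make_finalizer <- function(cleanup) {
  force(cleanup)
  function(env) cleanup(env)
}

# Illustrative cleanup: report how many objects the environment holds.
cleanup_impl <- function(env) length(ls(env))

fin <- make_finalizer(cleanup_impl)
rm(cleanup_impl)   # mimic the definition disappearing after a detach

e <- new.env()
assign("foo", 1, envir = e)

fin(e)                  # still works: the closure kept its own reference
reg.finalizer(e, fin)   # register it as the finalizer for e
```

The same caveat as for generics applies: anything the captured function
itself looks up at run time must also still be reachable.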



[Rd] R CMD check recursive copy of tests/

2007-08-29 Thread Henrik Bengtsson
From NEWS of R v2.6.0 devel:

 o  R CMD check now does a recursive copy on the 'tests' directory.

However, R CMD check does not run *.R scripts in such subdirectories
(as I thought/hoped for), only those directly under tests/.  This may
or may not be intentional.  If it is, maybe the above should be
clarified as:

 o  R CMD check now does a recursive copy on the 'tests' directory for
the purpose of providing data files for input.  Test scripts still have
to be directly under tests/ to be run.
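
A small sketch for spotting scripts that would be copied but, per the
observation above, not run. It builds a throwaway directory under
tempdir(); the directory and file names are made up.

```r
# Hypothetical tests/ layout: one top-level script, one nested script.
tests_dir <- file.path(tempdir(), "tests-demo")
dir.create(file.path(tests_dir, "data"), recursive = TRUE,
           showWarnings = FALSE)
invisible(file.create(file.path(tests_dir, "run-me.R")))
invisible(file.create(file.path(tests_dir, "data", "nested.R")))

# Scripts directly under tests/.
top_level <- list.files(tests_dir, pattern = "\\.R$")

# All scripts, including those in subdirectories.
all_scripts <- list.files(tests_dir, pattern = "\\.R$", recursive = TRUE)

# Scripts that exist only in subdirectories.
nested <- setdiff(all_scripts, top_level)

print(top_level)
print(nested)
```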

Just a comment

Henrik
