>>>>> Roy Mendelssohn <- NOAA Federal <roy.mendelss...@noaa.gov>>
>>>>>     on Tue, 22 Mar 2016 07:42:10 -0700 writes:

    > Hi All:
    > I am running prcomp on a very large array, roughly [500000, 3650].  The
    > array itself is 16GB.  I am running on a Unix machine and am running “top” at
    > the same time and am quite surprised to see that the application memory usage
    > is 76GB.  I have the “tol” set very high (0.8) so that it should only pull out
    > a few components.  I am surprised at this memory usage because prcomp uses the
    > SVD if I am not mistaken, and when I take guesses at the size of the SVD
    > matrices they shouldn’t be that large.  While I can fit this in, for a
    > variety of reasons I would like to reduce the memory footprint.  Two questions:

    > 1.  I am running with “center=FALSE” and “scale=TRUE”.  Would I save
    > memory if I scaled the data first myself, saved the result, cleared out the
    > workspace, read the scaled data back in and did the prcomp call?  Basically, are
    > the intermediate calculations for scaling kept in memory after use?

    > 2.  I don’t know how prcomp memory usage compares to a direct call to
    > “svd”, which allows me to explicitly set how many singular vectors to compute
    > (I only need the first five at most).  prcomp is convenient because it
    > does a lot of the other work for me.

For your example, where p := ncol(x)  is 3650  but you only want
the first 5 PCs, it would be *considerably* more efficient to
use svd(..., nv = 5) directly.

So I would take  stats:::prcomp.default  and modify it
correspondingly.
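
A minimal sketch of what such a modification might look like, assuming
the same centering and scaling conventions as prcomp.default. The
function name pca_top_k and its argument k are made up for this
illustration and are not part of R:

```r
## Sketch only: PCA via svd(), computing just the first k right singular
## vectors.  'pca_top_k' and 'k' are hypothetical names, not part of R.
pca_top_k <- function(x, k = 5, center = FALSE, scale. = TRUE) {
    x <- scale(as.matrix(x), center = center, scale = scale.)
    s <- svd(x, nu = 0, nv = k)    # no left singular vectors, k right ones
    list(sdev     = s$d / sqrt(max(1, nrow(x) - 1)),  # all singular values, rescaled
         rotation = s$v,                               # p x k loadings
         x        = x %*% s$v)                         # n x k scores
}

## Small example; the real data would be roughly [500000, 3650].
set.seed(1)
x <- matrix(rnorm(200 * 10), 200, 10)
fit <- pca_top_k(x, k = 3)
ref <- prcomp(x, center = FALSE, scale. = TRUE)
```

On the small example the first three columns of ref$rotation should
agree with fit$rotation up to sign, while fit never forms the remaining
columns.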

This seems such a useful idea in general that I consider
updating the function in R with a new optional 'rank.'  argument which
you'd set to 5 in your case.
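
To illustrate how such a call might look: the sketch below assumes a
prcomp() that accepts the proposed 'rank.' argument, and therefore
checks for it first, so that it does nothing on an R version where the
argument is missing.

```r
## Sketch only: use 'rank.' if (and only if) this R's prcomp() has it.
set.seed(1)
x <- matrix(rnorm(200 * 10), 200, 10)
has_rank <- "rank." %in% names(formals(stats:::prcomp.default))
if (has_rank) {
    fit <- prcomp(x, center = FALSE, scale. = TRUE, rank. = 5)
    print(dim(fit$rotation))   # loadings for the first 5 PCs only
}
```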

Scrutinizing R's underlying svd() code however, I now see that
there are typically still two other [n x p] matrices created (one
in R's La.svd(), one in C code) ... which I think should be
unnecessary in this case... but that would really be another
topic (for R-devel, not R-help).

Martin

