On 13/08/2021 15:58, luke-tier...@uiowa.edu wrote:
[copying the list]
svd() does support matrices with long vector data. Your example works
fine for me on a machine with enough memory with either the reference
BLAS/LAPACK or the BLAS/LAPACK used on Fedora 33 (flexiblas backed, I
believe, by a version of openBLAS). Take a look at sessionInfo() to
see what you are using and consider switching to another BLAS/LAPACK
if necessary. Running under gdb may help in tracking down where the
issue is and in reporting it for the BLAS/LAPACK you are using.
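A quick way to do that check from within R (a minimal sketch; the
BLAS/LAPACK fields of sessionInfo() are present in R >= 3.4):

```r
# Inspect which BLAS/LAPACK this R session is linked against
si <- sessionInfo()
si$BLAS       # path to the BLAS shared library in use
si$LAPACK     # path to the LAPACK shared library in use
La_version()  # version string of the LAPACK routines R is calling
```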
See also
https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Large-matrices which
(to nuance Prof Tierney's comment) mentions that svd on long-vector
*complex* data has been known to segfault (with the reference BLAS/LAPACK).
My guess was that this was an out-of-memory condition not handled
elegantly by the OS. (There are many reasons why the posting guide asks
for the output of sessionInfo().)
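For scale (my back-of-envelope arithmetic, not figures from the original
report), the input alone crosses the long-vector threshold and needs
roughly 20 GiB, before svd() allocates its outputs and working copies:

```r
n <- 45e6   # cells (rows)
p <- 60     # proteins (columns)
n * p              # 2.7e9 elements, well past 2^31 - 1, so a long vector
n * p * 8 / 2^30   # ~20.1 GiB just for the input matrix of doubles
# svd() by default also returns u of dimension n x p, roughly doubling
# that, on top of La.svd()'s internal working copy of x
```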
We do not have the statistical context, but it seems unlikely that anyone
is interested in each of the 45m samples, and for information on the
proteins a quite small sample of cells would suffice. Nor are all 45m
left singular vectors likely to be required (most likely none are, in
which case the underlying LAPACK routine can use a more efficient
calculation).
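A small illustration of that last point (dimensions scaled down so it
runs anywhere; the large case is assumed to behave analogously): asking
svd() for no singular vectors avoids allocating a large u at all, and by
default R already computes only the "thin" u, never an n x n matrix:

```r
set.seed(1)
x <- matrix(rnorm(1000 * 60), ncol = 60)

# Singular values only: no singular vectors are computed or stored
d_only <- svd(x, nu = 0, nv = 0)$d

# Default "thin" SVD: u is 1000 x 60, not 1000 x 1000
thin <- svd(x)
dim(thin$u)

# Both paths agree on the singular values themselves
stopifnot(isTRUE(all.equal(d_only, thin$d)))
```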
Best,
luke
On Fri, 13 Aug 2021, Dario Strbenac via R-devel wrote:
Good day,
I have a real scenario involving 45 million biological cells (samples)
and 60 proteins (variables) which leads to a segmentation fault in
svd. I thought this might be a good example of why svd might benefit
from a long-vector upgrade.
test <- matrix(rnorm(45000000*60), ncol = 60)
testSVD <- svd(test)
*** caught segfault ***
address 0x7fe93514d618, cause 'memory not mapped'
Traceback:
1: La.svd(x, nu, nv)
2: svd(test)
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel