> Dario Strbenac <dstr7...@uni.sydney.edu.au> writes:
>
> I have a real scenario involving 45 million biological cells
> (samples) and 60 proteins (variables) which leads to a segmentation
> fault for svd. I thought this might be a good example of why it
> might benefit from a long vector upgrade.
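To put the quoted sizes in perspective, here is the arithmetic as a small R sketch (the variable names are only for illustration). X has 45e6 x 60 = 2.7 billion entries. That is more than .Machine$integer.max (2^31 - 1), the maximum length of an ordinary, non-long R vector, so X can only exist as a long vector. Stored as doubles it occupies about 21.6 GB, whereas crossprod(X) is just a 60 x 60 matrix of about 28.8 KB. This arithmetic alone does not establish where svd's segmentation fault comes from.

```r
# Sizes from the quoted message: 45 million cells (rows) by 60 proteins (columns).
n <- 45e6
p <- 60

n_elem <- n * p                               # number of entries in X
exceeds_int <- n_elem > .Machine$integer.max  # beyond 2^31 - 1, so X must be a long vector
gb_X <- n_elem * 8 / 1e9                      # X stored as doubles (8 bytes each), in GB
kb_XtX <- p * p * 8 / 1e3                     # crossprod(X) is only p x p, in KB

cat("entries in X:", format(n_elem, big.mark = ",", scientific = FALSE), "\n")
cat("more than 2^31 - 1:", exceeds_int, "\n")
cat("size of X (GB):", gb_X, "\n")
cat("size of crossprod(X) (KB):", kb_XtX, "\n")
```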
Rather than the full SVD of a 45000000x60 matrix X, my guess is that you may really only be interested in the eigenvalues and eigenvectors of X^T X, in which case eigen(t(X)%*%X) would probably be much faster. (And eigen(crossprod(X)) would be even faster.)

Note that if you instead want the eigenvalues and eigenvectors of X X^T (which is an enormous matrix), its nonzero eigenvalues are the same as those of X^T X (all the others are zero), and the corresponding eigenvectors are Xv, where v is an eigenvector of X^T X. (Divide Xv by the square root of its eigenvalue if you need unit-length vectors, as svd returns.)

For example, with R 4.0.2 and the reference BLAS/LAPACK, I get

> X<-matrix(rnorm(100000),10000,10)
> system.time(for(i in 1:1000) rs<-svd(X))
   user  system elapsed 
  2.393   0.008   2.403 
> system.time(for(i in 1:1000) re<-eigen(crossprod(X)))
   user  system elapsed 
  0.609   0.000   0.609 
> rs$d^2
[1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566  9931.538
[8]  9813.841  9703.818  9598.532
> re$values
[1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566  9931.538
[8]  9813.841  9703.818  9598.532

Possibly some other LAPACK might implement svd better, though I suspect that R will allocate more big matrices than really necessary for the svd, even aside from whatever LAPACK is doing.

Regards,

   Radford Neal

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
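The relationship between svd(X) and eigen(crossprod(X)) described in the reply can be checked with a minimal R sketch on a small stand-in matrix (the sizes and variable names are illustrative assumptions, not the original data). It assumes X has full column rank, so every singular value is positive. The singular values are the square roots of the eigenvalues of X^T X, the right singular vectors are its eigenvectors, and the left singular vectors are Xv divided by the corresponding singular value. Two caveats: forming X^T X squares the condition number, so the smallest singular values lose relative accuracy; and for a 45e6 x 60 matrix, X %*% V is as large as X itself, so skip that step if only the eigenvalues and V are needed.

```r
set.seed(1)
# Small stand-in for a tall, thin data matrix (many more rows than columns).
X <- matrix(rnorm(10000 * 10), 10000, 10)

# Eigen-decomposition of the small p x p matrix X^T X (eigenvalues in decreasing order).
e <- eigen(crossprod(X), symmetric = TRUE)

d <- sqrt(e$values)  # singular values of X
V <- e$vectors       # right singular vectors of X, up to sign
# Left singular vectors: u_i = X v_i / d_i, i.e. U = X V D^{-1}, dividing column i by d[i].
U <- sweep(X %*% V, 2, d, "/")

s <- svd(X)
# Singular values agree with svd(); vectors agree up to a sign flip per column,
# so |U^T u_svd| should be (numerically) the identity matrix.
print(all.equal(d, s$d))
print(all.equal(abs(crossprod(U, s$u)), diag(10), tolerance = 1e-6))
# The recovered left singular vectors are orthonormal.
print(all.equal(crossprod(U), diag(10), tolerance = 1e-6))
```

The sign ambiguity is inherent: eigenvectors (and singular vectors) are only determined up to sign, which is why the comparison uses abs().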