Hi Lishu,
I ran into similar large-scale problems recently. I used the parallel
SGD k-means described in this paper for my problem:
http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
Let n be the number of samples, k be the number of clusters, and m be the
number of nodes.
1. First, each node r
Hi,
I have a 60k*600k matrix, which exceeds the vector length limit of 2^32-1.
But it's rather sparse: only 0.02% of the entries are nonzero. So I saved it
as a Matrix Market (mm) file, which is about 300M in size. I use readMM in
the Matrix package to read it in. When I do so, the data type becomes
dgTMatrix from the 'Matrix' package
i
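
For reference, the read step described above might look like the following minimal sketch. It assumes the Matrix package is installed. The tiny 4x5 matrix, the temporary file, and the variable names (small, m_triplet, m_csc, nnz_fraction) are made up for illustration and stand in for the real 300M file. The conversion to column-compressed form is only one possible next step, not something the message itself describes.

```r
library(Matrix)

# Hypothetical stand-in for the real data: a tiny sparse matrix written to a
# temporary Matrix Market file so that the sketch is self-contained.
tmp <- tempfile(fileext = ".mtx")
small <- sparseMatrix(i = c(1, 3, 4), j = c(2, 1, 5), x = c(1.5, 2, 3),
                      dims = c(4, 5))
writeMM(small, tmp)

# readMM returns the matrix in triplet (i, j, x) form, as in the message.
m_triplet <- readMM(tmp)
print(class(m_triplet))

# Optional: convert to column-compressed storage, which many Matrix
# operations work on directly.
m_csc <- as(m_triplet, "CsparseMatrix")

# Fraction of stored entries, computed in double precision so that the
# product of the dimensions cannot overflow an integer.
nnz_fraction <- length(m_csc@x) / prod(dim(m_csc))
print(nnz_fraction)
```

With the real file, only the path passed to readMM would change; the matrix stays sparse throughout, so the full 60k*600k dense array is never allocated.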