R community,

I am trying to cluster large omics datasets (10,000-20,000 variables).  
Obviously, with datasets of this size, memory on a desktop PC becomes the
limiting factor. I am using a custom distance metric and can generate the
dissimilarity matrix in sparse format. To cluster, for example with
hierarchical clustering (hclust or fastcluster::hclust), I need to supply the
dissimilarities as a distance ("dist") object. I can use as.dist() to achieve
this, but doing so expands the sparse matrix to its full dense form, which
quickly exhausts the memory of most desktop PCs.
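For a sense of scale, here is a quick back-of-the-envelope estimate. It is written in Python purely as arithmetic. It assumes 8-byte doubles (R's numeric type) and ignores any temporary copies made during the conversion or by the clustering routine itself, so real peak usage will be higher.

```python
# Rough memory estimate for an n x n dissimilarity matrix of 8-byte doubles.
BYTES_PER_DOUBLE = 8


def dense_matrix_bytes(n: int) -> int:
    """Full n x n matrix, i.e. the sparse matrix after densification."""
    return n * n * BYTES_PER_DOUBLE


def dist_object_bytes(n: int) -> int:
    """Lower triangle only, n(n-1)/2 entries (the layout of an R 'dist' object)."""
    return n * (n - 1) // 2 * BYTES_PER_DOUBLE


for n in (10_000, 20_000):
    print(f"n={n}: dense {dense_matrix_bytes(n) / 1e9:.2f} GB, "
          f"dist {dist_object_bytes(n) / 1e9:.2f} GB")
# n=10000: dense 0.80 GB, dist 0.40 GB
# n=20000: dense 3.20 GB, dist 1.60 GB
```

Even the triangular "dist" layout grows with n², so at 20,000 variables the input alone is about 1.6 GB before hclust allocates anything.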

My questions are:
1. Is there a clustering tool that can take a sparse dissimilarity matrix as
input directly, without expanding it?
2. Alternatively, is there a sparse distance-object format (an alternative to
as.dist(), for example) that I have not been able to find?
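One possible direction for question 1, sketched in Python under explicit assumptions rather than as an existing R tool: if single linkage is acceptable, and every pair absent from the sparse matrix means "too far apart to merge directly" (not zero dissimilarity), then the single-linkage hierarchy is given by a minimum spanning forest over the stored pairs alone. Memory then grows with the number of stored entries rather than with n². The (i, j, d) triplet input and the names `single_linkage_edges` and `cut_at` are hypothetical. The idea does not carry over directly to average or complete linkage.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int, float]  # (i, j, dissimilarity), 0-based indices


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Join the sets of a and b; return False if already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def single_linkage_edges(n: int, triplets: Iterable[Edge]) -> List[Edge]:
    """Minimum spanning forest over the stored pairs (Kruskal's algorithm).

    Each returned edge is one single-linkage merge and d is its height.
    Assumption: unstored pairs never merge directly, so the result may
    contain fewer than n - 1 edges (a forest rather than one tree).
    """
    uf = UnionFind(n)
    edges: List[Edge] = []
    for i, j, d in sorted(triplets, key=lambda t: t[2]):
        if uf.union(i, j):  # edge links two different clusters
            edges.append((i, j, d))
    return edges


def cut_at(n: int, edges: List[Edge], h: float) -> List[int]:
    """Flat cluster labels obtained by applying every merge with height <= h."""
    uf = UnionFind(n)
    for i, j, d in edges:
        if d <= h:
            uf.union(i, j)
    labels: Dict[int, int] = {}
    return [labels.setdefault(uf.find(x), len(labels)) for x in range(n)]


# Toy example: 5 variables, only 4 stored dissimilarities.
pairs = [(0, 1, 0.1), (1, 2, 0.3), (3, 4, 0.2), (0, 2, 0.35)]
edges = single_linkage_edges(5, pairs)
print(edges)                  # [(0, 1, 0.1), (3, 4, 0.2), (1, 2, 0.3)]
print(cut_at(5, edges, 0.25))  # [0, 0, 1, 2, 2]
```

The pair (0, 2, 0.35) is dropped because 0 and 2 are already connected through 1, which is exactly why single linkage needs only the spanning-forest edges and never the full matrix.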

Any advice is appreciated.
Corey

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.