Hello all,

I am using the clustering functions in R to work with large volumes of binary time series data, but they do not seem to scale to a practical problem of this size.

'hclust' is attractive in that it does not require me to assume the number of clusters in advance, but given my computational resources and time it is not practical here (and it may be sub par for a problem of this size in any case). k-means runs fine, but only if the number of clusters in the data is assumed up front, which is not realistic. The silhouette functions used with 'pam' and 'clara' (both, if I remember correctly, in the 'cluster' package) performed really badly in very thorough experiments on generated data with known clusters.

That leaves me with either theoretical approaches, such as pruning hclust trees with minimal spanning trees, or hand-rolling a hierarchical k-medoids that would have to run extremely efficiently and without assuming the number of clusters.

Does anybody have suggestions for libraries I may have missed, or suggestions in general?

Note: this is not a question for 'bigkmeans' unless there is also a 'findbigkmeansnumberofclusters' function.

Thank you in advance for your assistance,
Ken
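P.S. For anyone wanting to reproduce the kind of silhouette experiment mentioned above, here is a minimal sketch. Everything in it is an assumption made for illustration rather than my actual setup: the sizes (`n_per`, `len`, `k_true`), the generator (each cluster gets its own per-time-point probability of a 1), the choice of `dist(..., method = "binary")` as the dissimilarity, and the candidate range `ks`. It uses only `pam()` from the 'cluster' package and picks the k with the largest average silhouette width.

```r
library(cluster)  # provides pam()

set.seed(1)
n_per  <- 50   # series per cluster (hypothetical)
len    <- 40   # time points per series (hypothetical)
k_true <- 3    # number of planted clusters (hypothetical)

# Assumed generator: one row of per-time-point probabilities per cluster
probs <- matrix(runif(k_true * len), nrow = k_true)

# Stack n_per binary series for each cluster; columns are time points
x <- do.call(rbind, lapply(seq_len(k_true), function(g) {
  matrix(rbinom(n_per * len, 1, rep(probs[g, ], each = n_per)), nrow = n_per)
}))
truth <- rep(seq_len(k_true), each = n_per)

# Binary (Jaccard-type) dissimilarity between series
d <- dist(x, method = "binary")

# Average silhouette width of a PAM clustering for each candidate k
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(d, k, diss = TRUE)$silinfo$avg.width)

# Silhouette-based estimate of the number of clusters
k_hat <- ks[which.max(avg_sil)]
print(data.frame(k = ks, avg_sil = round(avg_sil, 3)))
cat("planted k:", k_true, " estimated k:", k_hat, "\n")
```

Note that this builds the full pairwise distance matrix, which grows quadratically with the number of series, so it only illustrates the experiment on small generated data and does not address the scaling problem itself.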
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.