[R] Clustering Large Applications... sort of

Ken Hutchison Wed, 10 Aug 2011 12:08:32 -0700

Hello all,
   I am using the clustering functions in R to work with large masses of
binary time series data, but they do not seem able to handle a practical
problem of this size. The function 'hclust' is attractive because I do not
want to make assumptions about the number of clusters present, but given my
computational resources and time it does not scale well enough (and it may
be sub par for a problem of this size anyway, making it doubly poor for this
application). k-means works fine if the number of clusters is assumed in
advance, which is not realistic here. The silhouette functions used with
'pam' and 'clara' (from package 'cluster') perform really badly in my very
thorough experiments on generated data with known clusters. That leaves me
with either theoretical approaches, such as pruning hclust trees with
minimum spanning trees, or hand-rolling a hierarchical k-medoids that would
run efficiently without assuming the number of clusters. Does anybody have
suggestions for libraries I may have missed, or suggestions in general?
Note: this is not a question about 'bigkmeans' unless there also exists a
'findbigkmeansnumberofclusters' function.
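A minimal sketch of the minimum-spanning-tree idea, written in Python for illustration (the same logic translates to R). It assumes Hamming distance between equal-length 0/1 series and a user-chosen distance `cutoff`; the names `hamming`, `mst_edges`, and `mst_clusters` are hypothetical, not from any R package. Dropping MST edges longer than the cutoff corresponds to cutting a single-linkage tree at that height, so the number of clusters follows from the cutoff rather than being fixed up front. Prim's algorithm here computes distances on the fly, so it needs O(n) extra memory instead of a full distance matrix, though time is still O(n^2).

```python
from typing import Dict, List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary series differ."""
    return sum(x != y for x, y in zip(a, b))


def mst_edges(series: List[Sequence[int]]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete graph of series.

    Distances are computed on demand, so no n-by-n matrix is stored.
    Returns (weight, parent, child) for each of the n - 1 tree edges.
    """
    n = len(series)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge weight into the tree
    parent = [-1] * n          # tree vertex at the other end of that edge
    best[0] = 0.0
    edges: List[Tuple[float, int, int]] = []
    for _ in range(n):
        # add the cheapest vertex that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # update the cheapest connection of every remaining vertex
        for v in range(n):
            if not in_tree[v]:
                d = hamming(series[u], series[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(series: List[Sequence[int]], cutoff: float) -> List[int]:
    """Keep MST edges with weight <= cutoff; connected components are clusters."""
    n = len(series)
    root = list(range(n))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for w, a, b in mst_edges(series):
        if w <= cutoff:
            root[find(a)] = find(b)
    # relabel components as 0, 1, 2, ... in order of first appearance
    labels: List[int] = []
    seen: Dict[int, int] = {}
    for i in range(n):
        labels.append(seen.setdefault(find(i), len(seen)))
    return labels


if __name__ == "__main__":
    data = [
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
    ]
    print(mst_clusters(data, cutoff=1))  # prints [0, 0, 1, 1, 0]
```

The cutoff plays the role that k plays in k-means, but it is a distance scale rather than a cluster count, which may be easier to reason about for binary series (e.g. "at most one differing time step").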
                                        Thank you in advance for your assistance,
                                             Ken



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
