Re: [R] Clustering Large Applications..sort of

2011-08-10 Thread Christian Hennig
PS to my previous posting: Also have a look at kmeansruns in fpc. This runs kmeans for several numbers of clusters and chooses the number of clusters by either the Calinski and Harabasz index or the average silhouette width.
Christian
On Wed, 10 Aug 2011, Ken Hutchison wrote: Hello all, I am using the clust
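The idea behind kmeansruns can be sketched roughly as follows. This is a minimal Python illustration, not fpc's code and not its interface. The function names (kmeans, calinski_harabasz, choose_k), the k-means++ seeding, the range of k tried and the number of restarts are all assumptions made for the example. Only the Calinski and Harabasz criterion is shown, not the average silhouette width.

```python
import random

# Hypothetical helpers for illustration only; this is not fpc::kmeansruns.


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(data, k, rng, iters=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers)."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        # next seed drawn with probability proportional to squared distance
        d2 = [min(sq_dist(p, c) for c in centers) for p in data]
        centers.append(list(rng.choices(data, weights=d2)[0]))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return labels, centers


def calinski_harabasz(data, labels, k):
    """(between SS / (k - 1)) / (within SS / (n - k)); larger is better."""
    n = len(data)
    overall = mean(data)
    between = within = 0.0
    for j in range(k):
        members = [p for p, lab in zip(data, labels) if lab == j]
        if not members:
            continue
        c = mean(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (k - 1)) / (within / (n - k))


def choose_k(data, k_values=range(2, 7), restarts=5, seed=0):
    """For each k keep the restart with the smallest within SS, then pick the k with the largest CH."""
    rng = random.Random(seed)
    scores = {}
    for k in k_values:
        best_wss, best_labels = None, None
        for _ in range(restarts):
            labels, centers = kmeans(data, k, rng)
            wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
            if best_wss is None or wss < best_wss:
                best_wss, best_labels = wss, labels
        scores[k] = calinski_harabasz(data, best_labels, k)
    return max(scores, key=scores.get), scores


# Example: three well-separated 2-D blobs of 30 points each (made-up data)
gen = random.Random(42)
data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
        for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(30)]
best_k, scores = choose_k(data)
print(best_k, scores)
```

Keeping the best of several restarts per k mirrors the general idea of running k-means more than once, since a single run can get stuck in a poor local optimum.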

Re: [R] Clustering Large Applications..sort of

2011-08-10 Thread Christian Hennig
There are a number of methods in the literature for deciding the number of clusters for k-means. Probably the most popular one is the Calinski and Harabasz index, implemented as calinhara in package fpc. A distance-based version (along with several other indexes for this purpose) is in function cluster.stats in
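For reference, the textbook Euclidean form of the Calinski and Harabasz index for n observations partitioned into k clusters C_1, ..., C_k (cluster sizes n_j, cluster means \bar{x}_j, overall mean \bar{x}) is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. Larger values indicate compact, well-separated clusters, and the k with the largest value is chosen. Implementation details in fpc may differ from this form.

```latex
\mathrm{CH}(k) = \frac{B_k/(k-1)}{W_k/(n-k)}, \qquad
B_k = \sum_{j=1}^{k} n_j \,\lVert \bar{x}_j - \bar{x} \rVert^2, \qquad
W_k = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^2
```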

Re: [R] Clustering Large Applications..sort of

2011-08-10 Thread Peter Langfelder
On Wed, Aug 10, 2011 at 12:07 PM, Ken Hutchison wrote:
> Hello all,
>   I am using the clustering functions in R in order to work with large
> masses of binary time series data, however the clustering functions do not
> seem able to fit this size of practical problem. Library 'hclust' is good
> (t

Re: [R] Clustering Large Applications..sort of

2011-08-10 Thread Thomas Lumley
Try the flow cytometry clustering functions in Bioconductor.
-thomas
On Thu, Aug 11, 2011 at 7:07 AM, Ken Hutchison wrote:
> Hello all,
>   I am using the clustering functions in R in order to work with large
> masses of binary time series data, however the clustering functions do not
> see

[R] Clustering Large Applications..sort of

2011-08-10 Thread Ken Hutchison
Hello all, I am using the clustering functions in R in order to work with large masses of binary time series data; however, the clustering functions do not seem able to handle a practical problem of this size. The function 'hclust' is good (though it may be subpar for this size of problem, thus doubly poo
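A back-of-the-envelope calculation shows why hierarchical clustering struggles here. hclust works from a dissimilarity structure such as the one returned by dist, which stores the n(n-1)/2 lower-triangle entries as 8-byte doubles. The Python sketch below (the helper name dist_bytes is hypothetical) does only that arithmetic. It counts the dissimilarities themselves, not any working memory hclust needs on top of them.

```python
def dist_bytes(n, bytes_per_value=8):
    """Bytes for the n*(n-1)/2 lower-triangle dissimilarities, stored as doubles."""
    return n * (n - 1) // 2 * bytes_per_value


for n in (1_000, 10_000, 100_000):
    # storage grows quadratically: 100,000 series need about 40 GB
    # for the dissimilarities alone
    print(f"n = {n:>7,}: {dist_bytes(n) / 1e9:,.3f} GB")
```

Because the requirement grows with n squared, doubling the number of series roughly quadruples the memory, which is why partitioning methods that avoid a full dissimilarity matrix are usually suggested at this scale.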