On Thursday 30 October 2008, Maura E Monville wrote:
> I have a pretty big similarity matrix (2870x2870). I will produce even
> bigger ones soon.
> I am using PAM to generate clusters.
> The desired number of output clusters is a PAM input parameter.
> I do not know a priori what the best cluster layout is.
> I resorted to the silhouette test. It takes forever, as I have to run PAM
> with all possible numbers of clusters.
> I wonder whether there is some faster method, either software or some
> theoretical guidelines, to get the optimum number of clusters.
>
> Thank you very much,

This is a very general topic in the field of multivariate analysis. There
really isn't any way to know the 'correct' number of clusters. However, there
are several metrics that can give you an indication of how messy your data
are.
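
As a rough sketch of one such metric, the average silhouette width reported
by pam() can be compared over a limited range of k instead of every possible
value. The simulated data, the range 2:6, and the names x, ks, avg.width and
best.k are assumptions made for illustration only; with a similarity matrix
s scaled to [0, 1], as.dist(1 - s) is one common (but not the only) way to
get the dissimilarities that pam() expects.

```r
library(cluster)  # provides pam()

set.seed(1)
# hypothetical data: three well-separated groups of 20 points in 2-D
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2))
d <- dist(x)  # dissimilarities; pam() treats a 'dist' object as such

# average silhouette width for each candidate number of clusters
ks <- 2:6
avg.width <- sapply(ks, function(k) pam(d, k = k)$silinfo$avg.width)
names(avg.width) <- ks

print(round(avg.width, 3))
best.k <- ks[which.max(avg.width)]  # k with the largest average width
```

A larger average width suggests tighter, better-separated clusters, but it
is only an indication and should be checked against what the groups mean.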

For information on the methods in the cluster package, see this book:

Kaufman, L. & Rousseeuw, P. J. Finding Groups in Data: An Introduction to
Cluster Analysis. Wiley-Interscience, 2005.

Otherwise, consider a book on multivariate analysis. Alternatively, try a
hierarchical clustering approach, and look for meaningful groupings.
Something like this:

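library(cluster)  # provides daisy() and diana()

# assumes your_data_matrix is a data frame with an 'id' column;
# the id is left out of the dissimilarity calculation
vars <- your_data_matrix[, names(your_data_matrix) != "id", drop = FALSE]
d <- diana(daisy(vars))  # divisive hierarchical clustering
d.hc <- as.hclust(d)

d.hc$labels <- your_data_matrix$id  # label the leaves by id

plot(d.hc)  # dendrogram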
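
Once the dendrogram suggests a plausible number of groups, cutree() can
turn the tree into cluster memberships. This is only a minimal sketch: the
simulated data frame dat, its columns v1 and v2, and the choice k = 3 are
all assumptions for illustration.

```r
library(cluster)  # provides daisy() and diana()

set.seed(1)
# hypothetical data frame: an id column plus two numeric variables
dat <- data.frame(id = paste0("s", 1:30),
                  v1 = c(rnorm(10, 0, 0.3), rnorm(10, 3, 0.3), rnorm(10, 6, 0.3)),
                  v2 = c(rnorm(10, 0, 0.3), rnorm(10, 3, 0.3), rnorm(10, 6, 0.3)),
                  stringsAsFactors = FALSE)

d <- diana(daisy(dat[, c("v1", "v2")]))  # cluster on the variables only
d.hc <- as.hclust(d)
d.hc$labels <- dat$id

# cut the tree into k groups; k = 3 is assumed here, in practice it
# would come from inspecting plot(d.hc)
groups <- cutree(d.hc, k = 3)
print(table(groups))
```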

Cheers,

Dylan


-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
