I'm running a cluster analysis on a large number of observations (approx. 7,000), 
using both continuous and categorical variables. PAM is a theoretically appealing 
approach; however, I believe the number of observations makes its use untenable. 
CLARA, which applies the PAM algorithm to subsamples of the data, seems like the 
natural alternative; however, it requires a numeric data matrix or data frame, 
with rows corresponding to cases and columns to variables. 

Since a dissimilarity matrix is not legitimate input to CLARA, and since a 
data matrix with categorical variables is also inappropriate, it seems that 
CLARA can only be run on numeric data. If that's true, I'm wondering what the 
benefit is of having CLARA use the PAM algorithm (a k-medoids method related to 
K-means which, in part, accommodates categorical variables by working from a 
dissimilarity matrix). 
My guess is that I'm missing something; any insight would be appreciated.
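
To make the question concrete, here is a minimal sketch of the route that works 
for mixed data when n is small: a Gower dissimilarity from daisy() passed to 
pam(). It assumes the cluster package; the data frame `dat`, its column names, 
and k = 3 are simulated placeholders for illustration, not my real data. The 
last lines only compute how large the full dissimilarity object would be at 
n = 7,000 (roughly 196 MB of doubles).

```r
library(cluster)

set.seed(1)
n <- 200  # small illustrative sample, not the real 7,000 cases

# Simulated mixed-type data: two continuous columns and one factor (hypothetical)
dat <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n),
  g  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# Gower dissimilarity handles numeric and factor columns together
d <- daisy(dat, metric = "gower")

# PAM accepts the dissimilarity object directly
fit <- pam(d, k = 3, diss = TRUE)
table(fit$clustering)

# Size of the lower-triangle dissimilarity object at full scale, in MB
n_full  <- 7000
mb_full <- n_full * (n_full - 1) / 2 * 8 / 1e6
mb_full
```

What I can't see is how to get the equivalent of the daisy() step into CLARA.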
Many thanks,
Joe Retzer
