Hi, With a dataset of 9000 observations, I have noticed significant variability in the silhouette index when I change samples (default 5) and sampsize (default 40 + 2 * number of clusters) away from their default values in CLARA.
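The comparison described above can be sketched roughly as follows. This is only a minimal illustration on simulated data, not the 9000-observation dataset itself: the number of clusters (k = 4), the cluster centres, and the grid of samples/sampsize values are all assumptions chosen for the example. It uses clara() from the cluster package and reads the average silhouette width from fit$silinfo$avg.width, which clara computes on its best sample rather than on the full dataset.

```r
library(cluster)

set.seed(42)

# Simulated stand-in data (an assumption, not the real dataset):
# 9000 two-dimensional points scattered around 4 centres.
k <- 4
centres <- matrix(c(0, 0,
                    6, 0,
                    0, 6,
                    6, 6), ncol = 2, byrow = TRUE)
x <- centres[sample(k, 9000, replace = TRUE), ] +
  matrix(rnorm(9000 * 2), ncol = 2)

# Illustrative grid of settings. The first row holds the defaults
# mentioned above: samples = 5 and sampsize = 40 + 2 * k.
grid <- expand.grid(samples  = c(5, 20, 50),
                    sampsize = c(40 + 2 * k, 200, 500))
grid$avg_sil <- NA_real_
grid$seconds <- NA_real_

for (i in seq_len(nrow(grid))) {
  # rngR = TRUE makes clara draw its subsamples with R's RNG,
  # so set.seed() above controls the result.
  elapsed <- system.time(
    fit <- clara(x, k,
                 samples  = grid$samples[i],
                 sampsize = grid$sampsize[i],
                 rngR     = TRUE)
  )["elapsed"]

  # Average silhouette width of the best sample (not of all 9000 points).
  grid$avg_sil[i] <- fit$silinfo$avg.width
  grid$seconds[i] <- elapsed
}

print(grid)
```

Repeating the loop with different seeds would show how much of the variation in avg_sil comes from the subsampling itself rather than from the parameter values.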
Are there any rules, based on the number of clusters and observations, for setting the samples and sampsize parameters efficiently, so as to avoid under- and oversampling with CLARA on the one hand while keeping a reasonable running time on the other? I did not find any rules of this kind on the web (apart from avoiding biased samples...).

Gratefully yours,
vincent

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help