Hi,

With a dataset of 9000 observations, I have noticed considerable variability
in the silhouette index when I change samples (default value 5) and sampsize
(default value 40 + 2 * number of clusters) in CLARA.

Are there any rules, based on the number of clusters and observations, for
setting the samples and sampsize parameters efficiently, so as to avoid
under- and oversampling with CLARA on the one hand while keeping a reasonable
running time on the other?
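
To make the question concrete, here is a minimal sketch of the kind of
comparison I mean. It assumes the clara() function from the cluster package;
the data are a synthetic stand-in (three Gaussian groups, 9000 rows), and the
grid values for samples and sampsize are arbitrary choices for illustration,
not recommendations. As far as I understand the clara.object documentation,
silinfo is computed on the best sample only, which may explain part of the
variability, so the sketch also records the objective for the full dataset.

```r
library(cluster)

set.seed(1)
# Synthetic stand-in for the real data (assumption): 9000 observations,
# three Gaussian groups in two dimensions.
x <- rbind(
  matrix(rnorm(3000 * 2, mean = 0), ncol = 2),
  matrix(rnorm(3000 * 2, mean = 4), ncol = 2),
  matrix(rnorm(3000 * 2, mean = 8), ncol = 2)
)
k <- 3

# Illustrative grid; the first sampsize is the documented default 40 + 2 * k.
grid <- expand.grid(samples = c(5, 20), sampsize = c(40 + 2 * k, 200, 500))
grid$sil_best_sample <- NA_real_
grid$objective <- NA_real_
grid$seconds <- NA_real_

for (i in seq_len(nrow(grid))) {
  elapsed <- system.time(
    fit <- clara(x, k,
                 samples  = grid$samples[i],
                 sampsize = grid$sampsize[i],
                 rngR     = TRUE)  # use R's RNG so set.seed() applies
  )["elapsed"]
  # Average silhouette width, computed on the best sample only.
  grid$sil_best_sample[i] <- fit$silinfo$avg.width
  # Objective function for the final clustering of the entire dataset.
  grid$objective[i] <- fit$objective
  grid$seconds[i] <- elapsed
}

print(grid)
```

Repeating the loop with several seeds would show how much of the variation
in the silhouette comes from the sampling itself rather than from the
parameter values.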

I did not find any rules of this kind on the web (apart from avoiding biased
samples...).

Gratefully yours.
vincent


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
