John,

> Hi, just a general question: when we do hierarchical clustering, should we
> compute the dissimilarity matrix from the scaled or the unscaled dataset?


> daisy() in the cluster package allows standardizing the variables before
> calculating the dissimilarity matrix;

I'd say that depends on your data.

- If your data are all (physically) different kinds of things (and thus on different orders of magnitude), then you should probably scale.

- On the other hand, I cluster spectra, so my variates all share the same unit. Moreover, I'd be afraid that scaling would blow up noise-only variates (the spectra do have regions with low or no intensity), so I usually don't scale.

- It also depends on your distance: e.g. the Mahalanobis distance should do the scaling by itself, if I'm thinking correctly at this time of the day... (a quick sketch follows this list).
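
A minimal sketch of that last point in base R, with made-up data: whitening by the inverse Cholesky factor of the covariance matrix and then calling dist() yields pairwise Mahalanobis distances, and scaling the columns first doesn't change them.

set.seed(1)
x <- matrix(rnorm(50 * 4), ncol = 4)     # hypothetical example data
## Euclidean dist() on whitened data = pairwise Mahalanobis distances
maha_dist <- function(m) dist(m %*% solve(chol(cov(m))))
d_raw    <- maha_dist(x)
d_scaled <- maha_dist(scale(x))          # scale first: same distances
all.equal(c(d_raw), c(d_scaled))         # TRUE (up to rounding)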

What I do frequently, though, is to subtract something like the minimum spectrum (in practice I calculate the 5th percentile for each variate, which is less noisy). You can also center, but I'm strongly in favour of references with a physical meaning, and for my samples the minimum spectrum is easier to interpret: it represents the matrix composition. A short sketch follows.
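
Something along these lines, assuming 'spectra' is a hypothetical matrix with one spectrum per row and one variate (wavelength) per column:

set.seed(2)
spectra  <- matrix(runif(20 * 100), nrow = 20)         # made-up spectra
baseline <- apply(spectra, 2, quantile, probs = 0.05)  # 5th percentile per variate
spectra0 <- sweep(spectra, 2, baseline)                # subtract it from each row
hc <- hclust(dist(spectra0))                           # then cluster as usual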

> but dist() doesn't have that option at all. I'd appreciate it if you could
> share your thoughts.
But you could call scale() and then dist().
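
For example (just a minimal sketch; USArrests is only a convenient built-in all-numeric data set):

x  <- scale(USArrests)     # center and scale each column
hc <- hclust(dist(x))      # Euclidean distance on the standardized data
plot(hc)

## for mixed-type data, daisy() can standardize internally instead:
## library(cluster); d <- daisy(USArrests, stand = TRUE)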

Claudia



> Thanks
>
> John





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
