Re: [R] k-means: should columns in dataset be in same scale?

Prof Brian Ripley Tue, 22 Apr 2008 22:48:17 -0700

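k-means uses Euclidean distance, so the scaling of the variables does matter:
a column with larger variance contributes more to the distances, and hence to
the clustering. Whether you want to standardize depends on the example (as it
does in most multivariate analysis problems; PCA, for instance, has the same
issue).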
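
A minimal sketch of this effect, in Python using only the standard library.
Everything in it is an assumption made for illustration: the toy data, the
helper names (standardize, lloyd, best_kmeans) and the restart-from-every-pair
scheme are hypothetical and are not R's kmeans(). Standardizing here means
centring each column and dividing by its sample standard deviation.

```python
from itertools import combinations
from statistics import mean, stdev


def standardize(rows):
    """Centre each column and divide by its sample standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(rows, centers, max_iter=100):
    """Plain Lloyd iterations from the given starting centers."""
    labels = None
    for _ in range(max_iter):
        # Assign every row to its nearest center (ties go to the lower index).
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(r, centers[k]))
            for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each center to the mean of the rows assigned to it.
        for k in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = tuple(mean(c) for c in zip(*members))
    wss = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, wss


def best_kmeans(rows, k=2):
    """Restart Lloyd from every k-subset of the rows; keep the lowest WSS."""
    runs = (lloyd(rows, list(start)) for start in combinations(rows, k))
    return min(runs, key=lambda run: run[1])[0]


# Hypothetical data: column 1 is widely spread, column 2 holds two tight groups.
data = [(x, y) for y in (0.0, 1.0) for x in (0.0, 10.0, 20.0, 30.0)]

raw_labels = best_kmeans(data)
std_labels = best_kmeans(standardize(data))
print("raw:         ", raw_labels)
print("standardized:", std_labels)
```

On this toy data the raw run should split the rows on column 1 (small values
against large), while the standardized run should split them on column 2.
Neither answer is wrong in itself; which one is wanted depends on whether the
spread in column 1 is meaningful for the problem at hand.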

On Tue, 22 Apr 2008, Johan Jackson wrote:

> Hi all,
>
> Simple question re k-means. If I have a data set with columns that are on
> different scales (say col 1 has var=100 and col2 var=2), will this make a
> difference to the k-means algorithm? It seems as though it does. If so,
> should we first standardize the columns of the dataset so that each column
> is given equal weight?
>
> JJ

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
