Re: [R] kmeans function

Prof Brian Ripley Wed, 26 Mar 2014 23:47:27 -0700

On 26/03/2014 20:01, Tomassini, Letizia wrote:

I would like to understand why the fastclus procedure in SAS is affected by the
initial order of the data. With the same dataset sorted in a different way, I
get different cluster arrangements. I find this really disturbing. R seems to
find the stable solution with nstart=100, but I do not know how R does this and
I do not know how to replicate it in SAS. All I know so far is that proc
fastclus also uses k-means.
Regarding R: does R have a way of always choosing the same starting seeds? Does
it reorganize the dataset according to some internal sort order before running
kmeans?
I am interested in finding the clusters with the best global minimum and
extracting their seeds. I need those seeds for subsequent solutions with other
numbers of clusters (for example, to settle on a lower number of clusters and
use specific seeds). Overall I am better at using SAS, and I am trying to learn
this piece of clustering design from R so that I can implement it in SAS.
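The order effect described above can be reproduced with any seeding rule that reads its starting centers off the data in file order. The sketch below is a minimal illustration under that assumption. It uses a deliberately simplified, hypothetical rule: the helper first_k_seeds makes the first k rows the seeds, and plain Lloyd iterations follow. It is NOT SAS's actual fastclus seed-selection procedure, which should be checked in the SAS documentation. On a toy 1-D dataset with three obvious groups, three orderings of the same nine values converge to three different partitions.

```python
# Illustration only: a hypothetical order-dependent seeding rule
# (the first k rows in file order). NOT SAS fastclus's actual rule.

def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations on 1-D data from the given starting centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (ties go to the lowest index).
        new_labels = [min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: a (possibly local) optimum
        labels = new_labels
        # Move each center to the mean of its members; an empty cluster keeps its center.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers

def first_k_seeds(rows, k):
    """Hypothetical seeding rule: the first k rows become the seeds."""
    return rows[:k]

def partition(rows, labels):
    """Clusters as sorted lists, independent of label numbering and row order."""
    groups = {}
    for p, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(p)
    return sorted(sorted(g) for g in groups.values())

def wss(rows, labels, centers):
    """Total within-cluster sum of squares."""
    return sum((p - centers[lab]) ** 2 for p, lab in zip(rows, labels))

data = [0, 1, 2, 10, 11, 12, 20, 21, 22]
orderings = {
    "ascending": sorted(data),
    "descending": sorted(data, reverse=True),
    "interleaved": [0, 10, 20, 1, 11, 21, 2, 12, 22],
}

results = {}
for name, rows in orderings.items():
    labels, centers = lloyd(rows, first_k_seeds(rows, 3))
    results[name] = (partition(rows, labels), wss(rows, labels, centers))
    print(name, results[name])
```

The same nine values give three partitions. Only the interleaved order happens to seed one center in each group, and only that order reaches the smallest within-cluster sum of squares (6.0 versus 154.5 for the two sorted orders).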



Please let me know if you can help.

We (unlike SAS) provide you with source code, which is the definitive documentation. Please read it: it answers all your questions. (Even those who contributed to the implementation of kmeans would need to do so to refresh their memories.)

As for why a SAS algorithm works the way it does: given the fees someone is paying SAS on your behalf, they should be willing to explain.
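On our reading of R's kmeans() source, which is worth checking against the R version you actually use, the starting centers are k distinct data rows drawn at random with sample.int() whenever centers is given as a number. They are therefore reproducible only if the random number generator is fixed first with set.seed(), and the data do not appear to be sorted beforehand. The Python sketch below mimics that idea with random.Random(seed).sample. The helper name random_start_rows is hypothetical, and Python's generator will not pick the same rows as R's for the same seed number.

```python
import random

def random_start_rows(n_rows, k, seed):
    """Hypothetical helper: k distinct row indices drawn uniformly at random.

    Mirrors the idea of R's x[sample.int(m, k), ] start, not R's exact RNG stream.
    """
    rng = random.Random(seed)  # fixing the seed plays the role of set.seed() in R
    return rng.sample(range(n_rows), k)

data = [0, 1, 2, 10, 11, 12, 20, 21, 22]
idx_a = random_start_rows(len(data), 3, seed=2014)
idx_b = random_start_rows(len(data), 3, seed=2014)  # same seed, same rows
idx_c = random_start_rows(len(data), 3, seed=7)     # different seed, usually different rows

# The same indices point at different values once the rows are re-sorted,
# so a single random start (nstart = 1) can still depend on row order.
asc, desc = sorted(data), sorted(data, reverse=True)
print(idx_a, [asc[i] for i in idx_a], [desc[i] for i in idx_a])
```

This suggests that R's stability across re-sorted data comes from repeating the fit (nstart), not from any internal reordering or fixed seeds.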


Letizia



________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of 
Ranjan Maitra [maitra.mbox.igno...@inbox.com]
Sent: Wednesday, 26 March 2014 12:48
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] kmeans function

On Wed, 26 Mar 2014 18:35:34 +0000 "Tomassini, Letizia"
<tomass...@vetmed.wsu.edu> wrote:


Hello
I need to ask some questions about the k-means clustering function. Mainly I
would like to know why, with nstart set to a large enough number, kmeans always
finds the same cluster arrangement, even when the input dataset is sorted in
different ways or I remove a few observations. I cannot seem to reproduce that
when using SAS.
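Below is a sketch of what nstart buys. It assumes, from our reading of R's source, that kmeans() repeats the fit from nstart random sets of starting rows and keeps the fit with the smallest total within-cluster sum of squares (tot.withinss). For brevity it uses plain Lloyd iterations, which R offers as algorithm = "Lloyd". R's default is Hartigan-Wong, so this illustrates the restart idea and is not a port of R's code. The function name kmeans_nstart is hypothetical.

```python
import random

def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations on 1-D data from the given starting centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Nearest-center assignment (ties go to the lowest index).
        new_labels = [min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = sum(members) / len(members)
    return labels, centers

def wss(points, labels, centers):
    """Total within-cluster sum of squares."""
    return sum((p - centers[lab]) ** 2 for p, lab in zip(points, labels))

def kmeans_nstart(points, k, nstart, seed):
    """Hypothetical helper: best of nstart random starts by total within-cluster SS."""
    rng = random.Random(seed)
    best = None
    for _ in range(nstart):
        # Random start: k distinct rows of the data used as initial centers.
        start = [points[i] for i in rng.sample(range(len(points)), k)]
        labels, centers = lloyd(points, start)
        total = wss(points, labels, centers)
        if best is None or total < best[0]:  # keep only strict improvements
            best = (total, labels, centers)
    return best

data = [0, 1, 2, 10, 11, 12, 20, 21, 22]
orderings = [sorted(data), sorted(data, reverse=True), [0, 10, 20, 1, 11, 21, 2, 12, 22]]

fits = []
for rows in orderings:
    total, labels, centers = kmeans_nstart(rows, 3, nstart=100, seed=1)
    fits.append((total, sorted(centers)))
    print(total, sorted(centers))
```

With 100 random starts, every ordering ends at the same best fit, because some start lands one center in each group. The returned centers are the kind of values you could pass back as explicit starting centers. In R, supplying a matrix as centers uses its rows as the initial centers.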


Do you understand what kmeans does? Why would you expect otherwise?
Besides, why does the function have to match SAS's output? (Do you
know how SAS goes about initializing the algorithm?) In any case,
should it not provide the correct answer (the global minimum, if
possible)?

Ranjan
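For reference, the criterion behind "the global minimum" is the usual k-means objective. It is the total within-cluster sum of squares (reported by R as tot.withinss), minimized over partitions C_1, ..., C_k of the observations x_1, ..., x_n:

```latex
W(C_1,\dots,C_k) = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^2,
\qquad
\bar{x}_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i .
```

W depends only on which observations share a cluster, not on the order of the rows. Re-sorting the data therefore cannot change which partition is best; it can only change which local minimum a single run happens to reach.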



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
