I have tried to read the source code, but since I am neither a computer engineer nor a programmer, I was not able to fully understand it. I wonder if I should look for somebody here on campus (Washington State University) who could read it for me. In any case, I think that David Carlson did a nice job of explaining what the R function is doing. I can also ask SAS support about the SAS part. That is a good idea!
________________________________________
From: Prof Brian Ripley [[email protected]]
Sent: Wednesday, 26 March 2014 23:45
To: Tomassini, Letizia; [email protected]
Subject: Re: [R] kmeans function

On 26/03/2014 20:01, Tomassini, Letizia wrote:
> I would like to understand why the fastclus procedure in SAS is affected by
> the initial order of the data. With the same dataset, sorted in a different
> way, I get different cluster arrangements. I find this really disturbing.
> R seems to find a stable solution when I use nstart=100, but I do not know
> how R does this, and I do not know how to replicate it in SAS. All I know
> so far is that proc fastclus uses k-means as well.
> Regarding R, for example, does the R software have a way of always choosing
> the same starting seeds? Does it reorder the dataset according to some
> internal sort before running kmeans?
> I am interested in finding the clusters with the best global minimum and
> extracting the seeds from those. I need those seeds for later solutions
> with a different number of clusters (for example, to settle on a lower
> number of clusters and use specific seeds). Overall I am better at using
> SAS, and I am trying to learn how R handles this part of the clustering so
> that I can implement it in SAS.
>
>
> Please let me know if you can help

We (unlike SAS) provide you with source code, which is the definitive
documentation. Please read it: it answers all your questions. (Even those
who contributed to the implementation of kmeans would need to do so to
refresh their memories.)

As for why a SAS algorithm works the way it does: given the fees someone is
paying SAS on your behalf, they should be willing to explain.
>
> Letizia
>
> ________________________________________
> From: [email protected] [[email protected]] on behalf of
> Ranjan Maitra [[email protected]]
> Sent: Wednesday, 26 March 2014 12:48
> To: [email protected]
> Subject: Re: [R] kmeans function
>
> On Wed, 26 Mar 2014 18:35:34 +0000 "Tomassini, Letizia"
> <[email protected]> wrote:
>
>> Hello
>> I need to ask some questions about the k-means clustering function. Mainly
>> I would like to know why, when nstart is set to a large enough number,
>> kmeans always finds the same clustering arrangement; this happens even
>> when the input dataset is sorted in different ways or I take out a few
>> observations. I cannot seem to recreate that when using SAS.
>
> Do you understand what kmeans does? Why would you expect otherwise?
> Besides, why does the function have to match SAS's output? (Do you
> know how SAS goes about initializing the function?) In any case,
> should it not provide the correct answer (the best global minimum,
> if possible)?
>
> Ranjan

--
Brian D. Ripley,                  [email protected]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
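
The nstart behaviour asked about in this thread can be illustrated with a small sketch. It is not R's kmeans source. It is plain Python, it uses simple Lloyd iterations rather than whatever algorithm kmeans runs by default, and every name in it (lloyd, kmeans_best_of, partition) and the toy data are made up for illustration. The idea, as I understand the kmeans help page, is that each start draws k distinct rows at random as initial centers and only the run with the smallest total within-cluster sum of squares is kept, so with enough starts the row order of the input should stop mattering.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd(points, centers, max_iter=100):
    """One k-means run (Lloyd iterations) from the given starting centers."""
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss

def kmeans_best_of(points, k, nstart, rng):
    """Run `nstart` random starts and keep the one with the lowest total WSS."""
    best = None
    for _ in range(nstart):
        start = rng.sample(points, k)  # k distinct rows as starting centers
        labels, wss = lloyd(points, list(start))
        if best is None or wss < best[1]:
            best = (labels, wss)
    return best

def partition(points, labels):
    """Clusters as a set of point sets, so label numbers and row order drop out."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, set()).add(p)
    return {frozenset(g) for g in groups.values()}

# Toy data (an assumption for this sketch): three tight 3x3 grids, far apart.
offsets = [(0.3 * dx, 0.3 * dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
data = [(cx + ox, cy + oy) for cx, cy in [(0, 0), (10, 0), (5, 9)]
        for ox, oy in offsets]

# Same rows in a different order, clustered with a different random stream.
shuffled = data[:]
random.Random(1).shuffle(shuffled)

labels_a, wss_a = kmeans_best_of(data, 3, 50, random.Random(2))
labels_b, wss_b = kmeans_best_of(shuffled, 3, 50, random.Random(3))
same = partition(data, labels_a) == partition(shuffled, labels_b)
print(same, round(wss_a, 6), round(wss_b, 6))
```

With nstart=1 a single unlucky start can put two centers in the same group and stop at a worse local solution, which is the kind of order or seed dependence described for proc fastclus above. Keeping the best of many starts makes that increasingly unlikely, but it does not guarantee the global minimum.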

