To add to Ranjan's reply: k-means can find different results in a large data set even with a large nstart= value. But you are correct that, with a large enough value, the results will be the same unless two solutions have exactly the same between-cluster sum of squares (unlikely but not impossible). Removing observations, however, could easily change the results, although it may not in your data. If you are comparing with SAS PROC FASTCLUS, the answer is that FASTCLUS does not appear to support multiple starts. To match the results in R, you would have to run FASTCLUS nstart times and choose the result with the largest between-cluster sum of squares.
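The multiple-start procedure described above can be sketched in a few lines. This is a minimal Python illustration (standard library only), not R's kmeans() source; R defaults to the Hartigan-Wong algorithm, whereas this sketch uses plain Lloyd iterations. Each run starts from k randomly chosen data points. The run with the largest between-cluster sum of squares is kept. Because total SS = within SS + between SS for fixed data, this is the same as keeping the smallest total within-cluster SS. All function names and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def lloyd(data, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) started from k random data points."""
    centers = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return labels, centers

def sums_of_squares(data, labels, centers):
    """Return (within-cluster SS, between-cluster SS, total SS)."""
    grand = mean(data)
    total = sum(sq_dist(p, grand) for p in data)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    sizes = [labels.count(j) for j in range(len(centers))]
    between = sum(n * sq_dist(c, grand) for n, c in zip(sizes, centers))
    return within, between, total

def kmeans_multistart(data, k, nstart, seed=0):
    """Run Lloyd's algorithm nstart times and keep the run with the largest
    between-cluster SS (equivalently, the smallest within-cluster SS)."""
    rng = random.Random(seed)
    best = None
    for _ in range(nstart):
        labels, centers = lloyd(data, k, rng)
        within, between, _total = sums_of_squares(data, labels, centers)
        if best is None or between > best["between"]:
            best = {"labels": labels, "centers": centers,
                    "within": within, "between": between}
    return best

def partition(data, labels):
    """Clusters as a set of frozensets of points, ignoring label numbers and row order."""
    groups = {}
    for p, lab in zip(data, labels):
        groups.setdefault(lab, set()).add(p)
    return {frozenset(g) for g in groups.values()}

def toy_data():
    """Hypothetical toy data: three well-separated 3x3 grids of points."""
    return [(cx + dx, cy + dy)
            for cx, cy in [(0, 0), (10, 0), (0, 10)]
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
```

As a usage example, run it once on the data and once on a shuffled copy with a different seed. Then compare partition(data, a["labels"]) with partition(shuffled, b["labels"]). With enough starts, both runs should land on the same best partition, which is the behaviour described in the question. A single FASTCLUS run corresponds to one call of lloyd() above, so it has no such guarantee.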
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ranjan Maitra
Sent: Wednesday, March 26, 2014 2:48 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] kmeans function

On Wed, 26 Mar 2014 18:35:34 +0000 "Tomassini, Letizia" <tomass...@vetmed.wsu.edu> wrote:

> Hello
> I need to ask questions about the k-means clustering function. Mainly I would like to know why, with nstart= set to a large enough number, kmeans always finds the same clustering arrangement, even when the input dataset is sorted in different ways or I take out a few observations. I cannot seem to recreate that when using SAS.

Do you understand what kmeans does? Why would you expect otherwise? Besides, why does the function have to match SAS's output? (Do you know how SAS goes about initializing it?) In any case, should it not provide the correct (best global minimum, if possible) answer?

Ranjan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.