On 26/03/2014 20:01, Tomassini, Letizia wrote:
I would like to understand why the fastclus procedure in SAS is affected by the
initial order of the data. With the same dataset sorted in a different way, I
get different cluster arrangements, which I find really disturbing. R seems to
find a stable solution with nstart=100, but I do not know how R does this or how
to replicate it in SAS. All I know so far is that proc fastclus also uses
k-means.
Regarding R: does it have a way of always choosing the same starting seeds?
Does it reorder the dataset by some internal sorting before running kmeans?
I am interested in finding the clusters at the best global minimum and
extracting their seeds. I need those seeds for subsequent solutions with other
numbers of clusters (for example, choosing a lower number of clusters and
starting from specific seeds). Overall I am more comfortable with SAS, and I am
trying to learn this piece of clustering design from R so that I can implement
it in SAS.
Please let me know if you can help.
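The order dependence described above can be illustrated with a small sketch. It assumes a hypothetical, deliberately simplified seeding rule (use the first k distinct observations as seeds) followed by Lloyd's k-means iterations. This is not SAS's actual FASTCLUS seed-selection procedure; it only shows why any rule that picks seeds by position in the data can return different clusters when the same rows are sorted differently. The function names are hypothetical.

```python
# Hypothetical illustration: an order-based seeding rule (first k distinct
# observations) followed by Lloyd's k-means. NOT SAS's FASTCLUS algorithm.

def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Lloyd's k-means from the given starting centers.

    Returns (labels, centers, total within-cluster sum of squares).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (ties go to the lower index).
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of its points; an empty cluster keeps its center.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


def first_k_seeds(points, k):
    """Hypothetical order-based rule: the first k distinct observations."""
    seeds = []
    for p in points:
        if p not in seeds:
            seeds.append(p)
        if len(seeds) == k:
            break
    return seeds


data = [(0.0,), (2.0,), (10.0,), (12.0,), (20.0,), (22.0,)]
reordered = [data[i] for i in (0, 2, 4, 1, 3, 5)]  # same rows, different order

for name, rows in (("sorted", data), ("reordered", reordered)):
    _, centers, wss = lloyd(rows, first_k_seeds(rows, 3))
    print(name, sorted(centers), wss)
# sorted [(0.0,), (2.0,), (16.0,)] 104.0
# reordered [(1.0,), (11.0,), (21.0,)] 6.0
```

On these six points the sorted order seeds the run at 0, 2 and 10 and it stops at a local minimum (total within-cluster sum of squares 104), while the reordered rows seed it at 0, 10 and 20 and it reaches the pairs {0, 2}, {10, 12}, {20, 22} (sum of squares 6).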
We (unlike SAS) provide you with source code, which is the definitive
documentation. Please read it: it answers all your questions. (Even
those who contributed to the implementation of kmeans would need to do
so to refresh their memories.)
As for why a SAS algorithm works the way it does: given the fees someone
is paying SAS on your behalf, they should be willing to explain.
Letizia
________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of
Ranjan Maitra [maitra.mbox.igno...@inbox.com]
Sent: Wednesday, 26 March 2014 12:48
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] kmeans function
On Wed, 26 Mar 2014 18:35:34 +0000 "Tomassini, Letizia"
<tomass...@vetmed.wsu.edu> wrote:
Hello
I have some questions about the k-means clustering function. Mainly, I would
like to know why, when nstart is set large enough, kmeans always finds the same
cluster arrangement, even when the input dataset is sorted in different ways or
I remove a few observations. I cannot seem to recreate that in SAS.
Do you understand what kmeans does? Why would you expect otherwise?
Besides, why does the function have to match SAS's output? (Do you
know how SAS goes about initializing it?) In any case, should it not
provide the correct answer (the best global minimum, if possible)?
Ranjan
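For comparison, the idea behind R's nstart argument can be sketched as follows: when centers is a number, kmeans() picks a random set of distinct rows as initial centers, repeats this nstart times, and keeps the run with the smallest total within-cluster sum of squares (tot.withinss). The sketch below is a stand-in under stated assumptions, not R's code: it uses Lloyd's algorithm rather than R's default Hartigan-Wong algorithm, and the function names and the fixed seed are hypothetical. In R itself, calling set.seed() before kmeans() makes the random starts reproducible.

```python
import random

# Hypothetical nstart-style driver; uses Lloyd's algorithm, NOT R's
# default Hartigan-Wong algorithm.


def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Lloyd's k-means from the given starting centers.

    Returns (labels, centers, total within-cluster sum of squares).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (ties go to the lower index).
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of its points; an empty cluster keeps its center.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


def kmeans_multistart(points, k, nstart=50, seed=1):
    """Run Lloyd from nstart random sets of k distinct rows; keep the best run."""
    rng = random.Random(seed)  # fixed seed -> the same random starts every call
    best = None
    for _ in range(nstart):
        start = rng.sample(points, k)  # k distinct rows chosen at random
        result = lloyd(points, start)
        if best is None or result[2] < best[2]:
            best = result  # smaller total within-cluster sum of squares wins
    return best


data = [(0.0,), (2.0,), (10.0,), (12.0,), (20.0,), (22.0,)]
shuffled = data[:]
random.Random(7).shuffle(shuffled)  # same rows, different order

for name, rows in (("original", data), ("shuffled", shuffled)):
    _, centers, wss = kmeans_multistart(rows, 3)
    print(name, sorted(centers), wss)
```

Because many random starts are compared, the best run no longer hinges on which rows happen to come first, so on well-separated data like these both row orders are expected to end at the same partition. On harder data, more starts raise the chance of reaching the global minimum but cannot guarantee it.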
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.