Re: [R] kmeans function

Tomassini, Letizia Thu, 27 Mar 2014 09:21:38 -0700

I tried to read the source code, but since I am neither a computer engineer 
nor a programmer, I could not fully understand it. I wonder whether I should 
look for somebody here on campus (Washington State University) who could read 
it for me. In any case, I think David Carlson did a nice job of explaining 
what the R function is doing.
I can also ask SAS support about the SAS part. That is a good idea!
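
For readers of the archive, the multiple-random-starts idea behind nstart can
be sketched outside R. The Python below is only an illustration under stated
assumptions, not the code of R's kmeans: it uses Lloyd's algorithm (R's
default is Hartigan-Wong), and the names sq_dist, lloyd and kmeans_best_of, as
well as the toy data, are hypothetical. Each start draws k distinct rows at
random as initial centers, and the run with the smallest total within-cluster
sum of squares is kept, so with enough starts the row order of the input
should stop mattering on well-separated data.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd(points, centers, max_iter=100):
    """One k-means run (Lloyd's algorithm) from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center alone
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    # Total within-cluster sum of squares for the final centers.
    wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wss

def kmeans_best_of(points, k, nstart, seed=0):
    """Run lloyd() from nstart random starts and keep the lowest-WSS result."""
    rng = random.Random(seed)
    best = None
    for _ in range(nstart):
        start = rng.sample(points, k)  # k distinct rows as starting centers
        centers, wss = lloyd(points, start)
        if best is None or wss < best[1]:
            best = (centers, wss)
    return best

# Toy data (an assumption for illustration): three well-separated 2-D groups.
rng = random.Random(42)
group_means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(mx + rng.gauss(0, 1), my + rng.gauss(0, 1))
        for mx, my in group_means for _ in range(30)]

# Same rows in a different order, clustered with a different random seed.
shuffled = data[:]
rng.shuffle(shuffled)

centers_a, wss_a = kmeans_best_of(data, k=3, nstart=100, seed=1)
centers_b, wss_b = kmeans_best_of(shuffled, k=3, nstart=100, seed=2)

print("WSS, original order:", round(wss_a, 6))
print("WSS, shuffled order:", round(wss_b, 6))
print("centers, original order:", sorted(centers_a))
print("centers, shuffled order:", sorted(centers_b))
```

The centers returned by the best run can be passed back to lloyd() as
starting centers for a later run. In R, the analogous step is to give kmeans a
matrix of initial centers through its centers argument instead of a number.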


________________________________________
From: Prof Brian Ripley [[email protected]]
Sent: Wednesday, 26 March 2014 23:45
To: Tomassini, Letizia; [email protected]
Subject: Re: [R] kmeans function

On 26/03/2014 20:01, Tomassini, Letizia wrote:
> I would like to understand why the FASTCLUS procedure in SAS is affected by 
> the initial order of the data. With the same dataset sorted in a different 
> way, I get different cluster arrangements. I find this really disturbing. 
> R seems to find a stable solution with nstart=100, but I do not know how R 
> does this or how to replicate it in SAS. All I know so far is that PROC 
> FASTCLUS uses k-means as well.
> Regarding R, for example, does the software have a way of always choosing 
> the same starting seeds? Does it re-sort the dataset internally before 
> running kmeans?
> I am interested in finding the clustering with the best global minimum and 
> extracting the seeds from it. I need those seeds for subsequent solutions 
> with other numbers of clusters (for example, deciding on a lower number of 
> clusters and using specific seeds). Overall I am better at using SAS, and I 
> am trying to learn this piece of clustering design from R so that I can 
> implement it in SAS.
>
>
> Please let me know if you can help

We (unlike SAS) provide you with source code, which is the definitive
documentation.  Please read it: it answers all your questions.  (Even
those who contributed to the implementation of kmeans would need to do
so to refresh their memories.)

As for why a SAS algorithm works the way it does: given the fees someone
is paying SAS on your behalf they should be willing to explain.

>
> Letizia
>
>
>
> ________________________________________
> From: [email protected] [[email protected]] on behalf of 
> Ranjan Maitra [[email protected]]
> Sent: Wednesday, 26 March 2014 12:48
> To: [email protected]
> Subject: Re: [R] kmeans function
>
> On Wed, 26 Mar 2014 18:35:34 +0000 "Tomassini, Letizia"
> <[email protected]> wrote:
>
>>
>> Hello
>> I need to ask some questions about the k-means clustering function. Mainly 
>> I would like to know why, when nstart is set large enough, kmeans always 
>> finds the same cluster arrangement, even when the input dataset is sorted 
>> in different ways or I take out a few observations. I cannot seem to 
>> recreate that in SAS.
>
> Do you understand what kmeans does? Why would you expect otherwise?
> Besides, why does the function have to match SAS's output? (Do you
> know how SAS goes about initializing its procedure?) In any case,
> should it not provide the correct answer (the best global minimum,
> if possible)?
>
> Ranjan


--
Brian D. Ripley,                  [email protected]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
