Hi,

No, don't use kmeans with factors.

The kmeans algorithm computes, among other things, the mean of each of the k
clusters.
But you don't get a useful mean from factors, because the internally used
integer codes are arbitrary. In this case they are 1, 2 and 3, but they could
just as well be 42, 7 and 100000, which would change any calculation of a mean.
That's why the kmeans() function wants numeric matrices.
It may help to read up on how kmeans works:
http://en.wikipedia.org/wiki/K-means_clustering
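
To make the point about arbitrary codes concrete, here is a small sketch. It
uses R's built-in iris data rather than your CSV, so the column names (e.g.
Sepal.Length) are an assumption and will differ from yours. The names m_chr,
m_num, codes and recoded are made up for illustration only.

```r
data(iris)

# as.matrix() on a data frame with a factor column gives a character matrix,
# which kmeans() cannot work with.
m_chr <- as.matrix(iris)
is.character(m_chr)

# cbind() on the individual vectors drops the level names and keeps only
# the factor's internal integer codes, so the result is a numeric matrix.
m_num <- with(iris, cbind(Sepal.Length, Sepal.Width,
                          Petal.Length, Petal.Width, Species))
unique(m_num[, "Species"])

# The codes are arbitrary labels: relabelling the same three groups
# changes the mean, although the data have not changed.
codes   <- as.integer(iris$Species)
recoded <- c(42, 7, 100000)[codes]
mean(codes)
mean(recoded)

# Clustering on the numeric measurements only avoids the problem.
set.seed(1)  # kmeans() picks random starting centers
fit <- kmeans(iris[, 1:4], centers = 3)
fit$size
```

If Species is not supposed to drive the clustering, leaving the column out as
in the last call is probably the simplest option.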

Christoph

2011/10/16 raji sankaran <raji.sanka...@gmail.com>

> Hi,
>
> Thank you. The information was very helpful.
>
> Yes, it was meant to be centers=3. Even with that, kmeans gives an error if we
> include the index of the Species column.
>
> So, is it OK to use kmeans for String data by using cbind? kmeans also works
> if we give it a column which contains distinct String values,
> for example a column which contains names like country names. How does this
> work in such cases? Is it expected behavior?
>
> Country
> -----------
> England
> Germany
> China
>
> Thanks,
> Raji
>
>
> On Mon, Oct 17, 2011 at 1:02 AM, Christoph Molnar <
> christoph.mol...@googlemail.com> wrote:
>
>> Hi,
>>
>> I suspect your column Species is of class "factor" (as it is in R's built-in
>> iris dataset).
>> This means that in your case Species is an integer vector with the
>> additional information of the level names. kmeans internally calls
>> as.matrix(), which turns your data frame into a character matrix because one
>> column is a factor, and so you get an error.
>>
>> After binding the columns with cbind, the result is a numeric matrix with
>> the Species column stored as the internal level codes (1, 2 and 3 instead of
>> "setosa", "versicolor", "virginica"), and kmeans no longer throws an error.
>>
>> Furthermore, kmeans wouldn't work in the first case because there is no
>> "size=" argument in kmeans. You probably meant centers=3.
>> For additional information, try ?kmeans
>>
>> Christoph
>>
>>
>> 2011/10/16 Raji <raji.sanka...@gmail.com>
>>
>>> Hi All,
>>>
>>>  For executing kmeans for Iris, we found that there were 2 different
>>> ways.
>>>
>>> dataFrame <- read.csv("c:/Iris.csv",header=T)
>>>
>>> 1. kmeans_model<-kmeans(dataFrame[1:5],size=3)
>>>    This gave an error, as it had Species, which is a String column, as one
>>>    of the inputs.
>>>
>>> 2. attach(dataFrame)
>>>    kmeans_model<-kmeans(cbind(SepalLength,SepalWidth,PetalLength,PetalWidth,Species),3)
>>>    This command worked and gave output.
>>>
>>> Does this mean that kmeans can accept String inputs also?
>>>
>>> Can you please let me know how the second command works?
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Raji
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Help-in-kmeans-tp3430433p3909552.html
>>>
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
