Hi, no, don't use kmeans with factors.
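Here is a rough sketch of what goes wrong. It uses R's built-in iris data as a stand-in for your Iris.csv, so the column names are an assumption and may differ from yours. The small three-element factor in step 3 is made up purely for illustration.

```r
# Sketch only: built-in iris stands in for Iris.csv (assumed equivalent).
data(iris)

# 1. as.matrix() on a data frame with a factor column gives a character
#    matrix, so kmeans(dataFrame[1:5], ...) has nothing numeric to average.
m_chr <- as.matrix(iris[1:5])
is.character(m_chr)

# 2. cbind() drops the factor class and keeps only the integer codes,
#    so the result is numeric and kmeans() no longer complains.
m_num <- cbind(iris$Sepal.Length, iris$Species)
is.numeric(m_num)

# 3. The codes follow the level order, so a mean over them is arbitrary:
#    the same three labels give different means under a different ordering.
labels_default   <- factor(c("setosa", "setosa", "versicolor"))
labels_reordered <- factor(labels_default, levels = c("versicolor", "setosa"))
mean_default   <- mean(as.integer(labels_default))
mean_reordered <- mean(as.integer(labels_reordered))
c(mean_default, mean_reordered)

# 4. Cluster on the numeric measurements only.
set.seed(1)  # kmeans picks random starting centers
fit <- kmeans(iris[, 1:4], centers = 3)
table(fit$cluster, iris$Species)
```

The last line cross-tabulates the clusters against Species, which is probably the comparison you want: Species is used to check the result, not as an input.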
The kmeans algorithm, among other things, calculates the mean of each of the k clusters. But you don't get a useful mean from factors, because the integer codes used internally are arbitrary. In this case they are 1, 2 and 3. Any other coding (say 42, 7 and 100000) would be equally valid and would change any calculation of a mean. That's why the kmeans() function wants numeric matrices.

Maybe you should think about how kmeans works:
http://en.wikipedia.org/wiki/K-means_clustering

Christoph

2011/10/16 raji sankaran <raji.sanka...@gmail.com>

> Hi,
>
> Thank you. The information was very helpful.
>
> Yes, it was meant to be centers=3. Even with that, kmeans gives an error if we
> give the index of the Species column.
>
> So, *is it ok to use kmeans for String data by using cbind*? But kmeans *works
> even if we give a column which contains distinct String values*.
> For example, a column which contains names like country names. How does this
> work in such cases? Is it expected behavior?
>
> Country
> -----------
> England
> Germany
> China
>
> Thanks,
> Raji
>
>
> On Mon, Oct 17, 2011 at 1:02 AM, Christoph Molnar <
> christoph.mol...@googlemail.com> wrote:
>
>> Hi,
>>
>> I suspect your column Species is of class "factor" (as it is in R's
>> built-in iris dataset).
>> This means that in your case Species is an integer vector with the
>> additional information of the level names. kmeans internally calls
>> as.matrix(), which creates a character matrix from your data frame because
>> one column is a factor, and you get an error.
>>
>> After binding the columns with cbind, the result is a numeric matrix with
>> the Species column as the internal level codes (1, 2 and 3 instead of
>> "setosa", "versicolor", "virginica") and kmeans does not throw an error
>> any more.
>>
>> Furthermore kmeans wouldn't work in the first case, because there is no
>> "size=" argument in kmeans. You probably meant centers=3.
>> For additional information try ?kmeans
>>
>> Christoph
>>
>>
>> 2011/10/16 Raji <raji.sanka...@gmail.com>
>>
>>> Hi All,
>>>
>>> For executing kmeans for Iris, we found that there were 2 different
>>> ways.
>>>
>>> dataFrame <- read.csv("c:/Iris.csv",header=T)
>>>
>>> 1. kmeans_model<-kmeans(dataFrame[1:5],size=3)
>>> *This gave an error as it had Species which is a String column as one of
>>> the inputs*
>>>
>>> 2. attach(dataFrame)
>>>
>>> kmeans_model<-kmeans(cbind(SepalLength,SepalWidth,PetalLength,PetalWidth,Species),3)
>>>
>>> *But this command worked and gave output.*
>>>
>>> Does this mean that kmeans can accept String inputs also?
>>>
>>> Can you please let me know how the second command works?
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Raji
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Help-in-kmeans-tp3430433p3909552.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.