Hi,
No, don't use kmeans with factors.
Among other things, the kmeans algorithm calculates the mean of each of the
k clusters.
But you don't get a useful mean from factors, because the internally used
integers are arbitrary. In this case it's 1, 2 and 3. But it could be 42, 7
and 10 as well, whi
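
To illustrate the point about arbitrary integer codes, here is a minimal sketch. It assumes R's built-in iris dataset rather than the poster's CSV file, and the variable names are made up for the example. The same labels get different integer codes once the level order changes, so a mean over the codes changes too.

```r
# First 100 rows of the built-in iris data: 50 setosa, 50 versicolor.
species <- iris$Species[1:100]

# Default level order: setosa, versicolor, virginica.
codes_default <- as.integer(species)

# Same labels, different (hypothetical) level order.
species_reordered <- factor(species,
                            levels = c("virginica", "setosa", "versicolor"))
codes_reordered <- as.integer(species_reordered)

# The labels are unchanged, but the "mean" of the codes is not.
print(identical(as.character(species), as.character(species_reordered)))
print(mean(codes_default))
print(mean(codes_reordered))
```

The mean of the codes depends only on how the levels happen to be ordered, which is why a cluster mean computed from them carries no meaning.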
Hi,
Thank you. The information was very helpful.
Yes, it was meant to be centers=3. Even with that, kmeans gives an error if
we give the index of the Species column.
So, *is it ok to use kmeans for String data by using cbind*? But,
kmeans*works even if we give a column which contains distinct String
Hi,
I suspect your column Species is of class "factor" (as it is in R's built-in
iris dataset).
This means that in your case Species is an integer vector with the
additional information of the level names. kmeans is internally calling
as.matrix(), which creates a character matrix of your datafram
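
The conversion described above can be checked directly. This is a small sketch that assumes the built-in iris dataset stands in for the poster's data frame; the names m_all and m_num are only for illustration.

```r
# Species in the built-in iris data is a factor stored as integer codes.
print(class(iris$Species))
print(typeof(iris$Species))

# Including the factor column turns the whole matrix into character.
m_all <- as.matrix(iris[1:5])
print(is.character(m_all))

# Using only the four measurement columns keeps the matrix numeric.
m_num <- as.matrix(iris[1:4])
print(is.numeric(m_num))

# kmeans on the numeric columns only.
km <- kmeans(iris[, 1:4], centers = 3)
print(dim(km$centers))
```

Dropping the factor column before calling kmeans avoids the character matrix altogether.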
Hi All,
We found two different ways to run kmeans on the Iris dataset.
dataFrame <- read.csv("c:/Iris.csv", header=T)
1. kmeans_model <- kmeans(dataFrame[1:5], size=3)
*This gave an error because Species, a String column, was one of the
inputs.*
2. attach(dataFrame)
kmeans_mo
Hi,
I have herewith attached the results of the 2 commands.
> set.seed(1234)
> kmeans_model<-kmeans((SepalLength+SepalWidth+PetalLength+PetalWidth),centers=3)
> kmeans_model$cluster
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 2 3
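
One thing worth checking in the command above: in R, `+` between columns adds them element by element, so kmeans receives a single vector of row sums rather than four separate columns. The sketch below assumes the built-in iris dataset, whose columns are named Sepal.Length and so on (the poster's CSV uses SepalLength etc.), and compares the summed input with a cbind() of the same columns.

```r
# Row sums: one number per flower, so kmeans clusters in one dimension.
x_sum <- with(iris, Sepal.Length + Sepal.Width + Petal.Length + Petal.Width)

# cbind keeps the four measurements as separate columns.
x_cols <- with(iris, cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))

print(length(x_sum))
print(dim(x_cols))

set.seed(1234)
km_sum <- kmeans(x_sum, centers = 3)
set.seed(1234)
km_cols <- kmeans(x_cols, centers = 3)

# Compare each clustering with the known species labels.
print(table(iris$Species, km_sum$cluster))
print(table(iris$Species, km_cols$cluster))
```

If the two tables differ, the difference comes from the input that was built, not from kmeans treating column names and column indices differently.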
Hi,
Thanks for the information. But I am already using set.seed(). My problem
is that when I use column names instead of column indices, the result
consistently seems to be less accurate. Hence, we wanted to understand how
kmeans differentiates between column names and column indices. Is there a
I'm not going to comment on column names, but this is just to make you
aware that the results of k-means depend on random initialisation.
This means that it is possible that you get different results if you run
it several times. It basically gives you a local optimum and there may be
more than
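
A short sketch of the random-initialisation point, assuming the built-in iris dataset: fixing the seed makes a single run repeatable, and the nstart argument of kmeans tries several random starts and keeps the best one. The value 25 is an arbitrary choice for the example.

```r
x <- iris[, 1:4]

# Same seed, same call: the two runs give the same clustering.
set.seed(1)
km_a <- kmeans(x, centers = 3)
set.seed(1)
km_b <- kmeans(x, centers = 3)
print(identical(km_a$cluster, km_b$cluster))

# Several random starts; kmeans keeps the run with the lowest
# total within-cluster sum of squares.
set.seed(1)
km_multi <- kmeans(x, centers = 3, nstart = 25)
print(km_a$tot.withinss)
print(km_multi$tot.withinss)
```

Using several starts does not guarantee the global optimum, but it makes ending up in a poor local optimum less likely.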
Hi All,
I was using the following command to run kmeans on the Iris dataset.
Kmeans_model<-kmeans(dataFrame[,c(1,2,3,4)],centers=3)
This was giving proper results for me. But, in my application we generate
the R commands dynamically and there was a requirement that the column names
will b
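
For commands that are generated dynamically, columns can be selected by name with a character vector instead of by index. The sketch below assumes the built-in iris dataset and its column names (the poster's CSV may use different ones); cols is a hypothetical variable standing in for names supplied at run time.

```r
# Column names as they might be supplied at run time (assumed names).
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

by_index <- iris[, c(1, 2, 3, 4)]
by_name  <- iris[, cols]

# Both selections produce the same data frame.
print(identical(by_index, by_name))

# With the same seed, kmeans gives the same clustering for both.
set.seed(1234)
km_index <- kmeans(by_index, centers = 3)
set.seed(1234)
km_name <- kmeans(by_name, centers = 3)
print(identical(km_index$cluster, km_name$cluster))
```

Selecting with a character vector keeps the columns separate, so kmeans sees the same input as with indices.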