Hi Raji,
I am fairly sure that kmeans in general cannot handle missing
values, so most probably there won't be an option for this in R. Either
you omit the observations with NAs, as William proposed, or you search for
an algorithm that can handle missing values (I am not sure whether there is
one). Another alternative would be to put mean values in the NA places.
This, however, biases the results.
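In case it helps, here is a minimal sketch of the mean-value idea. It assumes the built-in iris data with a few NAs injected purely for illustration; the names x, x_imp and fit_imp are made up for this example, and centers = 3 is an arbitrary choice.

```r
# Sketch only: replace each NA by its column mean, then run kmeans.
set.seed(42)                      # kmeans uses random starting centers
x <- as.matrix(iris[, 1:4])       # built-in data, numeric columns only
x[c(3, 50, 120), 2] <- NA         # inject NAs (assumption, for illustration)

x_imp <- x
for (j in seq_len(ncol(x_imp))) {
  miss <- is.na(x_imp[, j])                          # which rows are missing
  x_imp[miss, j] <- mean(x_imp[, j], na.rm = TRUE)   # fill with column mean
}

fit_imp <- kmeans(x_imp, centers = 3, nstart = 10)

# One visible side of the bias: the imputed column has less variance
var(x[, 2], na.rm = TRUE)
var(x_imp[, 2])
```

The filled-in rows are pulled towards the overall mean, so the column variance shrinks and those rows may end up in a different cluster than their true values would suggest.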
HTH
Jannis
raji sankaran wrote:
Hi Jannis,
Thank you for answering my question. I saw the option called na.omit when
I used nnet() and tried to classify the Iris data with it. I wanted to know if
there is a similar option available in kmeans which can omit, or in some way
take into account, the null/NA values and still cluster the observations.
Currently, kmeans throws an error for a dataset with NULL/NA values.
From your answer, I understand that an option for handling NULL/NA is
not available with kmeans. Please correct me if I am wrong.
Thanks again :)
On Wed, Dec 15, 2010 at 6:50 PM, Jannis <bt_jan...@yahoo.de> wrote:
I do not really understand your question. You can use kmeans, but only
without the observations that include the NA values (e.g. by deleting whole
rows in your observation matrix). If you want to keep the information in the
valid observations of those rows, I fear you need to look for a clustering
algorithm that can handle missing values. I doubt that there is a kmeans
version that can. You could think about inserting the means of all other
observations into the gaps, though this introduces bias as well.
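A rough sketch of the row-deletion route, assuming the built-in iris data with a few NAs injected for illustration (the names x, ok, fit and cluster are made up here, and centers = 3 is an arbitrary choice):

```r
# Sketch only: cluster the complete rows and keep labels aligned with the input.
set.seed(42)                      # kmeans uses random starting centers
x <- as.matrix(iris[, 1:4])       # built-in data, numeric columns only
x[c(3, 50, 120), 2] <- NA         # inject NAs (assumption, for illustration)

ok  <- complete.cases(x)          # TRUE for rows without any NA
fit <- kmeans(x[ok, ], centers = 3, nstart = 10)

# Labels in the original row order; dropped rows stay NA
cluster <- rep(NA_integer_, nrow(x))
cluster[ok] <- fit$cluster
table(cluster, useNA = "ifany")
```

Using complete.cases() instead of calling na.omit() directly keeps track of which rows were dropped, so the cluster labels can be matched back to the original data.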
Jannis
Raji wrote:
Hi,
I am using the k-means algorithm for clustering. My data contains a few
null/NA values, and kmeans doesn't cluster with those values. Is there any
option like na.omit which can skip these null values and cluster the
remaining values?
Thanks,
Raji
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.