Have you tried something like the following?

> # put some data to cluster in a data.frame
> d <- data.frame(x1=log(1:50), x2=sqrt(1:50), x3=1/(1:50))
> # put NA's in rows 1 and 3
> d[1,1] <- d[3,3] <- NA
> # cluster the non-NA rows
> tmp <- kmeans(na.omit(d), 3) # 3 clusters
> # add cluster id vector to original dataset, aligned properly
> d$cluster <- rep(NA, nrow(d))
> d[names(tmp$cluster), "cluster"] <- tmp$cluster
> head(d)
         x1       x2        x3 cluster
1        NA 1.000000 1.0000000      NA
2 0.6931472 1.414214 0.5000000       3
3 1.0986123 1.732051        NA      NA
4 1.3862944 2.000000 0.2500000       3
5 1.6094379 2.236068 0.2000000       3
6 1.7917595 2.449490 0.1666667       3
> tail(d)
         x1       x2         x3 cluster
45 3.806662 6.708204 0.02222222       1
46 3.828641 6.782330 0.02173913       1
47 3.850148 6.855655 0.02127660       1
48 3.871201 6.928203 0.02083333       1
49 3.891820 7.000000 0.02040816       1
50 3.912023 7.071068 0.02000000       1

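If dropping the incomplete rows loses too much, the mean-substitution idea mentioned in the quoted message below could be sketched roughly like this. It is only an illustration under my own assumptions, not a recommendation: d_imp and col_means are names made up for the example, the seed is arbitrary, and filling gaps with column means pulls the affected rows toward the centre of the data, so the clusters are biased.

```r
# Hypothetical sketch: replace each NA with its column mean, then cluster all rows.
d <- data.frame(x1 = log(1:50), x2 = sqrt(1:50), x3 = 1/(1:50))
d[1, 1] <- d[3, 3] <- NA

# column means computed from the non-missing values only
col_means <- colMeans(d, na.rm = TRUE)

# d_imp is an illustrative name; fill each column's gaps with that column's mean
d_imp <- d
for (j in names(d_imp)) {
  d_imp[is.na(d_imp[[j]]), j] <- col_means[[j]]
}

set.seed(1)               # kmeans starts from random centers; seed chosen arbitrarily
tmp <- kmeans(d_imp, 3)   # no rows were dropped, so every row gets a cluster id
d$cluster <- tmp$cluster  # same length and order as the rows of d
```

Here no realignment by row name is needed, because kmeans sees all 50 rows in their original order.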
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of raji sankaran
> Sent: Wednesday, December 15, 2010 7:43 PM
> To: Jannis
> Cc: r-help@r-project.org
> Subject: Re: [R] Reg : null values in kmeans
>
> Hi Jannis,
>
> Thank you for answering my question. I saw the option called na.omit
> when I used nnet() and tried to classify the Iris data with it. I
> wanted to know if there is a similar option available in kmeans which
> can omit, or in some way consider, the null/NA values and cluster the
> observations. Currently, kmeans throws an error for a dataset with
> NULL/NA values.
>
> From your answer, I understand that the option of handling NULL/NA is
> not available with kmeans. Please correct me if I am wrong.
>
> Thanks again :)
>
> On Wed, Dec 15, 2010 at 6:50 PM, Jannis <bt_jan...@yahoo.de> wrote:
>
> > I do not really understand your question. You can use kmeans, but
> > without the observations that include the NA values (e.g. by
> > deleting whole rows in your observation matrix). If you want to keep
> > the information in the valid observations of those rows, I fear you
> > need to look for a clustering algorithm that can handle missing
> > values. I doubt that there is a kmeans version that can. Think about
> > inserting means of all other observations into the gaps, though this
> > introduces bias as well.
> >
> > Jannis
> >
> > Raji wrote:
> >
> >> Hi,
> >>
> >> I am using the k-means algorithm for clustering. My data contains a
> >> few null/NA values. kmeans doesn't cluster with those values. Is
> >> there any option like na.omit which can avoid these null values and
> >> cluster the remaining values?
> >>
> >> Thanks,
> >> Raji
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.