Hi,

I am hoping you can help me with my problem. I am trying to detect outliers
using the k-means algorithm. First I run the algorithm and pick as possible
outliers those objects that lie far from their cluster center. Instead of
the absolute distance I want to use the relative distance, i.e. the ratio
of an object's absolute distance to its cluster center to the average
distance of all objects in that cluster to the same center. The code for
outlier detection based on absolute distance is the following:

> # remove species from the data to cluster
> iris2 <- iris[,1:4]
> kmeans.result <- kmeans(iris2, centers=3)
> # cluster centers
> kmeans.result$centers
> # calculate distances between objects and cluster centers
> centers <- kmeans.result$centers[kmeans.result$cluster, ]
> distances <- sqrt(rowSums((iris2 - centers)^2))
> # pick top 5 largest distances
> outliers <- order(distances, decreasing=TRUE)[1:5]
> # who are outliers
> print(outliers)

But how can I use the relative instead of the absolute distance to find
outliers?
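
A minimal sketch of the ratio described above, under a few assumptions: the
per-cluster average is taken with ave(), and the names mean.dist,
rel.distances and rel.outliers, as well as the set.seed() call, are
introduced here only for illustration and are not part of the code above.

```r
# remove species from the data to cluster
iris2 <- iris[, 1:4]
set.seed(1)  # assumption: fixed seed so the clustering is repeatable
kmeans.result <- kmeans(iris2, centers = 3)

# absolute distance of every object to its own cluster center
centers <- kmeans.result$centers[kmeans.result$cluster, ]
distances <- sqrt(rowSums((iris2 - centers)^2))

# average distance within each cluster, repeated for every member
mean.dist <- ave(distances, kmeans.result$cluster, FUN = mean)

# relative distance: absolute distance divided by the cluster average
rel.distances <- distances / mean.dist

# pick the 5 objects with the largest relative distance
rel.outliers <- order(rel.distances, decreasing = TRUE)[1:5]
print(rel.outliers)
```

By construction the relative distances average to 1 within each cluster, so
a value of, say, 2 means the object is twice as far from its center as a
typical member of that cluster. Whether this ranking is more useful than
the absolute one depends on how different the cluster spreads are.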
Thanks in advance.

Mario



