[R] Remove error data and clustering analysis

wanggd1983 Fri, 27 Mar 2009 03:35:56 -0700

Hi all,
I'd like to run a cluster analysis on my datasets. Some example data are as 
follows:
 
Dataset 1:
500, 490, 486, 490, 491, 493, 480, 461, 504, 476, 434, 500, 470, 495, 3116, 
3142, 12836, 3062, 3091, 3141, 3177, 3150, 3114, 3149;
Dataset 2:
506, 473, 495, 494, 434, 459, 445, 475, 476, 128367, 470, 513, 466, 476, 482, 
1201, 469, 502;
 
I have many datasets like these. Each dataset contains one or two clusters 
(never more than two), plus some erroneous data points. For example, 12836 is 
an error point in Dataset 1, and 128367 and 1201 are error points in 
Dataset 2.
 
Within each cluster the data follow a normal distribution, and the standard 
deviation is known. That is, when a dataset has a single cluster (like 
Dataset 2), that cluster is normally distributed; when a dataset has two 
clusters (like Dataset 1), each cluster is normally distributed on its own. 
The error points lie far from the cluster means.
 
    Is there a statistical procedure or function that can remove the error 
points and then group each dataset into one or two clusters?
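
To make the request concrete, here is a minimal R sketch of one possible approach, not an established method: repeatedly take the densest window of half-width k * sigma as a cluster, and treat whatever is left over as error points. The function name `cluster_known_sd`, the parameters `k` and `min_size`, and above all the value sigma = 30 are assumptions made up for illustration; the real known standard deviation is not given above.

```r
# Hypothetical helper: label up to `max_clusters` groups in a numeric vector
# when the within-cluster standard deviation `sigma` is known.
# Returns an integer label per point: 1 or 2 = cluster, 0 = error point.
cluster_known_sd <- function(x, sigma, k = 3, max_clusters = 2, min_size = 3) {
  label <- integer(length(x))               # everything starts unassigned (0)
  for (cl in seq_len(max_clusters)) {
    free <- which(label == 0L)
    if (length(free) < min_size) break      # too few points left for a cluster
    # For each unassigned point, count unassigned points within k * sigma of it.
    counts <- vapply(x[free],
                     function(c0) sum(abs(x[free] - c0) <= k * sigma),
                     integer(1))
    if (max(counts) < min_size) break       # no window is dense enough
    centre  <- x[free][which.max(counts)]   # first point with the densest window
    members <- free[abs(x[free] - centre) <= k * sigma]
    label[members] <- cl
  }
  label
}

d1 <- c(500, 490, 486, 490, 491, 493, 480, 461, 504, 476, 434, 500, 470, 495,
        3116, 3142, 12836, 3062, 3091, 3141, 3177, 3150, 3114, 3149)
d2 <- c(506, 473, 495, 494, 434, 459, 445, 475, 476, 128367, 470, 513, 466,
        476, 482, 1201, 469, 502)

sigma <- 30                                 # ASSUMED value, for illustration only
lab1 <- cluster_known_sd(d1, sigma)
lab2 <- cluster_known_sd(d2, sigma)

d1[lab1 == 0]                               # flagged as errors: 12836
d2[lab2 == 0]                               # flagged as errors: 128367 1201
tapply(d1[lab1 > 0], lab1[lab1 > 0], mean)  # per-cluster means for Dataset 1
```

With these assumed settings, Dataset 1 splits into two clusters and Dataset 2 into one. The result depends on the choice of sigma, k and min_size, so a model-based method (for example a normal mixture with a noise component) may be a better fit for real use.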


    Thank you for your reply.

2009-03-27 



wanggd1983 


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.