Replacing outliers with the group median is a sure way to get a biased variance estimate.
Instead, use a robust dispersion (scale) estimator such as Gini's mean difference (the average absolute difference between any two observations). The median is a robust location estimator. There are others. If your ultimate goal is a comparison, you can use a robust nonparametric test.
You'll find that the word 'outlier' is hard to define, so it's best left undefined and unused.
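A minimal sketch of the suggestions above, using the data from the quoted post. The helper name gmd is hypothetical and written out only for illustration, and the Kruskal-Wallis test is just one assumed choice of nonparametric comparison, not necessarily the one intended here.

```r
# Data copied from the quoted post: three populations, one numeric variable.
d <- data.frame(
  population = factor(rep(c("YXPy01", "BSPy01", "YLPy01"), times = c(7, 6, 8))),
  conlen3 = c(8.6, 8.1, 7.6, 7.6, 23, 7.6, 7.6,
              7.5, 6.4, 5.4, 15, 6.6, 5.5,
              5.4, 5.4, 5.6, 21, 5.4, 5.4, 5.4, 4.9)
)

# Gini's mean difference: the mean absolute difference over all
# distinct pairs of observations (hypothetical helper, for illustration).
gmd <- function(x) {
  x <- x[!is.na(x)]
  diffs <- abs(outer(x, x, "-"))   # all pairwise absolute differences
  mean(diffs[upper.tri(diffs)])    # keep each unordered pair once
}

# Robust location and scale per group, with no observation altered.
med <- tapply(d$conlen3, d$population, median)
gm  <- tapply(d$conlen3, d$population, gmd)
print(cbind(median = med, gini_md = gm))

# One possible rank-based comparison of the groups (an assumption;
# other nonparametric tests may suit the actual question better).
kw <- kruskal.test(conlen3 ~ population, data = d)
print(kw)
```

Because nothing is replaced, the large values stay in the data and simply carry less weight in the median and in the rank-based test.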
Frank

Mao Jianfeng wrote:
Dear R-helpers,

A very small number of outliers can greatly affect the mean and many other statistics of a numeric variable. So we usually must deal with outliers properly during data analysis.

Here, I want to replace outliers with the group median of the variable. But I cannot work out an efficient way to do that, because I am a newbie to R and programming. Can anybody share an R script to do that? I think it would also be valuable to the many others who do numerical data analysis.

Here is a dummy data frame with a group variable (three levels) and a numeric one. I just want to know how to replace outliers by the group median.

population  conlen3
YXPy01      8.6
YXPy01      8.1
YXPy01      7.6
YXPy01      7.6
YXPy01      23
YXPy01      7.6
YXPy01      7.6
BSPy01      7.5
BSPy01      6.4
BSPy01      5.4
BSPy01      15
BSPy01      6.6
BSPy01      5.5
YLPy01      5.4
YLPy01      5.4
YLPy01      5.6
YLPy01      21
YLPy01      5.4
YLPy01      5.4
YLPy01      5.4
YLPy01      4.9

Thank you a lot in advance.

Best regards,
Mao J-F

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University