giov, It sounds like you have approximately symmetric distributions. If that is so, and particularly if the standard deviation is less than about 20% of the mean, I'll stick my neck out and say I would assume underlying normality for outlier testing purposes unless there's a reason to do otherwise (eg if you're testing variances, normality would _not_ be a good assumption!).
The reason I'd do that is that it should not make a big difference to the outcome with near-symmetric distributions. If it does, your 'outliers' are borderline anyway. Similarly, although folk can get quite exercised over which test to use and what significance level to choose, the test you use isn't very important either, as long as the intention is just to screen data to make sure the most influential/extreme points are not mistakes.

Given that, you can use any of the tests in library(outliers). You can also use boxplot.stats, and look at the $out list, like

    y <- c(rnorm(15, 10), 25.1)   # 25.1 should be an outlier
    (bxs <- boxplot.stats(y))
    # and locate the outliers in y:
    which(y %in% bxs$out)

Another useful approach is to use robust estimates of mean and dispersion, like hubers() in the MASS package, and then calculate simple scores, with a z-like cutoff to identify outliers:

    require(MASS)
    hy <- hubers(y)
    hscore <- (y - hy$mu) / hy$s
    which(abs(hscore) > 3)

Using the 'mad' or 'iqr' options in outliers::scores will be broadly similar in outcome.

Most of the modelling tools in R also offer useful diagnostics for 'odd' points. I find examining the residuals from rlm in MASS particularly useful if you're seeking outliers in a regression context.

A more important question is what you will do if you find any outliers. Outliers are just unusual compared to some expectation, not automatically 'wrong'. Screening data for anomalies is good practice; checking them to make sure they aren't mistakes is to be encouraged; correcting mistakes if you find them is a no-brainer. But throwing outliers away is something to think about very carefully, and on a case-by-case basis. Sometimes, outliers are a genuine feature of the process under study, or even the 'interesting' parts of the data. It's generally unsafe to throw them out without good reason.
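As a rough illustration of the 'mad'-style scoring mentioned above, here is a minimal base-R sketch that needs no extra packages. The cutoff of 3 is an assumption (it simply mirrors the z-like cutoff used with hubers() above), and the names mad_score and flagged are made up for the example. It is meant to be similar in spirit to the 'mad' option of outliers::scores, not a reproduction of it.

```r
set.seed(1)                           # only so the example is repeatable
y <- c(rnorm(15, 10), 25.1)           # same kind of toy data as above; 25.1 is the planted outlier

# Robust z-like score: distance from the median in units of the scaled MAD
# (mad() applies the usual 1.4826 consistency constant by default).
mad_score <- (y - median(y)) / mad(y)

flagged <- which(abs(mad_score) > 3)  # assumed cutoff of 3
y[flagged]                            # candidates to check, not to delete
```

Anything flagged this way is only a candidate for checking against the original records; the scores say nothing about whether a point is actually wrong.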
Steve E

PS: Contrary to my earlier confident assertion of the non-existence of nonparametric outlier tests, Barnett and Lewis DOES include some general suggestions on 'nonparametric' outlier testing. But it also includes the note that this "... smacks of throwing out the bathwater before the baby has even been immersed". I guess they don't think much of the idea either.

>>> giov <[EMAIL PROTECTED]> 13/08/2008 15:21:25 >>>
Thank you so much, I have not much experience with outliers =), I thought that there were nonparametric distribution-free outlier tests =(. What is the most general distribution I can use? I did a histogram of my data set and sometimes a normal distribution seems to occur, sometimes a uniform distribution seems to occur. So, I cannot understand what distribution I can use for my whole data set....