Steve, thank you so much for the very useful help and comments! For my research I used a surrogate approach (number of surrogates = 100) and compared an index value (very similar to a correlation index computed against a second, known signal) obtained from my original data with the corresponding values obtained from the surrogate data. The idea is to show that the index value from the original data is "different" (in a statistical sense) from those of the surrogates. To evaluate this difference, I thought the best thing to do was to check whether the index from the original data can be considered an outlier with respect to the surrogate index values. Is this a correct approach?
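For concreteness, something like the sketch below is what I have in mind; idx_orig and idx_surr are just placeholder names for the index from my original data and the vector of 100 surrogate indices.

set.seed(1)
idx_surr <- rnorm(100, mean = 0.2, sd = 0.05)   # stand-in for the 100 surrogate index values
idx_orig <- 0.55                                # stand-in for the index from the original data

# One-sided empirical p-value: proportion of surrogates at least as extreme
# as the original (the +1 keeps the estimate away from exactly zero)
(p_emp <- (sum(idx_surr >= idx_orig) + 1) / (length(idx_surr) + 1))

# Robust z-like score of the original index against the surrogate distribution,
# following the hubers() suggestion quoted below (MASS ships with R)
library(MASS)
hs <- hubers(idx_surr)
(idx_orig - hs$mu) / hs$s    # |score| > 3 would flag the original index as unusual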
thank you!
giov

S Ellison wrote:
> giov,
>
> It sounds like you have approximately symmetric distributions. If that
> is so, and particularly if the standard deviation is less than about 20%
> of the mean, I'll stick my neck out and say I would assume underlying
> normality for outlier testing purposes unless there's a reason to do
> otherwise (e.g. if you're testing variances, normality would _not_ be a
> good assumption!).
>
> The reason I'd do that is that it should not make a big difference to
> the outcome with near-symmetric distributions. If it does, your
> 'outliers' are borderline anyway.
> Similarly, although folk can get quite exercised over which test to use
> and what significance level to choose, the test you use isn't very
> important either, as long as the intention is just to screen data to
> make sure the most influential/extreme points are not mistakes.
>
> Given that, you can use any of the tests in library(outliers). You can
> also use boxplot.stats and look at the $out list, like
>
> y <- c(rnorm(15, 10), 25.1)  # 25.1 should be an outlier
> (bxs <- boxplot.stats(y))
>
> # and locate the outliers in y:
> which(y %in% bxs$out)
>
> Another useful approach is to use robust estimates of mean and
> dispersion, like hubers() in the MASS package, and then calculate simple
> scores, with a z-like cutoff to identify outliers:
>
> require(MASS)
> hy <- hubers(y)
> hscore <- (y - hy$mu)/hy$s
> which(abs(hscore) > 3)
>
> Using the 'mad' or 'iqr' options in outliers::scores will be broadly
> similar in outcome.
>
> Most of the modelling tools in R also offer useful diagnostics for
> 'odd' points. I find examining the residuals from rlm in MASS
> particularly useful if you're seeking outliers in a regression context.
>
> A more important question is what you will do if you find any outliers.
> Outliers are just unusual compared to some expectation, not
> automatically 'wrong'. Screening data for anomalies is good practice;
> checking them to make sure they aren't mistakes is to be encouraged;
> correcting mistakes if you find them is a no-brainer. But throwing
> outliers away is something to think about very carefully, and on a
> case-by-case basis. Sometimes, outliers are a genuine feature of the
> process under study, or even the 'interesting' parts of the data. It's
> generally unsafe to throw them out without good reason.
>
> Steve E
>
> PS: Contrary to my earlier confident assertion of the non-existence of
> nonparametric outlier tests, Barnett and Lewis DOES include some general
> suggestions on 'nonparametric' outlier testing. But it also includes the
> note that this "... smacks of throwing out the bathwater before the baby
> has even been immersed". I guess they don't think much of the idea
> either.
>
>>>> giov <[EMAIL PROTECTED]> 13/08/2008 15:21:25 >>>
>
> Thank you so much, I have not much experience with outliers =), I thought
> there were nonparametric, distribution-free outlier tests =(. What is the
> most general distribution I can use? I did a histogram of my data set and
> sometimes a normal distribution seems to occur, sometimes a uniform
> distribution seems to occur. So, I cannot work out what distribution I can
> use for my whole data set....