Hi Bert,

Another confrontational response from you...
You might have noticed that I used the word "outlier" carefully in this post, and only in relation to the plotted ellipses. I do not know the underlying algorithm of geom_density_2d(), so I am unsure how to interpret the plot. I was hoping someone here knows it and can help me.

Ana

On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> I recommend that you consult with a local statistical expert. Much of what
> you say (outliers?!?) seems to make little sense, and your statistical
> knowledge seems minimal. Perhaps more to the point, none of your questions
> can be properly answered without subject matter context, which this list is
> not designed to provide. That's why I believe you need local expertise.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Fri, Oct 9, 2020 at 8:25 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
>>
>> Hi Abby,
>>
>> thank you for getting back to me and for this useful information.
>>
>> I'm trying to detect the outliers in my distribution based on mean and
>> variance. Can I see that from the plot I provided? Would outliers be
>> outside the ellipses? If so, how do I extract those from my data frame,
>> and based on which parameter?
>>
>> So I am trying to connect outliers based on what the plot is showing:
>> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
>> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
>>
>> versus what is in the data:
>>
>> > head(SNP)
>>                mean      var     sd
>> FQC.10090295 0.0327 0.002678 0.0517
>> FQC.10119363 0.0220 0.000978 0.0313
>> FQC.10132112 0.0275 0.002088 0.0457
>> FQC.10201128 0.0169 0.000289 0.0170
>> FQC.10208432 0.0443 0.004081 0.0639
>> FQC.10218466 0.0116 0.000131 0.0115
>> ...
>>
>> The distribution is not normal; it is right-skewed.
>>
>> Cheers,
>> Ana
>>
>> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle <spurdl...@gmail.com> wrote:
>> >
>> > > My understanding is that this represents bivariate normal
>> > > approximation of the data which uses the kernel density function to
>> > > test for inclusion within a level set. (please correct me)
>> >
>> > You can fit a bivariate normal distribution by computing five parameters.
>> > Two means, two standard deviations (or two variances) and one
>> > correlation (or covariance) coefficient.
>> > The bivariate normal *has* elliptical contours.
>> >
>> > A kernel density estimate is usually regarded as an estimate of an
>> > unknown density function.
>> > Often they use a normal (or Gaussian) kernel, but I wouldn't describe
>> > them as normal approximations.
>> > In general, bivariate kernel density estimates do *not* have
>> > elliptical contours.
>> > But in saying that, if the data is close to normality, then contours
>> > will be close to elliptical.
>> >
>> > Kernel density estimates do not test for inclusion, as such.
>> > (But technically, there are some exceptions to that).
>> >
>> > I'm not sure what you're trying to achieve here.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
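Abby's description of the bivariate normal can be made concrete with a short base-R sketch. It uses simulated data (hypothetical, not Ana's SNP table), computes the five parameters she lists, and flags points outside the 95% probability ellipse; the 95% level is an arbitrary illustrative choice. Under bivariate normality the squared Mahalanobis distance follows a chi-squared distribution with 2 degrees of freedom, so that ellipse is the set of points whose squared distance is at most qchisq(0.95, df = 2). Since Ana reports right-skewed data, a normal fit may describe her data poorly; this only illustrates the parameterization.

```r
# Sketch: fit a bivariate normal by its five parameters and flag points
# outside the 95% probability ellipse. Simulated data, NOT the SNP data.
set.seed(1)
n <- 200
x <- rnorm(n, mean = 0.03, sd = 0.01)
y <- 0.5 * x + rnorm(n, sd = 0.002)
d <- data.frame(x = x, y = y)

# The five parameters: two means, two standard deviations, one correlation
mu  <- colMeans(d)
s   <- apply(d, 2, sd)
rho <- cor(d$x, d$y)

# The same information arranged as a covariance matrix
Sigma <- cov(d)

# Squared Mahalanobis distance is chi-squared with 2 df under normality,
# so the 95% ellipse is the set where D2 <= qchisq(0.95, df = 2)
D2 <- mahalanobis(d, center = mu, cov = Sigma)
outside <- D2 > qchisq(0.95, df = 2)

# Rows lying outside the ellipse
d[outside, ]
```

The level sets of this fitted density are ellipses by construction, which is the property Abby contrasts with kernel density estimates.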
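A kernel density estimate has no built-in notion of outliers, but one common way to turn it into a flagging rule is to evaluate the estimated density at each observation and flag the lowest-density points. The ggplot2 documentation describes geom_density_2d() as performing a 2D kernel density estimation with MASS::kde2d(). The sketch below assumes MASS's default normal-reference bandwidths, which may not match the bandwidths ggplot2 used for Ana's plot; the simulated right-skewed data, the 5% cutoff, and the helper kde_at() are all hypothetical. Points flagged this way are not necessarily the ones outside any particular contour drawn in the plot.

```r
# Sketch: flag low-density points under a 2D kernel density estimate.
# Assumptions: simulated right-skewed data (NOT the SNP data), MASS's
# normal-reference bandwidths, and an arbitrary 5% density cutoff.
library(MASS)

set.seed(42)
n   <- 300
dat <- data.frame(mean = rgamma(n, shape = 2, rate = 60))
dat$var <- dat$mean^2 + rgamma(n, shape = 2, rate = 2000)

# Bandwidths as MASS::kde2d() chooses them by default
h <- c(bandwidth.nrd(dat$mean), bandwidth.nrd(dat$var))

# Hypothetical helper: product-Gaussian KDE evaluated at points (u, v),
# following the same formula as kde2d(), which divides h by 4 internally
kde_at <- function(u, v, x, y, h) {
  h  <- h / 4
  kx <- dnorm(outer(u, x, "-") / h[1])
  ky <- dnorm(outer(v, y, "-") / h[2])
  rowSums(kx * ky) / (length(x) * h[1] * h[2])
}

# Estimated density at every observation
dat$dens <- kde_at(dat$mean, dat$var, dat$mean, dat$var, h)

# Flag the 5% of observations with the lowest estimated density
cutoff <- quantile(dat$dens, 0.05)
low_density <- dat[dat$dens < cutoff, ]
low_density
```

Changing h or the 0.05 quantile changes which rows are flagged, so both are choices to justify rather than defaults to trust.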