Hi Abby,

Thanks for getting back to me. Yes, I believe I did that by doing this:

SNP$density <- get_density(SNP$mean, SNP$var)
> summary(SNP$density)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0     383     696     738    1170    1789

where get_density() is the function from here:
https://slowkow.com/notes/ggplot2-color-by-density/

I then kept only the entries with density > 400:

a <- SNP[SNP$density > 400, ]

and plotted it again:

p <- ggplot(a, mapping = aes(x = mean, y = var))
p <- p + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red")

I can probably increase that threshold...

Any idea how I should interpret the data points that are still contained
within the ellipses?

On Fri, Oct 9, 2020 at 6:09 PM Abby Spurdle <spurdl...@gmail.com> wrote:
>
> You could assign a density value to each point.
> Maybe you've done that already...?
>
> Then trim the lowest n (number of) data points
> Or trim the lowest p (proportion of) data points.
>
> e.g.
> Remove the data points with the 20 lowest density values.
> Or remove the data points with the lowest 5% of density values.
>
> I'll let you decide whether that is a good idea or a bad idea.
> And if it's a good idea, then how much to trim.
>
>
> On Sat, Oct 10, 2020 at 5:47 AM Ana Marija <sokovic.anamar...@gmail.com>
> wrote:
> >
> > Hi Bert,
> >
> > Another confrontational response from you...
> >
> > You might have noticed that I use the word "outlier" carefully in this
> > post and only in relation to the plotted ellipses. I do not know the
> > underlying algorithm of geom_density_2d() and therefore I am having an
> > issue of how to interpret the plot. I was hoping someone here knows
> > that and can help me.
> >
> > Ana
> >
> > On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
> > >
> > > I recommend that you consult with a local statistical expert. Much of
> > > what you say (outliers?!?) seems to make little sense, and your
> > > statistical knowledge seems minimal. Perhaps more to the point, none of
> > > your questions can be properly answered without subject matter context,
> > > which this list is not designed to provide. That's why I believe you need
> > > local expertise.
> > >
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep coming along
> > > and sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > >
> > >
> > > On Fri, Oct 9, 2020 at 8:25 AM Ana Marija <sokovic.anamar...@gmail.com>
> > > wrote:
> > >>
> > >> Hi Abby,
> > >>
> > >> thank you for getting back to me and for this useful information.
> > >>
> > >> I'm trying to detect the outliers in my distribution based on mean and
> > >> variance. Can I see that from the plot I provided? Would outliers be
> > >> outside of ellipses? If so how do I extract those from my data frame,
> > >> based on which parameter?
> > >>
> > >> So I am trying to connect outliers based on what the plot is showing:
> > >> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
> > >> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
> > >>
> > >> versus what is in the data:
> > >>
> > >> > head(SNP)
> > >>                mean      var     sd
> > >> FQC.10090295 0.0327 0.002678 0.0517
> > >> FQC.10119363 0.0220 0.000978 0.0313
> > >> FQC.10132112 0.0275 0.002088 0.0457
> > >> FQC.10201128 0.0169 0.000289 0.0170
> > >> FQC.10208432 0.0443 0.004081 0.0639
> > >> FQC.10218466 0.0116 0.000131 0.0115
> > >> ...
> > >>
> > >> the distribution is not normal, it is right-skewed.
> > >>
> > >> Cheers,
> > >> Ana
> > >>
> > >> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle <spurdl...@gmail.com> wrote:
> > >> >
> > >> > > My understanding is that this represents bivariate normal
> > >> > > approximation of the data which uses the kernel density function to
> > >> > > test for inclusion within a level set. (please correct me)
> > >> >
> > >> > You can fit a bivariate normal distribution by computing five
> > >> > parameters.
> > >> > Two means, two standard deviations (or two variances) and one
> > >> > correlation (or covariance) coefficient.
> > >> > The bivariate normal *has* elliptical contours.
> > >> >
> > >> > A kernel density estimate is usually regarded as an estimate of an
> > >> > unknown density function.
> > >> > Often they use a normal (or Gaussian) kernel, but I wouldn't describe
> > >> > them as normal approximations.
> > >> > In general, bivariate kernel density estimates do *not* have
> > >> > elliptical contours.
> > >> > But in saying that, if the data is close to normality, then contours
> > >> > will be close to elliptical.
> > >> >
> > >> > Kernel density estimates do not test for inclusion, as such.
> > >> > (But technically, there are some exceptions to that).
> > >> >
> > >> > I'm not sure what you're trying to achieve here.
> > >>
> > >> ______________________________________________
> > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
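
For reference, the trimming idea quoted above (drop the n lowest-density points, or the lowest proportion p) might be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated rather than the SNP table from this thread, point_density() is a hypothetical helper written along the lines of the get_density() function in the linked note (a MASS::kde2d() grid lookup), and the values n_trim = 20 and p_trim = 0.05 simply mirror the examples in the quoted message. The ggplot2 documentation states that geom_density_2d() also relies on MASS::kde2d(), so its contours come from a kernel density estimate rather than a fitted bivariate normal.

```r
# Minimal sketch with simulated data (NOT the SNP data from the thread).
library(MASS)  # provides kde2d()

# Hypothetical helper: estimate a 2D kernel density on an n x n grid,
# then look up the grid cell that each point falls into.
point_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x, y, n = n)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  dens$z[cbind(ix, iy)]
}

set.seed(1)
# Assumed right-skewed stand-in for the mean/var columns
sim <- data.frame(mean = rgamma(500, shape = 2, scale = 0.01))
sim$var <- sim$mean^2 * rlnorm(500, sdlog = 0.3)
sim$density <- point_density(sim$mean, sim$var)

# Option 1: drop the n points with the lowest density values
n_trim <- 20
keep_n <- sim[rank(sim$density, ties.method = "first") > n_trim, ]

# Option 2: drop (roughly) the lowest proportion p of points;
# points sharing a grid cell have tied densities, so the count is approximate
p_trim <- 0.05
cutoff <- quantile(sim$density, probs = p_trim)
keep_p <- sim[sim$density > cutoff, ]
```

Unlike a fixed cutoff such as density > 400, both options scale with the data, so the amount trimmed stays comparable if the density values change. Whether any of these points are outliers in a subject-matter sense is a separate question that the density value alone does not answer.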
snps_red.pdf
Description: Adobe PDF document