Hi Abby,

Thanks for getting back to me. Yes, I believe I did that by doing this:

SNP$density <- get_density(SNP$mean, SNP$var)
> summary(SNP$density)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0     383     696     738    1170    1789

where get_density() is a function from here:
https://slowkow.com/notes/ggplot2-color-by-density/
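
For reference, here is a minimal sketch of such a helper. It assumes the approach is a 2D kernel density estimate from MASS::kde2d(), looked up at the grid cell nearest each point; the grid size n = 100 and the simulated data frame `toy` are my own assumptions for illustration, not my real SNP data.

```r
library(MASS)  # provides kde2d()

# Sketch of a per-point density helper (assumed approach, see note above).
get_density <- function(x, y, ...) {
  dens <- MASS::kde2d(x, y, ...)   # KDE evaluated on a regular n x n grid
  ix <- findInterval(x, dens$x)    # index of each point along the x grid
  iy <- findInterval(y, dens$y)    # index of each point along the y grid
  dens$z[cbind(ix, iy)]            # density value at each point's grid cell
}

set.seed(1)
toy <- data.frame(mean = rexp(500, 30), var = rexp(500, 300))  # made-up data
toy$density <- get_density(toy$mean, toy$var, n = 100)
summary(toy$density)
```

The returned values are density estimates on the scale of the data, so a fixed cutoff such as 400 only makes sense relative to the summary() of that particular data set.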

and then keeping only the entries with density > 400:

a <- SNP[SNP$density > 400, ]

and then plotting it again:

p <- ggplot(a, mapping = aes(x = mean, y = var))
p <- p +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red")

I can probably increase that threshold...
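
Instead of a hand-picked cutoff, the threshold could be taken from the data, in the spirit of the "lowest p (proportion of) data points" suggestion quoted below. This is only a sketch: `trim_lowest()` is a hypothetical helper name, the 5% default is an example value, and the data frame `toy` is made up.

```r
# Hypothetical helper: drop rows whose density lies at or below the
# p-th quantile of the density column (i.e. trim the lowest share p).
trim_lowest <- function(df, density_col = "density", p = 0.05) {
  d <- df[[density_col]]
  cutoff <- quantile(d, probs = p, na.rm = TRUE)
  df[!is.na(d) & d > cutoff, , drop = FALSE]
}

# Made-up example: 100 rows with density values 1..100
toy <- data.frame(id = 1:100, density = 1:100)
trimmed <- trim_lowest(toy, p = 0.05)
nrow(trimmed)
```

With my data this would be called as trim_lowest(SNP, p = 0.05); how much to trim remains a judgment call.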

Any idea how I should interpret the data points that remain inside
the ellipses?

On Fri, Oct 9, 2020 at 6:09 PM Abby Spurdle <spurdl...@gmail.com> wrote:
>
> You could assign a density value to each point.
> Maybe you've done that already...?
>
> Then trim the lowest n (number of) data points
> Or trim the lowest p (proportion of) data points.
>
> e.g.
> Remove the data points with the 20 lowest density values.
> Or remove the data points with the lowest 5% of density values.
>
> I'll let you decide whether that is a good idea or a bad idea.
> And if it's a good idea, then how much to trim.
>
>
> On Sat, Oct 10, 2020 at 5:47 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
> >
> > Hi Bert,
> >
> > Another confrontational response from you...
> >
> > You might have noticed that I use the word "outlier" carefully in this
> > post and only in relation to the plotted ellipses. I do not know the
> > underlying algorithm of geom_density_2d() and therefore I am having an
> > issue of how to interpret the plot. I was hoping someone here knows
> > that and can help me.
> >
> > Ana
> >
> > On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
> > >
> > > I recommend that you consult with a local statistical expert. Much
> > > of what you say (outliers?!?) seems to make little sense, and your
> > > statistical knowledge seems minimal. Perhaps more to the point, none
> > > of your questions can be properly answered without subject matter
> > > context, which this list is not designed to provide. That's why I
> > > believe you need local expertise.
> > >
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep coming
> > > along and sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
> > >
> > >
> > > On Fri, Oct 9, 2020 at 8:25 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
> > >>
> > >> Hi Abby,
> > >>
> > >> thank you for getting back to me and for this useful information.
> > >>
> > >> I'm trying to detect the outliers in my distribution based on mean and
> > >> variance. Can I see that from the plot I provided? Would outliers be
> > >> outside of ellipses? If so how do I extract those from my data frame,
> > >> based on which parameter?
> > >>
> > >> So I am trying to connect outliers based on what the plot is showing:
> > >> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
> > >> s <- s +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
> > >>
> > >> versus what is in the data:
> > >>
> > >> > head(SNP)
> > >>                mean      var     sd
> > >> FQC.10090295 0.0327 0.002678 0.0517
> > >> FQC.10119363 0.0220 0.000978 0.0313
> > >> FQC.10132112 0.0275 0.002088 0.0457
> > >> FQC.10201128 0.0169 0.000289 0.0170
> > >> FQC.10208432 0.0443 0.004081 0.0639
> > >> FQC.10218466 0.0116 0.000131 0.0115
> > >> ...
> > >>
> > >> The distribution is not normal; it is right-skewed.
> > >>
> > >> Cheers,
> > >> Ana
> > >>
> > >> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle <spurdl...@gmail.com> wrote:
> > >> >
> > >> > > My understanding is that this represents bivariate normal
> > >> > > approximation of the data which uses the kernel density function to
> > >> > > test for inclusion within a level set. (please correct me)
> > >> >
> > >> > You can fit a bivariate normal distribution by computing five
> > >> > parameters.
> > >> > Two means, two standard deviations (or two variances) and one
> > >> > correlation (or covariance) coefficient.
> > >> > The bivariate normal *has* elliptical contours.
> > >> >
> > >> > A kernel density estimate is usually regarded as an estimate of an
> > >> > unknown density function.
> > >> > Often they use a normal (or Gaussian) kernel, but I wouldn't describe
> > >> > them as normal approximations.
> > >> > In general, bivariate kernel density estimates do *not* have
> > >> > elliptical contours.
> > >> > But in saying that, if the data is close to normality, then contours
> > >> > will be close to elliptical.
> > >> >
> > >> > Kernel density estimates do not test for inclusion, as such.
> > >> > (But technically, there are some exceptions to that).
> > >> >
> > >> > I'm not sure what you're trying to achieve here.
> > >>
> > >> ______________________________________________
> > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.

Attachment: snps_red.pdf
Description: Adobe PDF document
