Hi Bert,

Another confrontational response from you...

You might have noticed that I use the word "outlier" carefully in this
post and only in relation to the plotted ellipses. I do not know the
underlying algorithm of geom_density_2d(), and therefore I am having
trouble interpreting the plot. I was hoping someone here knows how it
works and can help me.

Ana

On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> I recommend that you consult with a local statistical expert. Much of what 
> you say (outliers?!?) seems to make little sense, and your statistical 
> knowledge seems minimal. Perhaps more to the point, none of your questions 
> can be properly answered without subject matter context, which this list is 
> not designed to provide. That's why I believe you need local expertise.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Fri, Oct 9, 2020 at 8:25 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
>>
>> Hi Abby,
>>
>> Thank you for getting back to me and for this useful information.
>>
>> I'm trying to detect the outliers in my distribution based on mean and
>> variance. Can I see them in the plot I provided? Would outliers be
>> outside of the ellipses? If so, how do I extract them from my data
>> frame, and based on which parameter?
>>
>> So I am trying to connect the outliers to what the plot is showing:
>> library(ggplot2)
>> # my.theme is a user-defined theme object (not shown)
>> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
>> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
>> print(s)
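One possible way to turn "outside the contours" into rows of the data frame is to estimate the density at each data point and flag the points with the lowest values; a point outside a contour line has an estimated density below that line's level. This is a minimal sketch, not the method geom_density_2d() itself uses for anything: the data are a simulated stand-in for SNP, point_density() is a hypothetical helper, the bandwidths are assumed to mirror the MASS::kde2d() default (bandwidth.nrd(), with kde2d() using h/4 as the kernel standard deviation), and the 5% cutoff is arbitrary. Whether low-density points count as outliers still depends on the subject-matter context.

```r
library(MASS)  # bandwidth.nrd()

# Hypothetical stand-in for the SNP data frame: simulated, right-skewed values
set.seed(2)
n <- 200
SNP <- data.frame(mean = rlnorm(n, meanlog = -3.5, sdlog = 0.5))
SNP$var <- SNP$mean^2 * rlnorm(n, meanlog = 0, sdlog = 0.3)

# Hypothetical helper: bivariate Gaussian product-kernel density estimate
# evaluated at each data point. Bandwidths follow the kde2d() default
# (bandwidth.nrd()), divided by 4 because kde2d() uses h/4 as the kernel sd.
point_density <- function(x, y) {
  h <- c(bandwidth.nrd(x), bandwidth.nrd(y)) / 4
  kx <- dnorm(outer(x, x, "-") / h[1])  # n x n kernel weights along x
  ky <- dnorm(outer(y, y, "-") / h[2])  # n x n kernel weights along y
  rowMeans(kx * ky) / (h[1] * h[2])     # average kernel contribution per point
}

SNP$dens <- point_density(SNP$mean, SNP$var)

# Flag the lowest-density 5% as candidates; the cutoff is an arbitrary choice
cutoff <- quantile(SNP$dens, 0.05)
candidates <- SNP[SNP$dens < cutoff, ]
nrow(candidates)
```

With the real SNP data frame, rownames(candidates) would give the FQC.* identifiers of the flagged rows, since subsetting keeps row names.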
>>
>> versus what is in the data:
>>
>> > head(SNP)
>>                mean      var     sd
>> FQC.10090295 0.0327 0.002678 0.0517
>> FQC.10119363 0.0220 0.000978 0.0313
>> FQC.10132112 0.0275 0.002088 0.0457
>> FQC.10201128 0.0169 0.000289 0.0170
>> FQC.10208432 0.0443 0.004081 0.0639
>> FQC.10218466 0.0116 0.000131 0.0115
>> ...
>>
>> The distribution is not normal; it is right-skewed.
>>
>> Cheers,
>> Ana
>>
>> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle <spurdl...@gmail.com> wrote:
>> >
>> > > My understanding is that this represents a bivariate normal
>> > > approximation of the data, which uses the kernel density function to
>> > > test for inclusion within a level set (please correct me).
>> >
>> > You can fit a bivariate normal distribution by computing five parameters.
>> > Two means, two standard deviations (or two variances) and one
>> > correlation (or covariance) coefficient.
>> > The bivariate normal *has* elliptical contours.
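To make the five-parameter fit concrete, here is a minimal sketch on simulated, right-skewed data (a hypothetical stand-in for SNP, not the poster's data). It computes the two means, two standard deviations and the correlation, then uses the squared Mahalanobis distance: under bivariate normality it follows a chi-squared distribution with 2 degrees of freedom, so d2 <= qchisq(0.95, 2) is the 95% ellipse of the fitted normal. Because the data are skewed, the normal assumption is doubtful and this ellipse is not what geom_density_2d() draws.

```r
# Hypothetical stand-in for the SNP data frame: simulated, right-skewed values
set.seed(1)
n <- 200
SNP <- data.frame(mean = rlnorm(n, meanlog = -3.5, sdlog = 0.5))
SNP$var <- SNP$mean^2 * rlnorm(n, meanlog = 0, sdlog = 0.3)

xy <- as.matrix(SNP[, c("mean", "var")])

# The five parameters of a bivariate normal fit
mu  <- colMeans(xy)                     # two means
sds <- apply(xy, 2, sd)                 # two standard deviations
rho <- cor(xy[, "mean"], xy[, "var"])   # one correlation coefficient
S   <- cov(xy)                          # covariance matrix from the same quantities

# Squared Mahalanobis distance of each point from the fitted centre;
# under normality d2 ~ chi-squared with 2 df, so this flags points
# outside the fitted 95% ellipse
d2 <- mahalanobis(xy, center = mu, cov = S)
outside_95 <- d2 > qchisq(0.95, df = 2)
sum(outside_95)
```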
>> >
>> > A kernel density estimate is usually regarded as an estimate of an
>> > unknown density function.
>> > Often they use a normal (or Gaussian) kernel, but I wouldn't describe
>> > them as normal approximations.
>> > In general, bivariate kernel density estimates do *not* have
>> > elliptical contours.
>> > But in saying that, if the data is close to normality, then contours
>> > will be close to elliptical.
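To illustrate what a density-contour plot actually draws, here is a minimal sketch using MASS::kde2d(). As far as I know, ggplot2's stat_density_2d() (the stat behind geom_density_2d()) calls MASS::kde2d() internally, but check the documentation for your ggplot2 version. The data are simulated, and the contour levels from pretty() are an illustrative choice, not necessarily ggplot2's defaults.

```r
library(MASS)  # kde2d()

# Simulated, right-skewed stand-in data
set.seed(1)
n <- 200
x <- rlnorm(n, meanlog = -3.5, sdlog = 0.5)
y <- x^2 * rlnorm(n, meanlog = 0, sdlog = 0.3)

# Bivariate Gaussian kernel density estimate on a 100 x 100 grid;
# with h omitted, kde2d() uses bandwidth.nrd() for each axis
dens <- kde2d(x, y, n = 100)

# Contour lines of the estimated surface at chosen levels: these are
# level sets of the KDE, not fitted ellipses
brks <- pretty(range(dens$z), 10)
cl <- contourLines(dens$x, dens$y, dens$z, levels = brks)
length(cl)
```

Calling contour(dens) on the same object should show that, for skewed data like this, the curves are generally not ellipses.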
>> >
>> > Kernel density estimates do not test for inclusion, as such.
>> > (But technically, there are some exceptions to that.)
>> >
>> > I'm not sure what you're trying to achieve here.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
