Thanks for answering.

I have already started hunting. But my first question was whether I used
prcomp correctly (at the moment this is my most important point). As far as
I understand, your answer is yes. Is that correct?
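
A quick way to check this on toy data (a sketch only; the matrix x below is
simulated, not the real genotypes): prcomp() centers columns by default
(center = TRUE), so handing it columns that are already centered should give
the same result as switching centering off. Setting the NAs to zero after
centering and dividing by a constant both leave the column means at zero, so
the same should hold for the matrix built in prcomp.snp.

```r
set.seed(1)
# toy genotype matrix (assumption): 20 ids in rows, 50 SNPs in columns, coded 0/1/2
x <- matrix(rbinom(20 * 50, size = 2, prob = 0.3), nrow = 20)

# subtract the column means by hand, as prcomp.snp() does
xc <- sweep(x, 2, colMeans(x))

p_default  <- prcomp(xc)                  # center = TRUE, scale. = FALSE
p_nocenter <- prcomp(xc, center = FALSE)  # no second centering

# compare the leading components; loadings are only defined up to sign
k <- 1:5
same_sdev     <- all.equal(p_default$sdev[k], p_nocenter$sdev[k])
same_loadings <- all.equal(abs(p_default$rotation[, k]),
                           abs(p_nocenter$rotation[, k]))
print(c(sdev = isTRUE(same_sdev), loadings = isTRUE(same_loadings)))
```

If both entries come out TRUE, the second centering inside prcomp is a no-op
and the pre-centered input is harmless.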

I am puzzled that these "columns" sit more or less in the middle of my
SNP data.
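
To pin down where such a "column" sits, something like the following could
be used. It is a sketch on simulated data with a planted block of perfectly
correlated SNPs (columns 140 to 160, an assumption made purely for
illustration); extreme_loadings is a hypothetical helper, not part of prcomp.

```r
# hypothetical helper: indices of SNPs whose loading on component k lies
# more than n_mad robust standard deviations away from the median loading
extreme_loadings <- function(fit, k, n_mad = 4) {
  r <- fit$rotation[, k]
  which(abs(r - median(r)) > n_mad * mad(r))
}

set.seed(42)
n_id  <- 100
n_snp <- 300
geno  <- matrix(rbinom(n_id * n_snp, size = 2, prob = 0.3), nrow = n_id)

# planted block: half of the ids carry genotype 2, the rest 0, at SNPs 140..160
block <- 140:160
geno[, block] <- rep(c(2, 0), each = n_id / 2)

fit <- prcomp(geno)                  # prcomp centers the columns itself
idx <- extreme_loadings(fit, k = 1)
print(range(idx))                    # where the extreme loadings sit

# ids with the most extreme scores on the same component
print(head(order(abs(fit$x[, 1]), decreasing = TRUE)))
```

With real data, fit would be the result of prcomp.snp(mat), and the ids
returned in the last step are the ones whose genotypes in the flagged SNP
range are worth a closer look.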

> However, it could also be a biological effect. Are your ids by any chance
> from the same pedigree? If so, you might be seeing something like the
> effect of a crossover event in a distant ancestor.

No, there is no such pedigree structure. Relatedness of this kind is ruled
out by IBD measurement. (In addition, the data were checked with an
EIGENSTRAT analysis.)

> (b) the scaling by sqrt(pi*(1-pi)) implicitly requiring Hardy-Weinberg
> equilibrium, so if your data are all 0 or 2 (aa or AA) there will be
> overdispersion.

This is a good point. But why do I find such effects in the "middle" of my
data?
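
One way to see whether the "middle" region differs from the rest with
respect to (a) and (b) would be to compute, for each SNP, the missing rate
and the ratio of the observed genotype variance to the Hardy-Weinberg value
2*pi*(1-pi), and then plot both against the SNP index. A sketch, where
snp_diag is a made-up helper and p is the same allele frequency estimate as
pi in prcomp.snp:

```r
# hypothetical helper: per-SNP missing rate and variance ratio
# geno: matrix coded 0/1/2 (NA allowed), ids in rows, SNPs in columns
snp_diag <- function(geno) {
  n <- colSums(!is.na(geno))                            # non-missing calls per SNP
  p <- (1 + colSums(geno, na.rm = TRUE)) / (2 + 2 * n)  # as pi in prcomp.snp
  v <- apply(geno, 2, var, na.rm = TRUE)                # observed genotype variance
  data.frame(miss  = 1 - n / nrow(geno),
             p     = p,
             ratio = v / (2 * p * (1 - p)))             # about 1 under Hardy-Weinberg
}

# toy check: one SNP in Hardy-Weinberg proportions, one with only 0 and 2
toy <- cbind(hwe    = rep(c(0, 1, 2), times = c(25, 50, 25)),
             no_het = rep(c(0, 2), each = 50))
d <- snp_diag(toy)
print(d)
# with real data: plot(d$ratio); plot(d$miss)
```

A ratio near 2 means the variance is about twice the Hardy-Weinberg value,
which is the overdispersion case in (b); a run of SNPs with a high missing
rate would point to (a).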

Thanks
Hermann




2013/10/3 peter dalgaard <pda...@gmail.com>

> It's not so obvious to me that this is an artifact. What prcomp() says is
> that some of the eigenvectors have a lot of "activity" in some relatively
> narrow ranges of SNPs (on the same chromosome, perhaps?). If something
> artificial is going on, I could imagine effects not so much of centering
> columns but maybe one of
> (a) imputing zero for missing values
> (b) the scaling by sqrt(pi*(1-pi)) implicitly requiring Hardy-Weinberg
> equilibrium, so if your data are all 0 or 2 (aa or AA) there will be
> overdispersion.
>
> However, it could also be a biological effect. Are your ids by any chance
> from the same pedigree? If so, you might be seeing something like the
> effect of a crossover event in a distant ancestor. (Talk to a geneticist, I
> just "play one on TV".)
>
> To investigate further, you could go looking at the individual scores and
> see who has extreme values on components 2-4 and then go back and see
> if there is something peculiar about their SNPs in the "strange" region.
>
> Of course, you might have stumbled upon a bug in R, but I doubt it.
>
> Happy hunting!
>
> -pd
>
>
> On Oct 3, 2013, at 11:41 , Hermann Norpois wrote:
>
> > Hello,
> >
> > I did a PCA with over 200000 SNPs for 340 observations (ids). If I plot
> > the eigenvectors (called rotation in prcomp) 2, 3 and 4 (e.g.
> > plot(rotation[,2])), I see a strange "column" in my data (see
> > attachment). I suspect it is an artefact (but of what?).
> >
> > Suggestion:
> > I used prcomp this way: prcomp(mat), where mat is a matrix with the
> > column means already subtracted, followed by a normalisation procedure
> > (see below for details). Is that okay? Or does prcomp repeat the
> > subtraction step?
> >
> > Originally my approach was to compute a covariance matrix and then use
> > eigen, but the covariance matrix was too large to handle. So I switched
> > to prcomp.
> >
> > As I guess that the "columns" in my plots reflect some artefact, I hope
> > to get some help. In case my use of prcomp was not okay, could you
> > please give me instructions on how to use it, including the
> > normalisation procedure that I need to apply before doing a PCA?
> >
> > Thanks
> > Hermann
> >
> > #
> > # mat: matrix with genotypes coded as 0,1 and 2 (columns); IDs
> > # (observations) as rows.
> > #
> > prcomp.snp <- function (mat)
> >  {
> >    m <- ncol (mat)
> >    n <- nrow (mat)
> >    snp.namen <- colnames (mat)
> >    for (i in 1:m)
> >                   {
> >                     # snps in columns
> >                     ui <- mat[,i]
> >                     n <- length (which (!is.na(ui)))
> >                     # see methods Price et al. as correction
> >                     pi <- (1+ sum(ui, na.rm=TRUE))/(2+2*n)
> >
> >                     # subtract mean
> >                     ui <- ui - mean (ui, na.rm=TRUE)
> >                     # NAs set to zero
> >                     ui[is.na(ui)] <- 0
> >                     # normalisation of the genotype for each ID
> >                     # important normalisation step
> >                     ui <- ui/ (sqrt (pi*(1-pi)))
> >                     # fill matrix with ui
> >                     mat[,i] <- ui
> >                   }
> >    mat <- prcomp (mat)
> >    return (mat)
> >   }
> > <rotplot.png>
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com
>
>

