Hello,

I did a pca with over 200000 snps for 340 observations (ids). If I plot the
eigenvectors (called rotation in prcomp) 2,3 and 4 (e.g. plot
(rotation[,2]) I see a strange "column" in my data (see attachment). I
suggest it is an artefact (but of what?).

Suggestion:
I used prcomp this way: prcomp (mat), where mat is a matrix with the column
means already substracted followed by a normalisation procedure (see below
for details). Is that okay? Or does prcomp repeat substraction steps?

Originally my approach was driven by the idea to compute a covariation
matrix followed by the use of eigen, but the covariation matrix was to huge
to handle. So I switched to prcomp.

As I guess that the "columns" in my plots reflect some artefact production
I hope to get some help. For the case that my use of prcomp was not okay,
could you please give me instructions how to use it - including with the
normalisation procedure that I need to include before doing a pca.

Thanks
Hermann

#
# mat: matrix with genotypes coded as 0,1 and 2 (columns); IDs
(observations) as rows.
#
prcomp.snp <- function (mat)
  {
    m <- ncol (mat)
    n <- nrow (mat)
    snp.namen <- colnames (mat)
    for (i in 1:m)
                   {
                     # snps in columns
                     ui <- mat[,i]
                     n <- length (which (!is.na(ui)))
                     # see methods Price et al. as correction
                     pi <- (1+ sum(ui, na.rm=TRUE))/(2+2*n)

                     # substract mean
                     ui <- ui - mean (ui, na.rm=TRUE)
                     # NAs set to zero
                     ui[is.na(ui)] <- 0
                     # normalisation of the genotype for each ID
important normalisation step
                     ui <- ui/ (sqrt (pi*(1-pi)))
                     # fill matrix with ui
                     mat[,i] <- ui
                   }
    mat <- prcomp (mat)
    return (mat)
   }

<<attachment: rotplot.png>>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to