Hello, I did a pca with over 200000 snps for 340 observations (ids). If I plot the eigenvectors (called rotation in prcomp) 2,3 and 4 (e.g. plot (rotation[,2]) I see a strange "column" in my data (see attachment). I suggest it is an artefact (but of what?).
Suggestion: I used prcomp this way: prcomp (mat), where mat is a matrix with the column means already substracted followed by a normalisation procedure (see below for details). Is that okay? Or does prcomp repeat substraction steps? Originally my approach was driven by the idea to compute a covariation matrix followed by the use of eigen, but the covariation matrix was to huge to handle. So I switched to prcomp. As I guess that the "columns" in my plots reflect some artefact production I hope to get some help. For the case that my use of prcomp was not okay, could you please give me instructions how to use it - including with the normalisation procedure that I need to include before doing a pca. Thanks Hermann # # mat: matrix with genotypes coded as 0,1 and 2 (columns); IDs (observations) as rows. # prcomp.snp <- function (mat) { m <- ncol (mat) n <- nrow (mat) snp.namen <- colnames (mat) for (i in 1:m) { # snps in columns ui <- mat[,i] n <- length (which (!is.na(ui))) # see methods Price et al. as correction pi <- (1+ sum(ui, na.rm=TRUE))/(2+2*n) # substract mean ui <- ui - mean (ui, na.rm=TRUE) # NAs set to zero ui[is.na(ui)] <- 0 # normalisation of the genotype for each ID important normalisation step ui <- ui/ (sqrt (pi*(1-pi))) # fill matrix with ui mat[,i] <- ui } mat <- prcomp (mat) return (mat) }
<<attachment: rotplot.png>>
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.