I'm working with proteomic data, helping a student who knows biology and has done analysis in R without understanding it in depth.
We have levels of 3000 proteins at 6 ages. I can treat this as 6 vectors in 3000-dimensional space, diagonalize a 6x6 covariance matrix, and find 5 principal components plus one zero eigenvalue.

My student has worked with R in "Q mode", and he enters the transposed matrix as 3000 vectors in 6-dimensional space. In just a few seconds, R diagonalizes a 3000x3000 matrix! I can't imagine what it means to diagonalize a 3000x3000 matrix. But, of course, there are only 5 degrees of freedom in the data, so only 5 of the eigenvalues are non-zero, and the other 2995 eigenvectors are junk.

Questions:

a) Is there a relationship between the principal components of the 3000 x 6 matrix and the principal components of the transposed 6 x 3000 matrix?

b) Is there a way to find the 5 meaningful eigenvectors without carrying the baggage of diagonalizing the huge 3000-dimensional matrix?

c) The big question: which version should we analyze and publish? My student tells me the transposed matrix is the common procedure. The two yield very different-looking plots.

Thanks for your help.

- Josh Mitteldorf
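To make the setup concrete, here is a minimal sketch on simulated data (random numbers standing in for the actual proteomic measurements, which are not reproduced here). The names X, Xc, G and V are illustrative only. It is meant only to show that, for 6 ages by 3000 proteins, the 5 non-trivial components can be obtained from a 6 x 6 eigenproblem and mapped back to protein space, and that the result matches prcomp() up to sign.

```r
set.seed(1)
n_ages <- 6
n_prot <- 3000

# Simulated stand-in for the real data (an assumption):
# rows are ages, columns are proteins
X <- matrix(rnorm(n_ages * n_prot), nrow = n_ages, ncol = n_prot)

# Centre each protein (column) across the 6 ages
Xc <- scale(X, center = TRUE, scale = FALSE)

# Small 6 x 6 problem: eigen-decompose Xc %*% t(Xc)
G  <- Xc %*% t(Xc)
eg <- eigen(G, symmetric = TRUE)

# Centring removes one degree of freedom, so keep 5 components
k      <- n_ages - 1
lambda <- eg$values[1:k]

# Map the 6-dimensional eigenvectors back to 3000-dimensional loadings
V <- t(Xc) %*% eg$vectors[, 1:k] %*% diag(1 / sqrt(lambda))

# Compare with prcomp(), which works through svd() and never forms
# a 3000 x 3000 covariance matrix explicitly
pc <- prcomp(X, center = TRUE, scale. = FALSE)
sdev_small    <- sqrt(lambda / (n_ages - 1))
max_sdev_diff <- max(abs(sdev_small - pc$sdev[1:k]))
max_load_diff <- max(abs(abs(V) - abs(pc$rotation[, 1:k])))  # equal up to sign

# A matrix and its transpose share singular values when no re-centring occurs
d1 <- svd(Xc)$d
d2 <- svd(t(Xc))$d
```

Note that prcomp(t(X)) is not simply the transpose of this analysis: it centres each age across the proteins rather than each protein across the ages, so the two calls decompose different centred matrices. That difference in centring is one reason the two sets of plots can look unlike each other.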