In the manual page for prcomp(), it says that sdev is "the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix)." However, this is not what I'm finding. The values appear to be the standard deviations of a reprojection of the data, as in the example below.
> test=matrix(data=c(1,1,2,2,0,0,2,1,2,1,0,2,1,2,1,2,2,0),nrow=3,ncol=6) > test=standardize(test) > test [,1] [,2] [,3] [,4] [,5] [,6] [1,] -0.7071068 2.828427 0.8944272 0 -0.7071068 1.414214 [2,] -0.7071068 -1.414214 -1.7888544 -2 1.4142136 1.414214 [3,] 1.4142136 -1.414214 0.8944272 2 -0.7071068 -2.828427 > res1 Standard deviations: [1] 3.622043e+00 2.877639e+00 1.812987e-16 Rotation: PC1 PC2 PC3 [1,] 0.3330598 -0.07347383 0.1418171 [2,] -0.2319540 0.79957999 0.3837549 [3,] 0.2745904 0.41276093 -0.8541541 [4,] 0.5186794 0.23838204 0.2096053 [5,] -0.2170828 -0.32631616 -0.1153013 [6,] -0.6661196 0.14694766 -0.2140375 > res2=svd(test) >res2 $d [1] 5.122342e+00 4.069596e+00 1.153778e-15 $u [,1] [,2] [,3] [1,] -0.2800491 0.7669675 0.5773503 [2,] -0.5241888 -0.6260134 0.5773503 [3,] 0.8042379 -0.1409541 0.5773503 $v [,1] [,2] [,3] [1,] 0.3330598 -0.07347383 0.7608840 [2,] -0.2319540 0.79957999 0.3438537 [3,] 0.2745904 0.41276093 -0.4028185 [4,] 0.5186794 0.23838204 0.1543444 [5,] -0.2170828 -0.32631616 0.3236952 [6,] -0.6661196 0.14694766 0.1093469 Here, the values in $d are not the squares of the standard deviations. > res3=test %*% res2$v > res3 [,1] [,2] [,3] [1,] -1.434507 3.1212479 -7.494005e-16 [2,] -2.685074 -2.5476217 3.996803e-15 [3,] 4.119582 -0.5736263 -2.053913e-15 > apply(res3,2,sd) [1] 3.622043e+00 2.877639e+00 3.184320e-15 As shown above, taking the standard deviation of the columns of the reprojection of the eigenvectors on the original "data" gives the same values as the standard deviations given by prcomp(). Can anyone reconcile this example with the manual page for prcomp() or explain how the standard deviations are calculated? the code for standardize(): standardize<-function(data){ numSamples = nrow(data); # calculate mean for each marker mn=apply(data,2,mean); numMarkers = ncol(data); #standardize every entry for(j in 1:numMarkers){ for(k in 1:numSamples){ temp = sqrt(mn[j]/2*(1-mn[j]/2)); data[k,j] = (data[k,j] - mn[j])/temp; } } data; } Thank you, Sky ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.