The output of summary.prcomp() displays the cumulative variance explained 
relative to the total variance of the principal components PRESENT in the 
object.  So it is always guaranteed to reach 100% at the last principal component 
present.  You can see this in the code of summary.prcomp() (view it with 
getAnywhere("summary.prcomp")).
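
The normalization described above can be reproduced by hand.  This is only a 
minimal sketch, not the actual source of summary.prcomp(): the helper name 
cum_prop is made up for illustration, and the standard deviations are copied 
from the transcript further down in this message.

```r
# Hypothetical helper: cumulative proportion of variance, relative to
# whatever standard deviations are passed in.
cum_prop <- function(sdev) {
  vars <- sdev^2
  cumsum(vars) / sum(vars)
}

# Standard deviations copied from the transcript in this message
sdev_all  <- c(1.1749061, 1.0581362, 0.9759016, 0.9164905, 0.8503122)
sdev_kept <- sdev_all[1:3]

cum_all  <- cum_prop(sdev_all)   # denominator: variance of all five components
cum_kept <- cum_prop(sdev_kept)  # denominator: variance of the three kept ones

print(round(cum_all, 3))
print(round(cum_kept, 3))
```

Dividing by the variance of only the components passed in is what forces the 
last entry to 1, however many components were dropped.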

Here's how to get the output you want (the last line in the transcript below):

> set.seed(1)
> # x is a numeric matrix with five columns; its definition was not included here
> summary(pc1 <- prcomp(x))
Importance of components:
                        PC1   PC2   PC3   PC4   PC5
Standard deviation     1.175 1.058 0.976 0.916 0.850
Proportion of Variance 0.275 0.223 0.190 0.167 0.144
Cumulative Proportion  0.275 0.498 0.688 0.856 1.000
> summary(pc2 <- prcomp(x, tol=0.8))
Importance of components:
                       PC1   PC2   PC3
Standard deviation     1.17 1.058 0.976
Proportion of Variance 0.40 0.324 0.276
Cumulative Proportion  0.40 0.724 1.000
> pc2$sdev
[1] 1.1749061 1.0581362 0.9759016
> pc1$sdev
[1] 1.1749061 1.0581362 0.9759016 0.9164905 0.8503122
> svd(scale(x, center=TRUE, scale=FALSE))$d / sqrt(nrow(x)-1)
[1] 1.1749061 1.0581362 0.9759016 0.9164905 0.8503122
> cumsum(pc1$sdev^2) / sum((svd(scale(x, center=TRUE, scale=FALSE))$d / 
+ sqrt(nrow(x)-1))^2)
[1] 0.2752317 0.4984734 0.6883643 0.8558386 1.0000000

> # output in terms of the cumulative % of the total variance
> cumsum(pc2$sdev^2) / sum((svd(scale(x, center=TRUE, scale=FALSE))$d / 
+ sqrt(nrow(x)-1))^2)
[1] 0.2752317 0.4984734 0.6883643
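
The calculation in the last line of the transcript can be wrapped in a small 
helper.  This is a sketch under stated assumptions: the function name 
cum_prop_total and the simulated matrix x_demo are invented for illustration 
(they are not the x used above), and the total variance is taken as the sum of 
the column variances of the data instead of from a second SVD.  For centered, 
unscaled data the two totals are the same number.

```r
# Hypothetical helper: cumulative share of the TOTAL variance explained by
# the first k components of a centered, unscaled prcomp() fit.
cum_prop_total <- function(x, k) {
  pc <- prcomp(x)                     # full fit, all components
  total_var <- sum(apply(x, 2, var))  # trace of the covariance matrix
  cumsum(pc$sdev[seq_len(k)]^2) / total_var
}

set.seed(1)
x_demo <- matrix(rnorm(200 * 5), ncol = 5)  # illustrative data, not the x above

cp3 <- cum_prop_total(x_demo, 3)  # first three components only
cp5 <- cum_prop_total(x_demo, 5)  # all five components
print(cp3)
```

Note that with scale.=TRUE each column has unit variance, so the total would 
be the number of columns rather than the sum of the raw column variances.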


It's probably better to have prcomp compute all the components in the first 
place, because the SVD is the bulk of the computation anyway, so doing it a 
second time will be slow for large matrices.  Then just look at the most 
important principal components.  However, if your computations are demanding, 
there may be a shortcut that computes only the values of D in the SVD of a 
matrix, and you could look for that.  For example, the component standard 
deviations (D divided by sqrt(nrow(x)-1)) are the square roots of the 
eigenvalues of the covariance matrix of centered x:

sqrt(eigen(var(scale(x, center=TRUE, scale=FALSE)), only.values=TRUE)$values)
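
A quick check of the eigenvalue shortcut, as a sketch on simulated data (the 
matrix x_demo and the variable names are made up for illustration).  var() 
centers the columns itself, so the explicit scale() call is left out here.

```r
set.seed(1)
x_demo <- matrix(rnorm(200 * 5), ncol = 5)  # illustrative data only

# Standard deviations from the SVD route used by prcomp()
sd_svd <- prcomp(x_demo)$sdev

# Same quantities from the eigenvalues of the covariance matrix,
# without computing any eigenvectors
sd_eigen <- sqrt(eigen(var(x_demo), symmetric = TRUE, only.values = TRUE)$values)

print(all.equal(sd_svd, sd_eigen))
```

Whether this is actually faster than the SVD will depend on the shape of the 
matrix, so it is worth timing on your own data.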

-- Tony Plate


zubin wrote:
Hello, I don't understand the output of prcomp. When I reduce the number of components, the output still shows a cumulative 100% of the variance explained, which can't be the case when dropping from 8 components to 3. How do I get the output in terms of the cumulative % of the total variance, so that when I go from the full solution of 8 components (8 variables in the data set) to a reduced number of components, I can evaluate the % of variance explained? Or am I missing something?

8 variables in the data set

 > princ = prcomp(df[,-1],rotate="varimax",scale=TRUE)
 > summary(princ)
Importance of components:
                         PC1   PC2   PC3   PC4   PC5   PC6    PC7    PC8
Standard deviation     1.381 1.247 1.211 0.994 0.927 0.764 0.6708 0.4366
Proportion of Variance 0.238 0.194 0.183 0.124 0.107 0.073 0.0562 0.0238
Cumulative Proportion  0.238 0.433 0.616 0.740 0.847 0.920 0.9762 *1.0000*

 > princ = prcomp(df[,-1],rotate="varimax",scale=TRUE,tol=.75)
 > summary(princ)

Importance of components:
                         PC1   PC2   PC3
Standard deviation     1.381 1.247 1.211
Proportion of Variance 0.387 0.316 0.297
Cumulative Proportion  0.387 0.703 *1.000*

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

