The output of summary.prcomp() displays the cumulative variance explained
relative to the total variance of the principal components PRESENT in the
object, so it is always guaranteed to reach 100% at the last principal
component present. You can see this in the code of summary.prcomp() (view it
with getAnywhere("summary.prcomp")).
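To make the normalisation concrete, here is a minimal sketch. It is not the source of summary.prcomp(), only an illustration of the arithmetic described above. The matrix x is hypothetical (the data used in the transcript is not shown), and "kept" simply pretends that only the first three components were retained.

```r
set.seed(1)
x <- matrix(rnorm(500), ncol = 5)   # hypothetical data: 100 rows, 5 columns

vars <- prcomp(x)$sdev^2            # variances of all 5 components
kept <- vars[1:3]                   # pretend only the first 3 were retained

# Dividing by the sum over the retained components forces the last value to 1
cum_within_kept <- cumsum(kept) / sum(kept)

# Dividing by the sum over all components gives the share of the total variance
cum_of_total <- cumsum(kept) / sum(vars)

print(cum_within_kept)
print(cum_of_total)
```

The first vector always ends at 1 no matter how many components are dropped; the second ends below 1 whenever something was dropped.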
Here's how to get the output you want (the last line in the transcript below):
> set.seed(1)
> summary(pc1 <- prcomp(x))
Importance of components:
                         PC1   PC2   PC3   PC4   PC5
Standard deviation     1.175 1.058 0.976 0.916 0.850
Proportion of Variance 0.275 0.223 0.190 0.167 0.144
Cumulative Proportion  0.275 0.498 0.688 0.856 1.000
> summary(pc2 <- prcomp(x, tol=0.8))
Importance of components:
                        PC1   PC2   PC3
Standard deviation     1.17 1.058 0.976
Proportion of Variance 0.40 0.324 0.276
Cumulative Proportion  0.40 0.724 1.000
> pc2$sdev
[1] 1.1749061 1.0581362 0.9759016
> pc1$sdev
[1] 1.1749061 1.0581362 0.9759016 0.9164905 0.8503122
> svd(scale(x, center=TRUE, scale=FALSE))$d / sqrt(nrow(x)-1)
[1] 1.1749061 1.0581362 0.9759016 0.9164905 0.8503122
> cumsum(pc1$sdev^2) / sum((svd(scale(x, center=TRUE, scale=FALSE))$d /
+ sqrt(nrow(x)-1))^2)
[1] 0.2752317 0.4984734 0.6883643 0.8558386 1.0000000
> # output in terms of the cumulative % of the total variance
> cumsum(pc2$sdev^2) / sum((svd(scale(x, center=TRUE, scale=FALSE))$d /
+ sqrt(nrow(x)-1))^2)
[1] 0.2752317 0.4984734 0.6883643
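The last step of the transcript can be wrapped in a small helper. This is only a sketch: the function name cum_total_variance and the example data are hypothetical. Instead of a second SVD it uses the fact that the total variance is the sum of the column variances (the trace of the covariance matrix), which equals the number of columns when the data were scaled to unit variance. It only reads the standard deviations for the components that have a column in pc$rotation, so it should not matter whether the object stores extra ones.

```r
# Cumulative proportion of the TOTAL variance for the components kept in `pc`.
# `pc` is a prcomp object and `x` the data it was fitted on.
cum_total_variance <- function(pc, x) {
  x <- as.matrix(x)
  # pc$scale is FALSE when no scaling was requested, otherwise a numeric vector
  total <- if (is.numeric(pc$scale)) ncol(x) else sum(apply(x, 2, var))
  k <- ncol(pc$rotation)            # number of components actually retained
  cumsum(pc$sdev[seq_len(k)]^2) / total
}

set.seed(1)
x <- matrix(rnorm(500), ncol = 5)   # hypothetical data, not the x used above

pc_all  <- prcomp(x)
pc_some <- prcomp(x, tol = 0.8)

print(cum_total_variance(pc_all, x))    # ends at 1: nothing was dropped
print(cum_total_variance(pc_some, x))   # stops short of 1 if anything was dropped
```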
It's probably better to get prcomp to compute all the components in the first
place, because the SVD is the bulk of the computation anyway, so doing it a
second time will be slow for large matrices. Then just look at the most
important principal components. However, there may be a shortcut for computing
the values of D in the SVD of a matrix; you could look for that if you have
demanding computations. For example, the values of D divided by
sqrt(nrow(x)-1), as above, are the square roots of the eigenvalues of the
covariance matrix of the centered x:
sqrt(eigen(var(scale(x, center=TRUE, scale=FALSE)), only.values=TRUE)$values).
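As a quick check of that shortcut, the sketch below compares the two routes on hypothetical data (the x from the transcript is not available). Since var() centers the columns itself, the explicit scale() call is left out of the eigenvalue route here. The documentation for prcomp notes that the SVD is generally preferred for numerical accuracy, so treat the eigenvalue route as a speed trade-off rather than a drop-in replacement.

```r
set.seed(1)
x <- matrix(rnorm(500), ncol = 5)   # hypothetical data: 100 rows, 5 columns

# Route 1: singular values of the centered data, rescaled to standard deviations
sd_svd <- svd(scale(x, center = TRUE, scale = FALSE), nu = 0, nv = 0)$d /
  sqrt(nrow(x) - 1)

# Route 2: square roots of the eigenvalues of the covariance matrix
sd_eigen <- sqrt(eigen(var(x), symmetric = TRUE, only.values = TRUE)$values)

print(all.equal(sd_svd, sd_eigen))

# The sum of squared values from either route is the total variance
total_variance <- sum(sd_eigen^2)
print(total_variance)
```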
-- Tony Plate
zubin wrote:
Hello, I don't understand the output of prcomp. When I reduce the number of
components, the output continues to show a cumulative 100% of the
variance explained, which can't be the case when dropping from 8 components
to 3.
How do I get the output in terms of the cumulative % of the total
variance, so that when I go from the total solution of 8 (8 variables in the
data set) to a reduced number of components, I can evaluate the % of variance
explained? Or am I missing something?
8 variables in the data set
> princ = prcomp(df[,-1],rotate="varimax",scale=TRUE)
> summary(princ)
Importance of components:
                         PC1   PC2   PC3   PC4   PC5   PC6    PC7    PC8
Standard deviation     1.381 1.247 1.211 0.994 0.927 0.764 0.6708 0.4366
Proportion of Variance 0.238 0.194 0.183 0.124 0.107 0.073 0.0562 0.0238
Cumulative Proportion  0.238 0.433 0.616 0.740 0.847 0.920 0.9762 *1.0000*
> princ = prcomp(df[,-1],rotate="varimax",scale=TRUE,tol=.75)
> summary(princ)
Importance of components:
                         PC1   PC2   PC3
Standard deviation     1.381 1.247 1.211
Proportion of Variance 0.387 0.316 0.297
Cumulative Proportion  0.387 0.703 *1.000*
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.