[R] Fwd: efficient code - yet another question

steven wilson Wed, 30 Apr 2008 21:20:38 -0700

The code I sent before had some typos, here is the corrected one:

pca.nipals <- function(X, ncomp, iter = 50, toler = sqrt(.Machine$double.eps))
 # X...data matrix, ncomp...number of components,
 # iter...maximal number of iterations per component,
 # toler...precision tolerance for calculation of components
 {


     Xh <- scale(X, center = TRUE, scale = FALSE)
     nr <- 0
     T <- NULL; P <- NULL # matrix of scores and loadings
     for (h in 1:ncomp)
              {
                     th <- Xh[, 1]
                     ende <- FALSE
                     while (!ende)
                       {
                           nr <- nr + 1
                           ph <- t(crossprod(th, Xh) * as.vector(1 /
crossprod(th)))
                           ph <- ph * as.vector(1/sqrt(crossprod(ph)))
                           thnew <- t(tcrossprod(t(ph), Xh) *
as.vector(1/(crossprod(ph))))
                           prec <- crossprod(th-thnew)
                           th <- thnew
                           if (prec <= (toler^2)) ende <- TRUE
                           if (iter <= nr) ende <- TRUE # didn't converge
                       }

                     Xh <- Xh - tcrossprod(th)
                     T <- cbind(T, th); P <- cbind(P, ph)
                     nr <- 0
               }
     list(T = T, P = P)
 }

Thanks again


---------- Forwarded message ----------
From: steven wilson <[EMAIL PROTECTED]>
Date: Wed, Apr 30, 2008 at 11:56 PM
Subject: efficient code - yet another question
To: r-help@r-project.org


Dear list members;

 The code given below corresponds to the PCA-NIPALS (principal
 component analysis) algorithm adapted from the nipals function in the
 package chemometrics. The reason for using NIPALS instead of SVD is
 the ability of this algorithm to handle missing values, but that's a
 different story. I've been trying to find a way to improve (if
 possible) the efficiency of the code, especially when the number of
 components to calculate is higher than 100. I've been working with a
 data set of 500 rows x 700 variables. The code gets really slow when
 the number of PC to calculate is for example 600 (why that number of
 components?....because I need them to detect outliers using another
 algorithm). In my old laptop running Win XP and R 2.7.0 (1GB RAM) it
 takes around 6 or 7 minutes. That shouldn't be a problem for one
 analysis, but when cross-validation is added the time increases
 severely.....Although there are several examples on the R help list
 giving some with 'hints' to improve effciency the truth is that I
 don't know (or I can't see it) the part of the code that can be
 improved (but it is clear that the bottle neck is the for and while
 loops). I would really appreciate any ideas/comments/etc...

 Thanks

 #################################################################

 pca.nipals <- function(X, ncomp, iter = 50, toler = sqrt(.Machine$double.eps))
 # X...data matrix, ncomp...number of components,
 # iter...maximal number of iterations per component,
 # toler...precision tolerance for calculation of components
 {

     Xh <- scale(X, center = TRUE, scale = FALSE)
     nr <- 0
     T <- NULL; P <- NULL # matrix of scores and loadings
     for (h in 1:ncomp)
              {
                     th <- Xh[, 1]
                     ende <- FALSE
                     while (!ende)
                       {
                           nr <- nr + 1
                           ph <- t(crossprod(th, Xh) * as.vector(1 /
 crossprod(th)))
                           ph <- ph * as.vector(1/sqrt(crossprod(ph)))
                           thnew <- t(tcrossprod(t(ph), Xh) *
 as.vector(1/(crossprod(ph))))
                           prec <- crossprod(th-thnew)
                           th <- thnew
                           if (prec <= (tol^2)) ende <- TRUE
                           if (it <= nr) ende <- TRUE # didn't converge
                       }

                     Xh <- Xh - tcrossprod(th)
                     T <- cbind(T, th); P <- cbind(P, ph)
                     nr <- 0
               }
     list(T = T, P = P)
 }

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fwd: efficient code - yet another question

Reply via email to