Re: [Rd] Understanding svd usage and its necessity in generalized inverse calculation

Thomas Lumley Tue, 01 Jan 2013 15:01:40 -0800

On Thu, Dec 6, 2012 at 12:58 PM, Paul Johnson <pauljoh...@gmail.com> wrote:


> Dear R-devel:
>
> I could use some advice about matrix calculations and steps that might
> make for faster computation of generalized inverses. It appears in
> some projects there is a bottleneck at the use of svd in calculation
> of generalized inverses.
>
> Here's some Rprof output I need to understand.
>
> >   summaryRprof("Amelia.out")
> $by.self
>                              self.time self.pct total.time total.pct
> "La.svd"                        150.34    27.66     164.82     30.32
>
>
[snip]


> I *Think* this means that a bottlleneck here is svd, which is being
>

A bottleneck to a limited extent -- it's still only 30% of computation
time, so speeding it up can't save more than that


> called by this function that calculates generalized inverses:
>
> ## Moore-Penrose Inverse function (aka Generalized Inverse)
> ##   X:    symmetric matrix
> ##   tol:  convergence requirement
> mpinv <- function(X, tol = sqrt(.Machine$double.eps)) {
>   s <- svd(X)
>   e <- s$d
>   e[e > tol] <- 1/e[e > tol]
>   s$v %*% diag(e,nrow=length(e)) %*% t(s$u)
> }
>
> That is from the Amerlia package, which we like to use very much.
>
> Basically, I wonder if I should use a customized generalized inverse
> or svd calculator to make this faster.
>
>

If your matrix is produced as a covariance matrix, so that it really is
symmetric positive semidefinite up to rounding error, my understanding is
that the Cholesky approach is stable in the sense that it will reliably
produce an accurate inverse or a zero-to-within-tolerance pivot and error
message.

Now, if most of the matrices you are trying to invert are actually
invertible (as I would hope), it may be quicker to use the Cholesky
approach will a fallback to the SVD for semidefinite matrices. That is,
something like

tryCatch(
    chol2inv(chol(xx)),
    error=function(e) ginv(xx)
  )

Most of the  time you will get the Cholesky branch, which is much faster
(about five-fold for 10x10 matrices on my system).  On my system and using
a 10x10 matrix the overhead in the tryCatch() is much smaller than the time
taken by either set of linear algebra, so there is a net gain as long as
even a reasonably minority of the matrices are actually invertible.

You can probably do slightly better by replacing the chol2inv() with
backsolve(): solving just the systems of linear equations you need is
usually preferable to constructing a matrix inverse.


Note that this approach will give wrong answers without warning if the
matrices are not symmetric.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Understanding svd usage and its necessity in generalized inverse calculation

Reply via email to