[Rd] Eigenvalue calculation of sparse matrices

2012-03-10 Thread Holger Diedrich

Dear all,

I am currently working on the calculation of eigenvalues (and 
eigenvectors) of large matrices. Since these are mostly sparse, and I 
remember MATLAB's dedicated functionality for sparse matrices, I started 
researching how to optimize the eigenvalue calculation for a sparse matrix.
The function eigen itself works with the LAPACK library, which has no 
special handling for sparse matrices; the same holds for EISPACK. The 
ARPACK library is capable of working with sparse matrices, but I couldn't 
find any useful R package that uses ARPACK.
The Matrix package can handle sparse matrices but offers no further 
functionality useful for my tasks.
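(As a minimal illustration of the dense fallback, with a toy matrix: since 
eigen() has no sparse path, a sparse Matrix object must be densified before 
the LAPACK call.)

```r
library(Matrix)  # sparse matrix classes; ships with R

# Toy sparse matrix, purely for illustration
M <- sparseMatrix(i = 1:3, j = 1:3, x = c(2, 3, 4))

# eigen() expects a base matrix, so the sparse storage is densified first;
# for genuinely large sparse matrices this coercion is the expensive part
ev <- eigen(as.matrix(M), only.values = TRUE)
ev$values  # eigenvalues in decreasing order: 4 3 2
```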


Does any of you have advice on how to optimize the eigenvalue 
calculation for sparse matrices in R?


Thanks in advance,
Holger

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] On R performance

2012-03-10 Thread Justin Talbot
>
> On 8 March 2012 at 11:06, Justin Talbot wrote:
> | I've been working on an R performance academic project for the last
> | couple years which has involved writing an interpreter for R from
> | scratch and a JIT for R vector operations.
>
> Cool.  I think John mentioned that once or twice and I promptly forgot.
>
> Can you share some numbers?
>

Sure, I'll give a quick summary. We're writing a paper on it right now
which will have more details.

We currently execute scalar R code (non-vectorized) through an
interpreter we wrote from scratch. We haven't put a whole lot of time
into it; it supports most of the important R semantics, but does not
yet implement most of the functions in src/names.c, which limits the
scope of real world code we can run. On a set of microbenchmarks
(careful what you conclude from microbenchmarks!) it runs about 4-5x
faster than Luke's bytecode interpreter.

The interpreter is still about 3-4x slower than LuaJIT's interpreter,
probably the fastest dynamic language interpreter out there, so there
is room for further improvement, but not a lot. (Lua is a much cleaner
language from the performance standpoint and LuaJIT's interpreter is
written in assembly. We don't anticipate doing that anytime soon.)

We execute vectorized code through a JIT that generates SSE code and
can parallelize across multiple cores. Performance here depends
greatly on the vector size and number of vectors since our performance
gain primarily comes from eliminating memory accesses. For long
vectors (1M+ elements) we've seen gains from about 5x-50x on a single
core for plausible workloads. We don't have good numbers on the
parallelization yet, but we have seen linear scalability out to 32
cores for a couple of our workloads. Scalability is clearly very task
dependent and we don't expect to get large numbers across the board.

One implication of having a JIT is that we now implement a lot of
functionality at the R level rather than in C functions. For example,
we implement matrix-vector multiplication as:

r <- 0
for(i in 1L:ncol(m)) {
   r <- r + m[,i]*v[[i]]
}

This beats R's built-in matrix-vector multiplication by a factor of 2
for "large" matrices (at least one dimension larger than 1000 or so)
and will parallelize without any more work from the programmer. With
more work to squeeze out our JIT overheads this could be effective
even for much smaller matrices.
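In plain R terms (without the JIT described above), that loop corresponds to 
the following function; the name mvmult is illustrative, and note that in the 
stock interpreter this trails the BLAS-backed %*% rather than beating it:

```r
# Column-wise matrix-vector product, as in the loop above.
# The 2x figure in the text refers to the authors' vector JIT,
# not to stock R, where the built-in %*% (BLAS) is faster.
mvmult <- function(m, v) {
  r <- numeric(nrow(m))          # start from a zero vector
  for (i in seq_len(ncol(m))) {
    r <- r + m[, i] * v[[i]]     # accumulate column i scaled by v[i]
  }
  r
}

m <- matrix(runif(12), nrow = 4)
v <- runif(3)
all.equal(mvmult(m, v), as.vector(m %*% v))  # TRUE
```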


> | So why is R performance poor now? I think the fundamental reason is
> | related to software engineering: R is nearly impossible to experiment
> | with, so no one tries out new performance techniques on it. There are
>
> Did you compare notes with the CXXR project by Andrew Runnalls and his
> student(s)?  See http://www.cs.kent.ac.uk/projects/cxxr/
>

I haven't talked to them, but I should! Looking at their slides it
looks like their approach will be effective at making the R core more
extensible, but it's somewhat antagonistic to pure interpreter
performance. I didn't see any performance numbers, but I would guess
that CXXR runs somewhat slower than the current interpreter.

The key for being able to experiment with performance is for the core
code to be small and well defined, not necessarily extensible.

> | I see little value is debating changes to the language semantics until
> | we've addressed this low hanging fruit and at least tried to make the
> | current R/S semantics run fast.
>
> Fully agree.
>
> I'd add that helping expand R via the FFI also works, though it is of course
> not as easy on the end user as making the core faster.
>

FFI is extremely important and Rcpp is a great step forward. I'll just
note that FFI and performance interact. An FFI like .Call/.External
exposes too much of R's internal implementation details to users,
making it difficult to improve performance in the core while
maintaining backwards compatibility. It would be much better if R's
high-performance FFI were something like Rcpp itself, hiding almost
all implementation details from the user.

Just one example on FFIs. .Call/.External lets users get raw pointers
to vector data (e.g. NUMERIC_POINTER). This is fine and dandy as long
as all implementations store vectors contiguously in memory. But, some
implementations may not want this. For example, Clojure gets
high-performance updates to its immutable arrays by storing them in a
tree data structure instead of flat in memory. This would be a nice
technique to port to R, but it breaks .Call packages. A better FFI
choice would have used something like NUMERIC_ELEMENT(x,i) to hide the
details of how element i is looked up in vector x. This would have
been just as fast for current packages while leaving a forward path
for more performance improvements.

Justin




Re: [Rd] On R performance

2012-03-10 Thread Justin Talbot
>
> Isn't R much like Lisp under the covers? After all, it evolved from Scheme.
> Hasn't there been a great deal of work done on optimizing Lisp over the
> last 30 years? This suggests that instead of dropping the R/S semantics
> and moving to another language like Julia, the proposals of Ross Ihaka
> and Duncan Temple Lang could be followed to provide the familiar
> R/S syntax on top of an optimized Lisp engine.
>

I think R started off as a Lisp-like language, but since adopting S
semantics, it has diverged quite a ways. I think it's better to think
of R as a combination of two languages: a dynamically-typed high-level
language, much like Javascript or Lua, and an array language, like
APL. I think those are the right places to be looking to see how to
make R fast. Fortunately, all three of those languages have had a lot
of performance work done already that R could just steal from
wholesale.

> Another possibility is to implement R/S on top of an optimized virtual
> machine like the JVM, LLVM, etc.
>

I like this in theory. But in practice, I'm not sure how well it would
work for R. JVM implementations of dynamic languages, like JRuby and
Jython, run only marginally faster (30-40%) than their C interpreters. You
do get the Java ecosystem, which is nice, but the performance
improvements probably aren't enough to make it worthwhile. And, of
course, R already has a pretty good Java connection story.

LLVM is a better option; I know there's another group out there
looking at R on LLVM. But I'll just note that the really high
performance dynamic languages (e.g. Google's V8 implementation of
Javascript and Mike Pall's LuaJIT) are hand-rolled JITs. LLVM-based
implementations of dynamic languages, like Unladen Swallow, have not
been particularly successful. It remains to be seen how well R would
map to LLVM.

Justin



Re: [Rd] On R performance

2012-03-10 Thread Dominick Samperi
On Fri, Mar 9, 2012 at 11:39 AM, Justin Talbot  wrote:
>>
>> Isn't R much like Lisp under the covers? After all, it evolved from Scheme.
>> Hasn't there been a great deal of work done on optimizing Lisp over the
>> last 30 years? This suggests that instead of dropping the R/S semantics
>> and moving to another language like Julia, the proposals of Ross Ihaka
>> and Duncan Temple Lang could be followed to provide the familiar
>> R/S syntax on top of an optimized Lisp engine.
>>
>
> I think R started off as a Lisp-like language, but since adopting S
> semantics, it has diverged quite a ways. I think it's better to think
> of R as a combination of two languages: a dynamically-typed high-level
> language, much like Javascript or Lua, and an array language, like
> APL. I think those are the right places to be looking to see how to
> make R fast. Fortunately, all three of those languages have had a lot
> of performance work done already that R could just steal from
> wholesale.

Thanks for the clarification, Justin. What about the S4 classes
and methods? The design resembles CLOS, and currently this
is interpreted R code. Have you addressed performance issues
associated with this? What relative impact does this have compared
with other optimizations like vectorization?

Thanks,
Dominick

>> Another possibility is to implement R/S on top of an optimized virtual
>> machine like the JVM, LLVM, etc.
>>
>
> I like this in theory. But in practice, I'm not sure how well it would
> work for R. JVM implementations of dynamic languages, like JRuby and
> Jython run marginally faster (30-40%) than their C interpreters. You
> do get the Java ecosystem, which is nice, but the performance
> improvements probably aren't enough to make it worthwhile. And, of
> course, R already has a pretty good Java connection story.
>
> LLVM is a better option; I know there's another group out there
> looking at R on LLVM. But I'll just note that the really high
> performance dynamic languages (e.g. Google's V8 implementation of
> Javascript and Mike Pall's LuaJIT) are hand-rolled JITs. LLVM-based
> implementations of dynamic languages, like Unladen Swallow, have not
> been particularly successful. It remains to be seen how well R would
> map to LLVM.
>
> Justin



Re: [Rd] Task View on Numerical Analysis and Differential Equations

2012-03-10 Thread Achim Zeileis

Hans,

thanks for the suggestion and sorry for coming so late to the thread. I 
have only a little to add to Ravi's comments.


> I like the idea overall.  Nothing prevents you from doing this, but you
> should be committed to getting it up and running and then keeping it
> updated.

Yes, completely d'accord with all comments.

If you want to put together a .ctv, please have a look at

vignette("ctv-howto", package = "ctv")

and send me the resulting .ctv.

> In addition to concurring with Berend's comments, I also would like to
> mention that numerical analysis (NA) is an extremely broad category.  I
> am not sure how to restrict that in order to make the task view
> manageable.


I also completely agree with this. I'm not sure what the best way to focus 
the task view is. Maybe differential equations should be in a separate 
(relatively small) view? Experience shows that very large task views 
become harder to maintain because the number of packages grows over time.


In any case, I look forward to seeing a new task view.
Thanks & best regards,
Z

> How about including (i) convergence acceleration of sequences, (ii)
> Fourier transforms/wavelets, (iii) function approximation and special
> functions of mathematical physics?
>
> Best regards,
> Ravi
>
> P.S. I have a package called "eiginv" for solving certain types of
> inverse eigenvalue problems (e.g., generating a matrix of a specific
> structure with a given set of eigenvalues).


-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of Berend Hasselman
Sent: Thursday, March 08, 2012 9:59 AM
To: Hans W Borchers
Cc: r-devel@r-project.org
Subject: Re: [Rd] Task View on Numerical Analysis and Differential Equations


On 08-03-2012, at 12:16, Hans W Borchers wrote:


I am wondering if it would be time to have a new Task View, this time
for the subject of "Numerical Analysis and Differential Equations".
The list of packages possibly appearing in such a task view is already
quite long and could, for example, include:

Numerical Analysis and Linear Algebra

BesselBessel Functions Computations and Approximations
cubature  Adaptive multivariate integration over hypercubes
elliptic  Elliptic and related functions
expm  Matrix exponential, logarithm, etc.
fdim  Functions for calculating fractal dimension
gaussquad Collection of functions for Gaussian quadrature
gmp   Multiple precision arithmetic
gsl   Wrapper for the Gnu Scientific Library
hypergeo  The hypergeometric function
irlba Fast partial SVD by Lanczos bidiagonalization
matlabMATLAB emulation package
multipol  Multivariate polynomials
numDeriv  Accurate numerical derivatives
onion Octonions and quaternions
orthogonalsplinebasis  Orthogonal Bspline basis functions
orthopolynom  Functions for orthogonal and orthonormal polynomials
polspline Polynomial spline routines
polynom   Implement a class for univariate polynomial manipulations
PolynomF  Polynomials in R
pracmaPractical numerical math functions
pspline   Penalized smoothing splines
quaternions   Arithmetics and linear algebra with quaternions
R2CubaMultidimensional numerical integration
RcppArmadillo Rcpp integration for Armadillo templated linear algebra library
RcppEigen Rcpp integration for the Eigen templated linear algebra library
RcppOctaveRcpp integration of Octave
R.matlab  Read and write of MAT files and R-to-Matlab connectivity
Rmpfr Multiple precision floating-point reliable
sparseGridSparse grid integration in R
spuRs Functions and datasets for scientific programming and simulation
sspline   Smoothing splines on the sphere
stinepack Stineman: consistently well behaved method of interpolation
svd   Interfaces to various state-of-the-art SVD and eigensolvers
voronoi   Methods and applications related to Voronoi tessellations
wavelets...

Simulation and Differential Equations

bvpSolve  Solvers for boundary value problems of ODEs
ddesolve  Solver for Delay Differential Equations
deSolve   General solvers for initial value problems of ordinary
 differential equations (ODE), partial differential equations
 (PDE), differential algebraic equations (DAE), and delay
 differential equations (DDE)
deTestSet Testset for differential equations
odesolve  Solvers for Ordinary Differential Equations
PBSddesolve   Solver for Delay Differential Equations
rootSolve Root finding, equilibrium and steady-state analysis of ODEs
sde   Simulation and Inference for Stochastic Differential Equations
Sim.DiffProc  Simulation of diffusion processes
simecol   Simulation of ecological and other dynamic systems

and probably many more in the end. I left out the optimization
packages deliberately, but of course there would be a strong link to
that task view.


Re: [Rd] Eigenvalue calculation of sparse matrices

2012-03-10 Thread Kjetil Halvorsen
Hola!

This can be done with the CRAN package igraph, which contains (part of)
the ARPACK library for computing only some eigenvalues/eigenvectors of
sparse matrices. arpack gives you the option of computing a few of the
smallest or a few of the largest eigenvalues/vectors.

You will need to write a function performing the matrix-vector
multiplication yourself, so the arpack method itself is independent of
the implementation of your sparse matrix.
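The approach can be sketched as follows (assuming igraph's arpack() 
interface; exact option names can differ between igraph versions, so treat 
this as a sketch rather than a drop-in solution):

```r
library(igraph)   # provides arpack()
library(Matrix)   # sparse storage for the example matrix

set.seed(42)
A <- rsparsematrix(1000, 1000, density = 0.01, symmetric = TRUE)

# arpack() only ever asks for matrix-vector products, so the sparse
# representation of A never has to be densified
matvec <- function(x, extra = NULL) as.vector(A %*% x)

res <- arpack(matvec, sym = TRUE,
              options = list(n = nrow(A), nev = 3, ncv = 10,
                             which = "LM", maxiter = 3000))
res$values   # the 3 eigenvalues of largest magnitude
```

Requesting which = "SM" instead would give the smallest-magnitude 
eigenvalues.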

Kjetil

On Fri, Mar 9, 2012 at 9:09 AM, Holger Diedrich
 wrote:
> Dear all,
>
> I am currently working on the calculation of eigenvalues (and -vectors) of
> large matrices. Since these are mostly sparse matrices and I remember some
> specific functionalities in MATLAB for sparse matrices, I started a research
> how to optimize the calculation of eigenvalues of a sparse matrix.
> The function eigen itself works with the LAPACK library which has no special
> handling for sparse matrices, same for the EISPACK library. The ARPACK
> library is capable to work with sparse matrices but I couldn't find any
> useful R package using ARPACK.
> The Matrix package can handle sparse matrices but has no further useful
> functionalities (concerning my tasks).
>
> Does one of you have any advice how to optimize the eigenvalue calculation
> of sparse matrices in R?
>
> Thanks in advance,
> Holger
>
