[Rd] Eigenvalue calculation of sparse matrices
Dear all,

I am currently working on the calculation of eigenvalues (and -vectors) of large matrices. Since these are mostly sparse matrices, and I remember some specific functionality in MATLAB for sparse matrices, I started researching how to optimize the calculation of eigenvalues of a sparse matrix.

The function eigen itself works with the LAPACK library, which has no special handling for sparse matrices; the same holds for the EISPACK library. The ARPACK library is capable of working with sparse matrices, but I couldn't find any useful R package using ARPACK. The Matrix package can handle sparse matrices but has no further useful functionality (for my tasks).

Does one of you have any advice on how to optimize the eigenvalue calculation of sparse matrices in R?

Thanks in advance,
Holger

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
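[Editorial note: to make the limitation concrete, a minimal sketch; the rsparsematrix() helper is from the Matrix package. Base R's eigen() only has a dense LAPACK path, so a sparse matrix has to be expanded to dense storage before anything can be computed:]

```r
library(Matrix)

set.seed(1)
A <- rsparsematrix(500, 500, density = 0.01)  # sparse "dgCMatrix", ~1% nonzero

# eigen() goes through LAPACK on dense storage, so the sparse matrix
# must be converted to an ordinary dense matrix first; the memory and
# speed advantages of the sparse representation are lost at this point.
ev <- eigen(as.matrix(A), only.values = TRUE)$values
length(ev)   # all 500 eigenvalues are computed, whether needed or not
```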
Re: [Rd] On R performance
> On 8 March 2012 at 11:06, Justin Talbot wrote:
> | I've been working on an R performance academic project for the last
> | couple years which has involved writing an interpreter for R from
> | scratch and a JIT for R vector operations.
>
> Cool. I think John mentioned that once or twice and I promptly forgot.
>
> Can you share some numbers?

Sure, I'll give a quick summary. We're writing a paper on it right now which will have more details.

We currently execute scalar R code (non-vectorized) through an interpreter we wrote from scratch. We haven't put a whole lot of time into it; it supports most of the important R semantics, but does not yet implement most of the functions in src/names.c, which limits the scope of real-world code we can run. On a set of microbenchmarks (careful what you conclude from microbenchmarks!) it runs about 4-5x faster than Luke's bytecode interpreter. The interpreter is still about 3-4x slower than LuaJIT's interpreter, probably the fastest dynamic language interpreter out there, so there is room for further improvement, but not a lot. (Lua is a much cleaner language from the performance standpoint, and LuaJIT's interpreter is written in assembly. We don't anticipate doing that anytime soon.)

We execute vectorized code through a JIT that generates SSE code and can parallelize across multiple cores. Performance here depends greatly on the vector size and the number of vectors, since our performance gain comes primarily from eliminating memory accesses. For long vectors (1M+ elements) we've seen gains of about 5x-50x on a single core for plausible workloads. We don't have good numbers on the parallelization yet, but we have seen linear scalability out to 32 cores for a couple of our workloads. Scalability is clearly very task dependent and we don't expect to get large numbers across the board.

One implication of having a JIT is that we now implement a lot of functionality at the R level rather than in C functions.
For example, we implement matrix-vector multiplication as:

    r <- 0
    for(i in 1L:ncol(m)) {
        r <- r + m[,i]*v[[i]]
    }

This beats R's built-in matrix-vector multiplication by a factor of 2 for "large" matrices (at least one dimension larger than 1000 or so) and will parallelize without any more work from the programmer. With more work to squeeze out our JIT overheads, this could be effective even for much smaller matrices.

> | So why is R performance poor now? I think the fundamental reason is
> | related to software engineering: R is nearly impossible to experiment
> | with, so no one tries out new performance techniques on it. There are
>
> Did you compare notes with the CXXR project by Andrew Runnalls and his
> student(s)? See http://www.cs.kent.ac.uk/projects/cxxr/

I haven't talked to them, but I should! Looking at their slides, it looks like their approach will be effective at making the R core more extensible, but it's somewhat antagonistic to pure interpreter performance. I didn't see any performance numbers, but I would guess that CXXR runs somewhat slower than the current interpreter. The key to being able to experiment with performance is for the core code to be small and well defined, not necessarily extensible.

> | I see little value in debating changes to the language semantics until
> | we've addressed this low-hanging fruit and at least tried to make the
> | current R/S semantics run fast.
>
> Fully agree.
>
> I'd add that helping expand R via the FFI also works, though it is of course
> not as easy on the end user as making the core faster.

FFI is extremely important and Rcpp is a great step forward. I'll just note that FFI and performance interact. An FFI like .Call/.External exposes too much of R's internal implementation details to users, making it difficult to improve performance in the core while maintaining backwards compatibility.
It would be much better if R's high-performance FFI were something like Rcpp itself, hiding almost all implementation details from the user.

Just one example on FFIs: .Call/.External lets users get raw pointers to vector data (e.g. NUMERIC_POINTER). This is fine and dandy as long as all implementations store vectors contiguously in memory. But some implementations may not want this. For example, Clojure gets high-performance updates to its immutable arrays by storing them in a tree data structure instead of flat in memory. This would be a nice technique to port to R, but it breaks .Call packages. A better FFI choice would have been something like NUMERIC_ELEMENT(x,i), which hides the details of how element i is looked up in vector x. This would have been just as fast for current packages while leaving a forward path for more performance improvements.

Justin
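[Editorial note: the matrix-vector loop Justin describes can be checked against the built-in operator in plain R. A small sketch (interpreted, the loop is of course much slower than %*%; the 2x speedup refers to the authors' JIT-compiled execution):]

```r
# The column-accumulation loop from the message, wrapped as a function:
# m %*% v is the sum over columns i of m[,i] * v[i]
mv <- function(m, v) {
  r <- 0
  for (i in 1L:ncol(m)) {
    r <- r + m[, i] * v[[i]]
  }
  r
}

set.seed(42)
m <- matrix(rnorm(1000 * 1000), 1000, 1000)
v <- rnorm(1000)

# The loop computes the same result as the built-in product
stopifnot(all.equal(mv(m, v), as.vector(m %*% v)))
```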
Re: [Rd] On R performance
> Isn't R much like Lisp under the covers? After all, it evolved from Scheme.
> Hasn't there been a great deal of work done on optimizing Lisp over the
> last 30 years? This suggests that instead of dropping the R/S semantics
> and moving to another language like Julia, the proposals of Ross Ihaka
> and Duncan Temple Lang could be followed to provide the familiar
> R/S syntax on top of an optimized Lisp engine.

I think R started off as a Lisp-like language, but since adopting S semantics, it has diverged quite a ways. I think it's better to think of R as a combination of two languages: a dynamically-typed high-level language, much like Javascript or Lua, and an array language, like APL. I think those are the right places to look to see how to make R fast. Fortunately, all three of those languages have had a lot of performance work done already that R could steal from wholesale.

> Another possibility is to implement R/S on top of an optimized virtual
> machine like the JVM, LLVM, etc.

I like this in theory, but in practice I'm not sure how well it would work for R. JVM implementations of dynamic languages, like JRuby and Jython, run only marginally faster (30-40%) than their C interpreters. You do get the Java ecosystem, which is nice, but the performance improvements probably aren't enough to make it worthwhile. And, of course, R already has a pretty good Java connection story.

LLVM is a better option; I know there's another group out there looking at R on LLVM. But I'll just note that the really high-performance dynamic languages (e.g. Google's V8 implementation of Javascript and Mike Pall's LuaJIT) are hand-rolled JITs. LLVM-based implementations of dynamic languages, like Unladen Swallow, have not been particularly successful. It remains to be seen how well R would map to LLVM.

Justin
Re: [Rd] On R performance
On Fri, Mar 9, 2012 at 11:39 AM, Justin Talbot wrote:
>> Isn't R much like Lisp under the covers? After all, it evolved from Scheme.
>> Hasn't there been a great deal of work done on optimizing Lisp over the
>> last 30 years? This suggests that instead of dropping the R/S semantics
>> and moving to another language like Julia, the proposals of Ross Ihaka
>> and Duncan Temple Lang could be followed to provide the familiar
>> R/S syntax on top of an optimized Lisp engine.
>
> I think R started off as a Lisp-like language, but since adopting S
> semantics, it has diverged quite a ways. I think it's better to think
> of R as a combination of two languages: a dynamically-typed high-level
> language, much like Javascript or Lua, and an array language, like
> APL. I think those are the right places to be looking to see how to
> make R fast. Fortunately, all three of those languages have had a lot
> of performance work done already that R could just steal from
> wholesale.

Thanks for the clarification, Justin. What about S4 classes and methods? The design resembles CLOS, and currently this is interpreted R code. Have you addressed the performance issues associated with this? What relative impact does this have compared with other optimizations like vectorization?

Thanks,
Dominick

>> Another possibility is to implement R/S on top of an optimized virtual
>> machine like the JVM, LLVM, etc.
>
> I like this in theory. But in practice, I'm not sure how well it would
> work for R. JVM implementations of dynamic languages, like JRuby and
> Jython, run marginally faster (30-40%) than their C interpreters. You
> do get the Java ecosystem, which is nice, but the performance
> improvements probably aren't enough to make it worthwhile. And, of
> course, R already has a pretty good Java connection story.
>
> LLVM is a better option; I know there's another group out there
> looking at R on LLVM. But I'll just note that the really high
> performance dynamic languages (e.g. Google's V8 implementation of
> Javascript and Mike Pall's LuaJIT) are hand-rolled JITs. LLVM-based
> implementations of dynamic languages, like Unladen Swallow, have not
> been particularly successful. It remains to be seen how well R would
> map to LLVM.
>
> Justin
Re: [Rd] Task View on Numerical Analysis and Differential Equations
Hans, thanks for the suggestion, and sorry for coming so late to the thread. I have only a little to add to Ravi's comments.

I like the idea overall. Nothing prevents you from doing this, but you should be committed to getting it up and running and then keeping it updated.

Yes, completely d'accord with all comments. If you want to put together a .ctv, please have a look at vignette("ctv-howto", package = "ctv") and send me the resulting .ctv.

In addition to concurring with Berend's comments, I would also like to mention that numerical analysis (NA) is an extremely broad category. I am not sure how to restrict it in order to make the task view manageable.

I also completely agree with this. I'm not sure what the best way to focus the task view is. Maybe differential equations should be in a separate (relatively small) view? Experience shows that very large task views become harder to maintain over time because the number of packages keeps growing.

In any case, I look forward to seeing a new task view.

Thanks & best regards,
Z

How about including (i) convergence acceleration of sequences, (ii) Fourier transforms/wavelets, and (iii) function approximation and special functions of mathematical physics?

Best regards,
Ravi

P.S. I have a package called "eiginv" for solving certain types of inverse eigenvalue problems (e.g., generating a matrix of a specific structure with a given set of eigenvalues).

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Berend Hasselman
Sent: Thursday, March 08, 2012 9:59 AM
To: Hans W Borchers
Cc: r-devel@r-project.org
Subject: Re: [Rd] Task View on Numerical Analysis and Differential Equations

On 08-03-2012, at 12:16, Hans W Borchers wrote:

I am wondering if it would be time to have a new Task View, this time for the subject of "Numerical Analysis and Differential Equations".
The list of packages possibly appearing in such a task view is already quite long and could, for example, include:

Numerical Analysis and Linear Algebra

Bessel                 Bessel functions computations and approximations
cubature               Adaptive multivariate integration over hypercubes
elliptic               Elliptic and related functions
expm                   Matrix exponential, logarithm, etc.
fdim                   Functions for calculating fractal dimension
gaussquad              Collection of functions for Gaussian quadrature
gmp                    Multiple precision arithmetic
gsl                    Wrapper for the GNU Scientific Library
hypergeo               The hypergeometric function
irlba                  Fast partial SVD by Lanczos bidiagonalization
matlab                 MATLAB emulation package
multipol               Multivariate polynomials
numDeriv               Accurate numerical derivatives
onion                  Octonions and quaternions
orthogonalsplinebasis  Orthogonal B-spline basis functions
orthopolynom           Functions for orthogonal and orthonormal polynomials
polspline              Polynomial spline routines
polynom                Implements a class for univariate polynomial manipulations
PolynomF               Polynomials in R
pracma                 Practical numerical math functions
pspline                Penalized smoothing splines
quaternions            Arithmetic and linear algebra with quaternions
R2Cuba                 Multidimensional numerical integration
RcppArmadillo          Rcpp integration for the Armadillo templated linear algebra library
RcppEigen              Rcpp integration for the Eigen templated linear algebra library
RcppOctave             Rcpp integration of Octave
R.matlab               Reading and writing of MAT files and R-to-MATLAB connectivity
Rmpfr                  Multiple precision floating-point reliable
sparseGrid             Sparse grid integration in R
spuRs                  Functions and datasets for scientific programming and simulation
sspline                Smoothing splines on the sphere
stinepack              Stineman's consistently well-behaved method of interpolation
svd                    Interfaces to various state-of-the-art SVD and eigensolvers
voronoi                Methods and applications related to Voronoi tessellations
wavelets               ...
Simulation and Differential Equations

bvpSolve       Solvers for boundary value problems of ODEs
ddesolve       Solver for delay differential equations
deSolve        General solvers for initial value problems of ordinary
               differential equations (ODE), partial differential equations
               (PDE), differential algebraic equations (DAE), and delay
               differential equations (DDE)
deTestSet      Test set for differential equations
odesolve       Solvers for ordinary differential equations
PBSddesolve    Solver for delay differential equations
rootSolve      Root finding, equilibrium and steady-state analysis of ODEs
sde            Simulation and inference for stochastic differential equations
Sim.DiffProc   Simulation of diffusion processes
simecol        Simulation of ecological and other dynamic systems

and probably many more in the end. I left out the optimization packages deliberately, but of course there would be a strong hint to that task view.
Re: [Rd] Eigenvalue calculation of sparse matrices
Hola!

This can be done with the CRAN package igraph, which contains (part of) the ARPACK library for computing only some eigenvalues/eigenvectors of sparse matrices. arpack gives you the option of computing a few of the smallest or a few of the largest eigenvalues/vectors. You will need to write a function doing the matrix-vector multiplication yourself, so the arpack method itself is independent of the implementation of your sparse matrix.

Kjetil

On Fri, Mar 9, 2012 at 9:09 AM, Holger Diedrich wrote:
> Dear all,
>
> I am currently working on the calculation of eigenvalues (and -vectors) of
> large matrices. Since these are mostly sparse matrices and I remember some
> specific functionalities in MATLAB for sparse matrices, I started a research
> how to optimize the calculation of eigenvalues of a sparse matrix.
> The function eigen itself works with the LAPACK library which has no special
> handling for sparse matrices, same for the EISPACK library. The ARPACK
> library is capable to work with sparse matrices but I couldn't find any
> useful R package using ARPACK.
> The Matrix package can handle sparse matrices but has no further useful
> functionalities (concerning my tasks).
>
> Does one of you have any advice how to optimize the eigenvalue calculation
> of sparse matrices in R?
>
> Thanks in advance,
> Holger
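[Editorial note: a minimal sketch of driving igraph's arpack() with a user-supplied matrix-vector product, written against the igraph documentation; option names and the exact option-merging behaviour (arpack_defaults) have varied between igraph versions, so treat this as an outline rather than tested code:]

```r
library(igraph)
library(Matrix)   # only used here to build a sparse test matrix

set.seed(1)
A <- rsparsematrix(1000, 1000, density = 0.01, symmetric = TRUE)

# arpack() never touches the matrix itself -- it only calls this
# function, so any sparse (or even implicit) representation works.
matvec <- function(x, extra) as.vector(extra %*% x)

opts <- list(n = nrow(A),   # dimension of the problem
             nev = 3,       # number of eigenvalues wanted
             ncv = 10,      # number of Lanczos basis vectors
             which = "LM")  # "LM" = largest magnitude, "SM" = smallest

res <- arpack(matvec, extra = A, sym = TRUE, options = opts)
res$values   # the 3 eigenvalues of largest magnitude
```

Because only the callback is required, the same pattern works for matrices in any representation, e.g. one kept on disk or defined implicitly.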