hi,
I can confirm that MKL even with gcc (and on AMD Opterons) is damn fast!
I tried R-benchmark-25 and MASS-ex
but Intel's own link advisor is rubbish, I mean look at this:

-Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/lib/intel64/libmkl_gnu_thread.a -Wl,--end-group -ldl -lpthread -lm

The line above is what the advisor gives you for static MKL + GNU compiler + libgomp;
try it and it does not work.
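
One possible cause, and this is only an assumption since no error message is quoted, is that libmkl_gnu_thread.a depends on the GNU OpenMP runtime, which the line above never links. A minimal sketch that prints the same link line with -lgomp added (the default MKLROOT path and the function name are hypothetical):

```shell
#!/bin/sh
# Sketch only: print a static MKL link line for gcc with GNU OpenMP threading.
# The default MKLROOT below is a hypothetical install path; adjust as needed.
MKLROOT="${MKLROOT:-/opt/intel/mkl}"

mkl_static_gnu_link_line() {
    lib="$MKLROOT/lib/intel64"
    # Same archives as the advisor's line, grouped so the linker can
    # resolve their circular references, plus -lgomp (assumed missing).
    printf '%s ' \
        "-Wl,--start-group" \
        "$lib/libmkl_intel_lp64.a" \
        "$lib/libmkl_core.a" \
        "$lib/libmkl_gnu_thread.a" \
        "-Wl,--end-group" \
        "-lgomp" "-ldl" "-lpthread" "-lm"
    printf '\n'
}

mkl_static_gnu_link_line
```

The printed flags would be appended to the gcc link command. If the failure has a different cause, this change will not help.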

On 05/03/14 22:41, Anspach, Jonathan P wrote:
Simon,

Thanks for the information and links.  First of all, did you ever resolve your 
problem?  If not, did you file an issue in Intel Premier Support?  That's the 
best way to bring it to our attention.  If you don't want to do that I can try 
to get a compiler or MKL support engineer to look at your Intel Developer Zone 
discussion.  I have no experience with OS X, so I wouldn't be much help.

I got the benchmark script, which I've attached, from Texas Advanced Computing 
Center.  Here are my results (elapsed times, in secs):

Test                                                   gcc build (default)  icc/MKL build
Creation, transp., deformation of a 5000x5000 matrix                  3.25           2.95
5000x5000 normal distributed random matrix ^1000                      5.13           1.52
Sorting of 14,000,000 random values                                   1.61           1.64
5600x5600 cross-product matrix (b = a' * a)                          97.44           0.56
Linear regr. over a 4000x4000 matrix (c = a \ b')                    46.06           0.49
FFT over 4,800,000 random values                                      0.65           0.61
Eigenvalues of a 1200x1200 random matrix                              5.55           1.37
Determinant of a 5000x5000 random matrix                             34.18           0.55
Cholesky decomposition of a 6000x6000 matrix                         37.07           0.47
Inverse of a 3200x3200 random matrix                                 29.49           0.57
3,500,000 Fibonacci numbers calculation (vector calc)                 1.31           0.38
Creation of a 6000x6000 Hilbert matrix (matrix calc)                  0.77           0.99
Grand common divisors of 400,000 pairs (recursion)                    0.63           0.56
Creation of a 1000x1000 Toeplitz matrix (loops)                       2.24           2.34
Escoufier's method on a 90x90 matrix (mixed)                          9.55           6.02
Total                                                               274.93          21.01
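
To make the differences easier to read, the speedup for each test is the gcc time divided by the icc/MKL time. A small sketch that computes it for a few rows copied from the results above (the short row labels and the function name are made up for this illustration):

```shell
#!/bin/sh
# Speedup = gcc elapsed time / icc+MKL elapsed time.
# Input rows are label|gcc_secs|mkl_secs, copied from the reported results.
speedups() {
    awk -F'|' '{ printf "%-20s %7.2fx\n", $1, $2 / $3 }' <<'EOF'
Cross-product|97.44|0.56
Linear regression|46.06|0.49
Determinant|34.18|0.55
Cholesky|37.07|0.47
Inverse|29.49|0.57
Sorting|1.61|1.64
Total|274.93|21.01
EOF
}

speedups
```

The rows dominated by BLAS/LAPACK calls show the largest ratios, while sorting, which does not go through BLAS, is essentially unchanged.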

Regards,
Jonathan Anspach
Sr. Software Engineer
Intel Corp.
jonathan.p.ansp...@intel.com
713-751-9460


-----Original Message-----
From: Simon Zehnder [mailto:szehn...@uni-bonn.de]
Sent: Wednesday, March 05, 2014 3:55 AM
To: Anspach, Jonathan P
Cc: r-help@r-project.org
Subject: Re: [R] Building R for better performance

Jonathan,

I tried something like this myself, comparing gcc, clang and the Intel compiler on a Mac. 
In my experience with HPC on the university cluster (where we also use the 
Xeon Phi, Landeshochleistungscluster University RWTH Aachen), the Intel 
compiler optimizes code better with regard to vectorisation, etc. (clang 
still lacks an implemented OpenMP library).

Here is a Revolution Analytics article about this topic: 
http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html

As I usually use the Rcpp package for C++ extensions, this could give me further 
performance. However, I have already failed to compile R with the Intel 
compiler and link it against the MKL (see my topic in the Intel Developer Zone: 
http://software.intel.com/en-us/comment/1767418 and my thread on the R-SIG-Mac 
list: https://stat.ethz.ch/pipermail/r-sig-mac/2013-November/010472.html).

So, to your questions:

1) I think that most admins do not even use the Intel compiler to compile R; 
this seems rare to me. I know some people who do, and I think they 
may be aware of it, but they are only a few. As R is growing in usage, and I 
know from regional user meetings that very large companies are starting to use it in 
their BI units, this should be of interest.

2) I would really welcome this step, because compiling with the Intel compiler 
(especially on a Mac) and linking to the MKL seems to be delicate.

I am interested in the data, so if possible please send it via the list or 
directly to my account. Also, could you show some of the code that you used for the 
computations?


Best

Simon


On 04 Mar 2014, at 22:44, Anspach, Jonathan P <jonathan.p.ansp...@intel.com> 
wrote:

Greetings,

I'm a software engineer with Intel.  Recently I've been investigating R performance on 
Intel Xeon and Xeon Phi processors and RH Linux.  I've also compared the performance of R 
built with the Intel compilers and Intel Math Kernel Library to a "default" 
build (no config options) that uses the GNU compilers.  To my dismay, I've found that the 
GNU build always runs on a single CPU core, even during matrix operations.  The Intel 
build runs matrix operations on multiple cores, so it is much faster on those operations. 
 Running the benchmark-2.5 on a 24 core Xeon system, the Intel build is 13x faster than 
the GNU build (21 seconds vs 275 seconds).  Unfortunately, this advantage is not 
documented anywhere that I can see.

Building with the Intel tools is very easy.  Assuming the tools are installed 
in /opt/intel/composerxe, the process is simply (in bash shell):

$ . /opt/intel/composerxe/bin/compilervars.sh intel64
$ ./configure \
    --with-blas="-L/opt/intel/composerxe/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm" \
    --with-lapack \
    CC=icc CFLAGS=-O2 CXX=icpc CXXFLAGS=-O2 \
    F77=ifort FFLAGS=-O2 FC=ifort FCFLAGS=-O2
$ make
$ make check
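
If the Intel tools are installed somewhere else, the same configure call can be put together from a variable. The following is only a sketch: INTEL_ROOT and the function name are hypothetical, and the command is printed for review rather than executed.

```shell
#!/bin/sh
# Sketch: assemble the configure invocation from a configurable install prefix.
# INTEL_ROOT is a hypothetical variable; its default is the path quoted above.
INTEL_ROOT="${INTEL_ROOT:-/opt/intel/composerxe}"

r_mkl_configure_cmd() {
    # BLAS link flags as given in the message, with the prefix substituted.
    blas="-L$INTEL_ROOT/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm"
    printf './configure --with-blas="%s" --with-lapack' "$blas"
    # Compiler selection and optimization flags, one word per printf pass.
    printf ' %s' CC=icc CFLAGS=-O2 CXX=icpc CXXFLAGS=-O2 \
                 F77=ifort FFLAGS=-O2 FC=ifort FCFLAGS=-O2
    printf '\n'
}

# Dry run: print the command instead of running it.
r_mkl_configure_cmd
```

The printed line is meant to be run from the top of the R source tree, after sourcing compilervars.sh as shown above.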

My questions are:
1) Do most system admins and/or R installers know about this performance 
difference, and use the Intel tools to build R?
2) Can we add information on the advantage of building with the Intel tools, 
and how to do it, to the installation instructions and FAQ?

I can post my data if anyone is interested.

Thanks,
Jonathan Anspach
Sr. Software Engineer
Intel Corp.
jonathan.p.ansp...@intel.com
713-751-9460

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
