Re: [R] x86 SSE* Pointer Favors

Prof Brian Ripley Fri, 13 Jun 2008 00:32:28 -0700

Let me pick up on

Enabling SSE instructions in addition while building R (yes, you have to enable them explicitly, see man gcc) is possible but does not help much since all maths is mostly done in BLAS.

The final part is not true for my 'maths', only for those doing linear algebra. Enabling use of SSE registers can help with CPU scheduling, and so can have a surprisingly large effect. If you only run R on a single CPU type, it is worth tuning the code to that CPU (e.g. -mtune=core2) alongside turning up optimization levels.
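
As a rough sketch of the kind of non-BLAS code this applies to (an illustration only, not code from R itself), here is a two-pass sample covariance in plain C. The function name `covariance` is hypothetical. Built with something like `gcc -O3 -msse2 -mfpmath=sse -mtune=core2`, the scalar arithmetic is done in SSE registers rather than on the x87 stack (on x86-64 that is already the default). Note that gcc will normally not vectorize the floating-point sums unless it is allowed to reorder them (e.g. with -ffast-math), which changes rounding and NaN handling, so treat that as something to check on your own code rather than a recommendation.

```c
#include <assert.h>
#include <stddef.h>

/* Sample covariance of x and y, each of length n (n >= 2).
 * Two passes: first the means, then the centred cross-products.
 * 'restrict' promises the compiler that x and y do not overlap. */
double covariance(const double *restrict x, const double *restrict y, size_t n)
{
    assert(n >= 2);

    double sx = 0.0, sy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
    }
    const double mx = sx / (double)n;
    const double my = sy / (double)n;

    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (x[i] - mx) * (y[i] - my);

    return acc / (double)(n - 1);   /* unbiased (n - 1) denominator */
}
```

Plain double is the right type here: SSE2 operates on ordinary IEEE 754 doubles, two per 128-bit register.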



On Fri, 13 Jun 2008, Ivan Adzhubey wrote:

Hi Ivo,

On Friday 13 June 2008 12:23:06 am ivo welch wrote:

Dear Statisticians--- This is not even an R question, so please
forgive me.  I have so much ignorance in this matter that I do not
know where to begin.  I hope someone can point me to documentation
and/or a sample.


You will surely find some answers to your questions if you look into the
R-admin.html file under the "Building from source" section. Do a search on BLAS
and you will be presented with some options. Searching the R web site for
the same keyword will give you even more food for thought.

I want to compute a covariance as quickly as non-humanly possible on
an Intel core processor (up to SSE4) under linux.  Alas, I have no
idea how to engage CPU vectorization.  Do I need to use special data
types, or is "double" correct?  Does SSE* understand NaN?  Should I
rely on gcc autodetection of the vectorized meaning of my code, or are
there specific libraries that I should call?


I use Goto BLAS library and it works great. Usually runs 3 to 30 times faster
than the stock R BLAS library, depending on your code. Enabling SSE
instructions in addition while building R (yes, you have to enable them
explicitly, see man gcc) is possible but does not help much since all maths
is mostly done in BLAS.

That said, optimized BLAS libraries give the biggest speed increase with older
processors. The newer crop of multi-core CPUs with large shared caches is much
more difficult to hand-tune code for. You may want to subscribe to the Goto BLAS
mailing list for an in-depth discussion. The ATLAS community is also very helpful
(I use their code with our AMD CPUs).

What I want to learn about is as simple as it gets:
  typedef double Double;  // or whatever SSE* needs as close equivalent
  Double vector1[N], vector2[N];
  // then fill them with stuff.


R has no single-precision floating-point type: any number that is not an
integer is stored as a double, and floating-point arithmetic is always done
in double precision.

  vector3= vector_mult(vector1,vector2, N);
  vector4= sum(vector1, N);

I just need a pointer and/or primer.  PS: If someone knows of a
superfast vectorized implementation of Gentleman's WLS algorithm,
please point me to it, too.  I am still using my old non-vectorized C
routines.
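
For what it is worth, the two operations sketched above might look like this with SSE2 intrinsics. This is a minimal sketch under assumptions, not a tuned implementation: the signatures differ from the pseudocode (the caller supplies the output buffer), the name `vector_sum` is hypothetical, unaligned loads are used so no special allocation is needed, and a scalar fallback is compiled in where `__SSE2__` is not defined. The element type stays plain double, since `__m128d` simply packs two of them.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics; __m128d holds two doubles */
#endif

/* out[i] = a[i] * b[i] for i in [0, n) */
void vector_mult(double *out, const double *a, const double *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);      /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_mul_pd(va, vb));
    }
#endif
    for (; i < n; i++)                         /* tail, or full scalar fallback */
        out[i] = a[i] * b[i];
}

/* Sum of a[0..n-1], accumulated in two independent lanes when SSE2 is on. */
double vector_sum(const double *a, size_t n)
{
    size_t i = 0;
    double total = 0.0;
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    total = lanes[0] + lanes[1];               /* combine the two partial sums */
#endif
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

On the NaN question: SSE2 arithmetic follows IEEE 754, so a NaN in the input propagates to the result just as it does in scalar code. The two-lane sum adds the elements in a different order from a simple loop, so the last bits of the result can differ.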


HTH,
Ivan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

