On 12/6/24 08:58, Avraham Adler wrote:
Sent from my iPhone

On Dec 5, 2024, at 4:11 PM, Sokol Serguei <serguei.so...@gmail.com> wrote:

Luc,

There can be many reasons explaining the difference in compiled code 
performances. Tuning such code to achieve a pick performance is generally a 
fine art.
Optimizations techniques can include but are not limited to:
  - SIMD instructions (and memory alignment for their optimal use);
  - instruction level parallelism;
  - unrolling loops;
  - cache level (mis-)hits;
  - multi-thread parallelism;
  - ...
Approaches in optimization are not the same depending on kind of application: 
CPU-bound, memory-bound or IO-bound.
Many of this techniques can be directly used (or not) by compiler depending on 
chosen options. Are you sure to use the same options and compiler that were 
used during R compilation?
And finally, the compared code could be plainly not the same. R can use BLAS call, e.g. 
OpenBLAS to multiply two matrices. This latter is heavily optimized for such operations 
and can achieve x10 acceleration compared to plain "naive" BLAS.
The R code you cite can be just the code for a fallback in case no BLAS was 
found during R compilation.
Look at what your sessionInfo() says about used BLAS.
That doesn’t always work. I build R on Windows (10) linking to a pre-compiled 
static OpenBLAS (3.28) and my sessionInfo has an empty string for BLAS. I 
reckon that is because I’m using Rblas.dll, it’s just that my Rblas isn’t 
vanilla.
Right, the BLAS/LAPACK detection in sessionInfo() is only implemented for Unix, tested on Linux and macOS.

Tomas


Avi

Best,
Serguei.

Le 05/12/2024 à 14:21, Luc De Wilde a écrit :
Dear package developers,

in creating a package lavaanC for use in lavaan, I need to perform some matrix 
computations involving matrix products and crossproducts. As far as I see I 
cannot directly call the C code in the R core. So I copied the code in the R 
core, but the same C/C++ code in a package is 2.5 à 3 times slower than 
executed directly in R :

C code in package :
   SEXP prod0(SEXP mat1, SEXP mat2) {
     SEXP u1 = Rf_getAttrib(mat1, R_DimSymbol);
     int m1 = INTEGER(u1)[0];
     int n1 = INTEGER(u1)[1];
     SEXP u2 = Rf_getAttrib(mat2, R_DimSymbol);
     int m2 = INTEGER(u2)[0];
     int n2 = INTEGER(u2)[1];
     if (n1 != m2) Rf_error("matrices not conforming");
     SEXP retval = PROTECT(Rf_allocMatrix(REALSXP, m1, n2));
     double* left = REAL(mat1);
     double* right = REAL(mat2);
     double* ret = REAL(retval);
     double werk = 0.0;
     for (int j = 0; j < n2; j++) {
       for (int i = 0; i < m1; i++) {
           werk = 0.0;
         for (int k = 0; k < n1; k++) werk += (left[i + m1 * k] * right[k + m2 
* j]);
         ret[j * m1 + i] =  werk;
       }
     }
     UNPROTECT(1);
     return retval;
   }

Test script :
m1 <- matrix(rnorm(300000), nrow = 60)
m2 <- matrix(rnorm(300000), ncol = 60)
print(microbenchmark::microbenchmark(
   m1 %*% m2, .Call("prod0", m1, m2), times = 100
))

Result on my pc:
Unit: milliseconds
                    expr     min      lq     mean  median       uq     max neval
               m1 %*% m2 10.5650 10.8967 11.13434 10.9449 11.02965 15.8397   100
  .Call("prod0", m1, m2) 29.3336 30.7868 32.05114 31.0408 33.85935 45.5321   100


Can anyone explain why the compiled code in the package is so much slower than 
in R core?

and

Is there a way to improve the performance in R package?


Best regards,

Luc De Wilde



______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Reply via email to