Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

2023-10-24 Thread Helske, Jouni
Thanks for the help. I have now tried resubmitting with 
Sys.setenv("OMP_THREAD_LIMIT" = 2) at the top of the exchange example, but I 
still get the same NOTE:

Examples with CPU time > 2.5 times elapsed time
          user system elapsed ratio
exchange 1.196   0.04   0.159 7.774

Not sure what to try next.

Best,
Jouni

From: Ivan Krylov 
Sent: Friday, October 20, 2023 16:54
To: Helske, Jouni 
Cc: r-package-devel@r-project.org 
Subject: Re: [R-pkg-devel] Too many cores used in examples (not caused by 
data.table)

On Thu, 19 Oct 2023 05:57:54 +
"Helske, Jouni" wrote:

> But I just realised that bssm uses Armadillo via RcppArmadillo, which
> uses OpenMP by default for some elementwise operations. So, I wonder
> if that could be the culprit?

I wasn't able to reproduce the NOTE either, despite manually setting
the environment variable
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2 before running R CMD
check, but I think I can see the code using OpenMP. Here's what I did:

0. Temporarily lower the system protections against capturing
performance traces of potentially sensitive parts:

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

(Set it back to 3 after you're done.)

1. Run the following command with the development version of the
package installed:

env OPENBLAS_NUM_THREADS=1 \
 perf record --call-graph dwarf,4096 \
 R -e 'library(bssm); system.time(replicate(100, example(exchange)))'

OPENBLAS_NUM_THREADS=1 will prevent OpenBLAS from spawning worker
threads if you have it installed. (A different BLAS may need different
environment variables.)

2. Run `perf report` and browse collected call stack information.

The call stacks are hard to navigate, but I think they are not pointing
towards Armadillo. At least, setting ARMA_OPENMP_THREADS=1 doesn't
help, but setting OMP_THREAD_LIMIT=1 does.

--
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

2023-10-24 Thread Greg Hunt
In my case recently, after an hour or so's messing about, I disabled some
tests and example executions to get rid of the offending times. I doubt
that I am the only one to do that.



Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

2023-10-24 Thread Ivan Krylov
On Tue, 24 Oct 2023 10:37:48 +
"Helske, Jouni" wrote:

> Examples with CPU time > 2.5 times elapsed time
>           user system elapsed ratio
> exchange 1.196   0.04   0.159 7.774
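
As a sanity check on the numbers quoted above: the ratio column appears
to be (user + system) / elapsed, since (1.196 + 0.04) / 0.159 is about
7.774. The formula is inferred from the figures, not taken from the R
sources, and the helper name below is hypothetical. A minimal sketch of
the arithmetic:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: CPU-to-elapsed ratio as it appears to be reported
// in the NOTE, i.e. (user + system) / elapsed. The formula is an
// assumption inferred from the quoted timings.
static inline double cpu_to_elapsed(double user, double sys, double elapsed) {
  return (user + sys) / elapsed;
}
```

Calling cpu_to_elapsed(1.196, 0.04, 0.159) gives a value well above the
2.5 threshold named in the NOTE, which is what one would expect if
roughly eight threads were busy during the example.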

I've downloaded the archived copy of the package from the CRAN FTP
server, installed it and tried:

library(bssm)
Sys.setenv("OMP_THREAD_LIMIT" = 2)
data("exchange")
model <- svm(
 exchange, rho = uniform(0.97,-0.999,0.999),
 sd_ar = halfnormal(0.175, 2), mu = normal(-0.87, 0, 2)
)
system.time(particle_smoother(model, particles = 500))
#    user  system elapsed
#   0.515   0.000   0.073

I set a breakpoint on clone() [*] and got quite a few calls creating
OpenMP threads with the following call stack:

#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:52
<...>
#4  0x77314e0a in GOMP_parallel () from
/usr/lib/x86_64-linux-gnu/libgomp.so.1
 <-- RcppArmadillo code below
#5 0x738f5f00 in
arma::eglue_core::apply,
arma::eOp, arma::eop_exp>,
arma::eop_scalar_times>, arma::eOp,
arma::eop_scalar_div_post>, arma::eop_square> > (outP=..., x=...) at
.../library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:69
#6 0x73a31246 in
arma::Mat::operator=,
arma::eop_exp>, arma::eop_scalar_times>,
arma::eOp, arma::eop_scalar_div_post>,
arma::eop_square>, arma::eglue_div> (X=..., this=0x7fff36f0) at
.../library/RcppArmadillo/include/armadillo_bits/Proxy.hpp:226
#7
arma::Col::operator=,
arma::eop_exp>, arma::eop_scalar_times>,
arma::eOp, arma::eop_scalar_div_post>,
arma::eop_square>, arma::eglue_div> > ( X=..., this=0x7fff36f0) at
.../library/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:535
 <-- bssm code below
#8  ssm_ung::laplace_iter (this=0x7fff15e0, signal=...) at
model_ssm_ung.cpp:310
#9  0x73a36e9e in ssm_ung::approximate (this=0x7fff15e0) at
.../library/RcppArmadillo/include/armadillo_bits/arrayops_meat.hpp:27
#10 0x73a3b3d3 in ssm_ung::psi_filter
(this=this@entry=0x7fff15e0, nsim=nsim@entry=500, alpha=...,
weights=..., indices=...) at model_ssm_ung.cpp:517
#11 0x73948cd7 in psi_smoother (model_=..., nsim=nsim@entry=500,
seed=seed@entry=1092825895, model_type=model_type@entry=3) at
R_psi.cpp:131

What does arma::eglue_core do?

(gdb) list
/* reformatted a bit */
library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:64
 int n_threads = (std::min)(
  int(arma_config::mp_threads),
  int((std::max)(int(1), int(omp_get_max_threads())))
 );
(gdb) p arma_config::mp_threads
$3 = 8
(gdb) p (int)omp_get_max_threads()
$4 = 16
(gdb) p (char*)getenv("OMP_THREAD_LIMIT")
$7 = 0x56576b91 "2"
(gdb) p /x (int)omp_get_thread_limit()
$9 = 0x7fff

Sorry for misinforming you about the OMP_THREAD_LIMIT environment
variable: the OpenMP specification requires the program to ignore
modifications to the environment variables after the program has
started [**], so it only works if R is started with OMP_THREAD_LIMIT
set. Additionally, the OpenMP thread limit is not supposed to be
adjusted at runtime at all [***].

Unfortunately for our situation, Armadillo is very insistent in setting
its own number of threads from arma_config::mp_threads (which is
constexpr 8 unless you set preprocessor directives while compiling it)
and omp_get_max_threads (which is the upper bound on the number of
threads that cannot be adjusted at runtime).
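
To make the effect of that expression concrete, here is a small sketch
of the same clamp with both inputs passed as plain parameters. The
function name is hypothetical; mp_threads stands in for
arma_config::mp_threads and omp_max for the value returned by
omp_get_max_threads().

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the expression in mp_misc.hpp:
// min(mp_threads, max(1, omp_max)).
static inline int armadillo_thread_count(int mp_threads, int omp_max) {
  return (std::min)(mp_threads, (std::max)(1, omp_max));
}
```

With the values from the gdb session (8 and 16) this yields 8 threads.
It would only drop to 2 if omp_get_max_threads() itself returned 2.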

What I'm about to suggest is a terrible hack, but since Armadillo seems
to lack the option to set the number of threads at runtime, there might
be no other option.

Before you #include an Armadillo header, every time:

1. #include <omp.h> so that the OpenMP functions are declared and the
#include guard is set

2. Define a static inline function get_number_of_threads returning the
desired number of threads as an int (e.g. referencing an extern int
number_of_threads stored elsewhere)

3. #define omp_get_max_threads get_number_of_threads

Now if you provide an API for the R code to get and set this number, it
should be possible to control the number of threads used by OpenMP code
in Armadillo. Basically, a data.table::setDTthreads() for the copy of
Armadillo inlined inside your package.
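
A rough sketch of the three steps above, under some loudly stated
assumptions: the names number_of_threads, get_number_of_threads and
set_number_of_threads are placeholders, the variable is file-local here
rather than extern, and threads_seen_by_armadillo() only stands in for
the Armadillo headers so that the sketch builds without them. The
fallback value 16 mirrors the gdb session and is used only when OpenMP
is not enabled.

```cpp
#include <algorithm>
#include <cassert>

// Step 1: declare the OpenMP functions first, so that the include guard
// of <omp.h> is already set when the macro below is defined.
#ifdef _OPENMP
#include <omp.h>
#else
// Fallback so the sketch builds without OpenMP (assumed value: 16).
static inline int omp_get_max_threads() { return 16; }
#endif

// Step 2: hypothetical package-level setting. In a real package this
// would be an `extern int` defined once in a single .cpp file.
static int number_of_threads = 2;

static inline int get_number_of_threads() { return number_of_threads; }

// Hypothetical setter that an exported R-level function could call.
static inline void set_number_of_threads(int n) {
  number_of_threads = std::max(1, n);
}

// Step 3: from here on, calls to omp_get_max_threads() are redirected.
#define omp_get_max_threads get_number_of_threads

// The Armadillo header would be included at this point in a real package.

// Stand-in for code inside Armadillo that asks OpenMP for its maximum.
static inline int threads_seen_by_armadillo() {
  return omp_get_max_threads();  // expands to get_number_of_threads()
}
```

Note that the macro also redirects any later use of
omp_get_max_threads() in the same translation unit, so code that needs
the real value has to read it before the #define.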

If you then compile your package with a large #define
ARMA_OPENMP_THREADS, it will both be able to use more than 8 threads
*and* limit itself when needed.

An alternative course of action is compiling your package with #define
ARMA_OPENMP_THREADS 2 and giving up on more OpenMP threads inside calls
to Armadillo.

-- 
Best regards,
Ivan

[*]
https://github.com/tidymodels/textrecipes/pull/251#issuecomment-1775549814

[**]
https://www.openmp.org/spec-html/5.2/openmpch21.html#x432-5921

[***]
https://www.openmp.org/wp-content/uploads/OpenMPRefCard-5-2-web.pdf#page=15



Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

2023-10-24 Thread Spencer Graves
Chapter 15 in Wickham and Bryan, R Packages, discusses "Advanced 
Testing Techniques". Their current section "15.4.1 Skip a test" includes 
the following:



test_that("some long-running thing works", {
  skip_on_cran()
  # test code that can potentially take "a while" to run
})


Have you tried writing directly to Jennifer Bryan? She and Hadley might 
be able to get help from the CRAN maintainers with this particular 
problem AND get more documentation on it into their book ;-)



  hope this helps.
  spencer graves




Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

2023-10-24 Thread Dirk Eddelbuettel


On 24 October 2023 at 15:55, Ivan Krylov wrote:
| What I'm about to suggest is a terrible hack, but since Armadillo seems
| to lack the option to set the number of threads at runtime, there might
| be no other option.
| 
| Before you #include an Armadillo header, every time:
| 
| 1. #include <omp.h> so that the OpenMP functions are declared and the
| #include guard is set
| 
| 2. Define a static inline function get_number_of_threads returning the
| desired number of threads as an int (e.g. referencing an extern int
| number_of_threads stored elsewhere)
| 
| 3. #define omp_get_max_threads get_number_of_threads
| 
| Now if you provide an API for the R code to get and set this number, it
| should be possible to control the number of threads used by OpenMP code
| in Armadillo. Basically, a data.table::setDTthreads() for the copy of
| Armadillo inlined inside your package.
| 
| If you then compile your package with a large #define
| ARMA_OPENMP_THREADS, it will both be able to use more than 8 threads
| *and* limit itself when needed.
| 
| An alternative course of action is compiling your package with #define
| ARMA_OPENMP_THREADS 2 and giving up on more OpenMP threads inside calls
| to Armadillo.

We should work on adding such a run-time setter of the number of cores to
RcppArmadillo so that examples can dial down to 2 cores.  I have been doing
just that in package tiledb (via a setting internal to the TileDB Core
library) for 'ages' now and RcppArmadillo could and should offer the s

Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

2023-10-24 Thread Jan van der Laan

You are not the only one; I did the same with some of my examples.

Would it be an option to ask for a default R option, 'max.ncores', that 
specifies the maximum number of cores a process is allowed to use? CRAN 
could then require that examples, tests and vignettes respect this 
option. That way there would be one uniform option to specify the 
maximum number of cores processes could use. That would also make it 
easier for system administrators to set default values for this (use the 
entire system; or use one core by default on a shared system).


Of course, we package maintainers could do this without involvement of 
R-core or CRAN. We only need to agree on a name and a default value for 
when the option is missing (0 = use all cores; 1 or 2; or ncores-1 ...).
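
As an illustration of what respecting such an option could look like on 
the compiled side of a package, here is a minimal sketch. It picks one 
of the conventions listed above (0 = use all cores) purely as an 
assumption; the function name and both parameters are hypothetical, and 
nothing like this exists in R or CRAN policy today.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: `option` is the value of the proposed 'max.ncores'
// setting and `available` is the number of cores on the machine.
// Assumed convention: 0 (or a negative value) means "use all cores".
static inline int resolve_ncores(int option, int available) {
  available = std::max(1, available);
  if (option <= 0) return available;
  return std::min(option, available);
}
```

With this convention, a check machine that sets the option to 2 would 
get at most 2 threads, while a value larger than the machine's core 
count is capped at what is actually available.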


Jan

