Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)
Thanks for the help. I have now tried resubmitting with
Sys.setenv("OMP_THREAD_LIMIT" = 2) at the top of the exchange example,
but I still get the same NOTE:

Examples with CPU time > 2.5 times elapsed time
          user system elapsed ratio
exchange 1.196   0.04   0.159 7.774

Not sure what to try next.

Best,
Jouni

From: Ivan Krylov
Sent: Friday, October 20, 2023 16:54
To: Helske, Jouni
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

On Thu, 19 Oct 2023 05:57:54 +, "Helske, Jouni" wrote:

> But I just realised that bssm uses Armadillo via RcppArmadillo, which
> uses OpenMP by default for some elementwise operations. So, I wonder
> if that could be the culprit?

I wasn't able to reproduce the NOTE either, despite manually setting
the environment variable
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD=2 before running R CMD
check, but I think I can see the code using OpenMP. Here's what I did:

0. Temporarily lower the system protections against capturing
performance traces of potentially sensitive parts:

    echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

(Set it back to 3 after you're done.)

1. Run the following command with the development version of the
package installed:

    env OPENBLAS_NUM_THREADS=1 \
     perf record --call-graph dwarf,4096 \
     R -e 'library(bssm); system.time(replicate(100, example(exchange)))'

OPENBLAS_NUM_THREADS=1 will prevent OpenBLAS from spawning worker
threads if you have it installed. (A different BLAS may need different
environment variables.)

2. Run `perf report` and browse collected call stack information.

The call stacks are hard to navigate, but I think they are not pointing
towards Armadillo. At least, setting ARMA_OPENMP_THREADS=1 doesn't
help, but setting OMP_THREAD_LIMIT=1 does.

--
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
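The ratio column in the NOTE quoted above is consistent with (user + system) / elapsed: (1.196 + 0.04) / 0.159 is roughly 7.774. A minimal sketch of that arithmetic follows, for checking timings by hand. The names Timing, cpu_to_elapsed_ratio and exceeds_threshold are made up for illustration and are not part of R or R CMD check.

```cpp
#include <stdexcept>

// Timing triple in seconds, as printed by system.time() or in the NOTE.
// (Hypothetical helper type, not an R or R CMD check API.)
struct Timing {
    double user;
    double system;
    double elapsed;
};

// Ratio of total CPU time (user + system) to wall-clock time.
// A value well above 1 suggests that several threads were busy at once.
double cpu_to_elapsed_ratio(const Timing& t) {
    if (t.elapsed <= 0.0) {
        throw std::invalid_argument("elapsed time must be positive");
    }
    return (t.user + t.system) / t.elapsed;
}

// True when the ratio is above the threshold (2.5 in the NOTE above).
bool exceeds_threshold(const Timing& t, double threshold) {
    return cpu_to_elapsed_ratio(t) > threshold;
}
```

With the figures from the NOTE, Timing{1.196, 0.04, 0.159} is above the 2.5 threshold by a wide margin, which is what one would expect if many OpenMP threads ran during a short example.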
Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)
In my case recently, after an hour or so's messing about, I disabled some
tests and example executions to get rid of the offending times. I doubt
that I am the only one to do that.

On Tue, 24 Oct 2023 at 9:38 pm, Helske, Jouni wrote:

> Thanks for the help, I now tried resubmitting with
> Sys.setenv("OMP_THREAD_LIMIT" = 2) at the top of the exchange example, but
> I still get the same note:
>
> Examples with CPU time > 2.5 times elapsed time
>          user system elapsed ratio
> exchange 1.196   0.04   0.159 7.774
>
> Not sure what to try next.
>
> Best,
> Jouni
Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)
On Tue, 24 Oct 2023 10:37:48 +, "Helske, Jouni" wrote:

> Examples with CPU time > 2.5 times elapsed time
>          user system elapsed ratio
> exchange 1.196   0.04   0.159 7.774

I've downloaded the archived copy of the package from the CRAN FTP
server, installed it and tried:

    library(bssm)
    Sys.setenv("OMP_THREAD_LIMIT" = 2)
    data("exchange")
    model <- svm(
      exchange, rho = uniform(0.97,-0.999,0.999),
      sd_ar = halfnormal(0.175, 2), mu = normal(-0.87, 0, 2)
    )
    system.time(particle_smoother(model, particles = 500))
    #   user  system elapsed
    #  0.515   0.000   0.073

I set a breakpoint on clone() [*] and got quite a few calls creating
OpenMP threads with the following call stack:

    #0 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:52
    <...>
    #4 0x77314e0a in GOMP_parallel () from
     /usr/lib/x86_64-linux-gnu/libgomp.so.1
    <-- RcppArmadillo code below
    #5 0x738f5f00 in
     arma::eglue_core::apply, arma::eOp, arma::eop_exp>,
     arma::eop_scalar_times>, arma::eOp,
     arma::eop_scalar_div_post>, arma::eop_square> > (outP=..., x=...) at
     .../library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:69
    #6 0x73a31246 in
     arma::Mat::operator=, arma::eop_exp>, arma::eop_scalar_times>,
     arma::eOp, arma::eop_scalar_div_post>,
     arma::eop_square>, arma::eglue_div> (X=..., this=0x7fff36f0) at
     .../library/RcppArmadillo/include/armadillo_bits/Proxy.hpp:226
    #7
     arma::Col::operator=, arma::eop_exp>, arma::eop_scalar_times>,
     arma::eOp, arma::eop_scalar_div_post>,
     arma::eop_square>, arma::eglue_div> > ( X=..., this=0x7fff36f0) at
     .../library/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:535
    <-- bssm code below
    #8 ssm_ung::laplace_iter (this=0x7fff15e0, signal=...) at
     model_ssm_ung.cpp:310
    #9 0x73a36e9e in ssm_ung::approximate (this=0x7fff15e0) at
     .../library/RcppArmadillo/include/armadillo_bits/arrayops_meat.hpp:27
    #10 0x73a3b3d3 in ssm_ung::psi_filter
     (this=this@entry=0x7fff15e0, nsim=nsim@entry=500, alpha=...,
     weights=..., indices=...) at model_ssm_ung.cpp:517
    #11 0x73948cd7 in psi_smoother (model_=..., nsim=nsim@entry=500,
     seed=seed@entry=1092825895, model_type=model_type@entry=3) at
     R_psi.cpp:131

What does arma::eglue_core do?

    (gdb) list
    /* reformatted a bit */
    library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:64
    int n_threads = (std::min)(
     int(arma_config::mp_threads),
     int((std::max)(int(1), int(omp_get_max_threads())))
    );
    (gdb) p arma_config::mp_threads
    $3 = 8
    (gdb) p (int)omp_get_max_threads()
    $4 = 16
    (gdb) p (char*)getenv("OMP_THREAD_LIMIT")
    $7 = 0x56576b91 "2"
    (gdb) p /x (int)omp_get_thread_limit()
    $9 = 0x7fff

Sorry for misinforming you about the OMP_THREAD_LIMIT environment
variable: the OpenMP specification requires the program to ignore
modifications to the environment variables after the program has
started [**], so it only works if R is started with OMP_THREAD_LIMIT
set. Additionally, the OpenMP thread limit is not supposed to be
adjusted at runtime at all [***].

Unfortunately for our situation, Armadillo is very insistent on setting
its own number of threads from arma_config::mp_threads (which is
constexpr 8 unless you set preprocessor directives while compiling it)
and omp_get_max_threads (which is the upper bound on the number of
threads that cannot be adjusted at runtime).

What I'm about to suggest is a terrible hack, but since Armadillo seems
to lack the option to set the number of threads at runtime, there might
be no other option.

Before you #include an Armadillo header, every time:

1. #include <omp.h> so that the OpenMP functions are declared and the
#include guard is set

2. Define a static inline function get_number_of_threads returning the
desired number of threads as an int (e.g. referencing an extern int
number_of_threads stored elsewhere)

3. #define omp_get_max_threads get_number_of_threads

Now if you provide an API for the R code to get and set this number, it
should be possible to control the number of threads used by OpenMP code
in Armadillo. Basically, a data.table::setDTthreads() for the copy of
Armadillo inlined inside your package.

If you then compile your package with a large #define
ARMA_OPENMP_THREADS, it will both be able to use more than 8 threads
*and* limit itself when needed.

An alternative course of action is compiling your package with #define
ARMA_OPENMP_THREADS 2 and giving up on more OpenMP threads inside calls
to Armadillo.

--
Best regards,
Ivan

[*]
https://github.com/tidymodels/textrecipes/pull/251#issuecomment-1775549814

[**]
https://www.openmp.org/spec-html/5.2/openmpch21.html#x432-5921

[***]
https://www.openmp.org/wp-content/uploads/OpenMPRefCard-5-2-web.pdf#page=15
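The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions and has not been tested against RcppArmadillo: the namespace pkg, the setter set_number_of_threads and the helper effective_threads are made-up names, and the shared setting is kept in a function-local static instead of the extern int mentioned in step 2. The Armadillo include is left as a comment so the fragment compiles on its own, and effective_threads only repeats the expression from the gdb listing to show what Armadillo would then compute.

```cpp
#include <algorithm>

// Step 1: pull in the real OpenMP declarations first, so that the include
// guard of omp.h is set before the macro below is defined.
#if defined(_OPENMP)
#include <omp.h>
#endif

namespace pkg {  // hypothetical package namespace

// Shared setting; a function-local static inside an inline function is
// the same object in every translation unit. Default of 2 is an assumption.
inline int& number_of_threads() {
    static int n = 2;
    return n;
}

// Setter that an R-facing wrapper could call (hypothetical API).
inline void set_number_of_threads(int n) {
    number_of_threads() = std::max(1, n);
}

// Step 2: the function that stands in for omp_get_max_threads().
inline int get_number_of_threads() {
    return number_of_threads();
}

}  // namespace pkg

// Step 3: every later use of the name in this translation unit now calls
// the package's own getter, including uses inside Armadillo headers.
#define omp_get_max_threads ::pkg::get_number_of_threads

// #include <RcppArmadillo.h>   // would follow here in a real package

// Same expression as the mp_misc.hpp line in the gdb listing above, with
// arma_config::mp_threads passed in as a plain argument for illustration.
inline int effective_threads(int mp_threads) {
    return (std::min)(mp_threads,
                      int((std::max)(int(1), int(omp_get_max_threads()))));
}
```

With the default setting of 2 and mp_threads of 8 the expression yields 2. After pkg::set_number_of_threads(16) it yields 8, which is why a larger ARMA_OPENMP_THREADS would be needed to go above Armadillo's compile-time cap. Note that the macro also redirects the package's own calls to omp_get_max_threads made after this point.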
Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)
Chapter 15 in Wickham and Bryan, R Packages, discusses "Advanced Testing
Techniques". Their current section "15.4.1 Skip a test" includes the
following:

    test_that("some long-running thing works", {
      skip_on_cran()
      # test code that can potentially take "a while" to run
    })

Have you tried writing directly to Jennifer Bryan? She and Hadley might
be able to get help from the CRAN maintainers with this particular
problem AND get more documentation on this into their book ;-)

Hope this helps.
Spencer Graves

On 10/24/23 6:03 AM, Greg Hunt wrote:

> In my case recently, after an hour or so's messing about I disabled some
> tests and example executions to get rid of the offending times. I doubt
> that i am the only one to do that.
Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)
On 24 October 2023 at 15:55, Ivan Krylov wrote:
| Now if you provide an API for the R code to get and set this number, it
| should be possible to control the number of threads used by OpenMP code
| in Armadillo. Basically, a data.table::setDTthreads() for the copy of
| Armadillo inlined inside your package.
|
| If you then compile your package with a large #define
| ARMA_OPENMP_THREADS, it will both be able to use more than 8 threads
| *and* limit itself when needed.
|
| An alternative course of action is compiling your package with #define
| ARMA_OPENMP_THREADS 2 and giving up on more OpenMP threads inside calls
| to Armadillo.

We should work on adding such a run-time setter of the number of cores to
RcppArmadillo so that examples can dial down to 2 cores. I have been doing
just that in package tiledb (via a setting internal to the TileDB Core
library) for 'ages' now and RcppArmadillo could and should offer the s
Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)
You are not the only one; I did the same with some of my examples.

Would it be an option to ask for a default R option, 'max.ncores', that
specifies the maximum number of cores a process is allowed to use? CRAN
could then require that examples, tests and vignettes respect this
option. That way there would be one uniform option to specify the maximum
number of cores processes could use. That would also make it easier for
system administrators to set default values for this (use the entire
system, or use one core by default on a shared system).

Of course, we package maintainers could do this without involvement of
R-core or CRAN. We only need to agree on a name and a default value for
when the option is missing (0 = use all cores; 1 or 2; or ncores-1 ...).

Jan

On 24-10-2023 13:03, Greg Hunt wrote:

> In my case recently, after an hour or so's messing about I disabled some
> tests and example executions to get rid of the offending times. I doubt
> that i am the only one to do that.
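To make the proposal above concrete, here is a small sketch of how compiled code might resolve such a setting into a thread count. Everything in it is an assumption for illustration: 'max.ncores' does not exist as an R option, resolve_ncores and detected_cores are made-up names, a negative value stands for "option not set", 0 is read as "use all cores", and the fallback of 2 is just one of the defaults floated in the message.

```cpp
#include <algorithm>
#include <thread>

// Number of hardware threads, never less than 1.
// std::thread::hardware_concurrency() may return 0 when it cannot tell.
inline int detected_cores() {
    return std::max(1, static_cast<int>(std::thread::hardware_concurrency()));
}

// Resolve a hypothetical 'max.ncores' value into a usable thread count.
//   option    : the value of the setting; negative means "not set"
//   available : cores detected on the machine
//   fallback  : agreed default used when the setting is missing (assumed 2)
inline int resolve_ncores(int option, int available, int fallback = 2) {
    const int avail = std::max(1, available);
    const int requested = (option < 0) ? fallback : option;
    if (requested == 0) {
        return avail;                    // 0 = use all cores
    }
    return std::min(requested, avail);   // never exceed what the machine has
}
```

A package could call resolve_ncores(value_from_R, detected_cores()) once and pass the result to whatever threading backend it uses. Which default is appropriate is exactly the point that would need agreeing on.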