[R-pkg-devel] multithreading in packages
I am considering adding multithreading support in my package, and would appreciate any suggestions/comments/opinions on what the right way to do this is.

* My understanding from reading documentation and source code is that there is no dedicated support in R yet, but there are packages that use multithreading. Are there any plans for multithreading support in future R versions ?

* pthread or openmp ? I am particularly concerned about interaction with other packages. I have seen that using pthread and openmp libraries simultaneously can result in incorrectly pinned threads.

* Control of the maximum number of threads. One can default to the openmp environment variable, but these might vary between openmp implementations.

thank you very much

Vladimir Dergachev

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] [External] Formula modeling
On Fri, 8 Oct 2021, pikappa.de...@gmail.com wrote:

Hi, the different environments can potentially be an issue in the future. I was not aware of the vector construction notation, and I think this is what I was mainly looking for. I could provide two initialization methods. One will use the ugly vector notation, which one could use to bind the whole model to a particular environment. The second can be more user-friendly and use the comma-separated list of formulas. Essentially, the second will prepare the vector formula and call the first initialization method. The (|) operator comment makes sense, and I would also want to avoid this to the extent that it is feasible. So, I am currently thinking of something along the lines of:

   c(d, s, p | subject | time) ~ c(p + x + y, p + w + y, z + y)

From the perspective of a person who does not use formulas outside of xyplot() and glm(), this is a bit hard to parse visually. One could imagine mistakenly concluding that s corresponds to x, rather than to p + w + y. I wonder if there is a way to write something along the lines of

   ~ c(d ~ p + x + y, s ~ p + w + y, p ~ z + y | subject | time)

A quick experiment with R shows that this is treated like a formula, so ~c becomes a way to group formulas.

best

Vladimir Dergachev

This is very similar to how the function ?lme4::lmer uses the bar to separate expressions for design matrices from grouping factors. Actually, the subject and time variables are needed for subsetting prices for various operations required for the model matrix. Thanks for the suggestions; they are very helpful! Best, Pantelis

-Original Message- From: Duncan Murdoch Sent: Friday, October 8, 2021 2:04 AM To: Richard M.
Heiberger ; pikappa.de...@gmail.com Cc: r-package-devel@r-project.org Subject: Re: [R-pkg-devel] [External] Formula modeling

On 07/10/2021 5:58 p.m., Duncan Murdoch wrote:

I don't work with models like this, but I would find it more natural to express the multiple formulas in a list:

   list(d ~ p + x + y, s ~ p + w + y, p ~ z + y)

I'd really have no idea how either of the proposals below should be parsed. There's a disadvantage to this proposal. I'd assume that "p" means the same in all 3 formulas, but with the notation I give, it could refer to 3 unrelated variables, because each of the formulas would have its own environment, and they could all be different. I guess you could make it a requirement that they all use the same environment, but that's likely going to be confusing to users, who won't know what it means. Another possibility that wouldn't have this problem (but in my opinion is kind of ugly) is to use R vector construction notation:

   c(d, s, p) ~ c(p + x + y, p + w + y, z + y)

Duncan Murdoch

Of course, if people working with models like this are used to working with notation like yours, that would be a strong argument to use your notation. Duncan Murdoch

On 07/10/2021 5:51 p.m., Richard M. Heiberger wrote:

I am responding to a subset of what you asked. There are packages which use multiple formulas in their argument sequence. What you have as a single formula with | as a separator

   q | p | subject | time | rho ~ p + x + y | p + w + y | z + y

I think would be better as a comma-separated list of formulas

   q , p , subject , time , rho ~ p + x + y , p + w + y , z + y

because in R notation | is usually an operator, not a separator. lattice uses formulas, and the | is used as a conditioning operator. nlme and lme4 can have multiple formulas in the same calling sequence. lme4 is newer. From its ?lme4-package: 'lme4' covers approximately the same ground as the earlier 'nlme' package. lme4 is probably the model you are looking for for the package design.
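Duncan's environment point can be seen directly at the console; a minimal sketch, where `make_formula` is a hypothetical helper used only to create a formula in a different scope:

```r
# Each formula captures the environment it was created in, so formulas
# built in different scopes carry different environments, even when they
# sit side by side in the same list.
make_formula <- function() d ~ p + x + y   # hypothetical helper: its own scope

fs <- list(make_formula(), s ~ p + w + y, p ~ z + y)

# The first formula's environment is the helper's execution frame;
# the other two share the caller's environment:
identical(environment(fs[[1]]), environment(fs[[2]]))  # FALSE
identical(environment(fs[[2]]), environment(fs[[3]]))  # TRUE
```

This is why a constructor accepting a list of formulas would have to either pick one environment or document which one wins.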
On Oct 07, 2021, at 17:20, pikappa.de...@gmail.com wrote:

Dear R-package-devel subscribers,

My question concerns a package design issue relating to the usage of formulas. I am interested in describing via formulas systems of the form:

   d = p + x + y
   s = p + w + y
   p = z + y
   q = min(d, s).

The context in which I am working is that of market models with, primarily, panel data. In the above system, one may think of the first equation as demand, the second as supply, and the third as an equation (co-)determining prices. The fourth equation is implicitly used by the estimation method, and it does not need to be specified when programming the R formula. If you need more information about the system, you may check the package diseq.

Currently, I am using constructors to build market model objects. In a constructor call, I pass [i] the right-hand sides of the first three equations as strings, [ii] an argument indicating whether the equations of the system have correlated shocks, [iii] the identifiers of the used dataset (one for the subjects of the panel and one for time), and [iv] the quantity (q) and price (p) variables. These four arguments contain all the necessary information
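The ~c(...) experiment mentioned earlier in the thread can be reproduced in a few lines; a minimal sketch:

```r
# A one-sided formula whose right-hand side is a c(...) call grouping
# three sub-formulas; R parses this without complaint.
f <- ~ c(d ~ p + x + y, s ~ p + w + y, p ~ z + y | subject | time)

inherits(f, "formula")          # TRUE

# f[[2]] is the c(...) call; dropping the `c` leaves the three parts:
parts <- as.list(f[[2]])[-1]
length(parts)                   # 3
deparse(parts[[1]])             # "d ~ p + x + y"
```

So ~c does indeed group formulas syntactically; what it does not do is give each sub-formula its own terms() machinery - the package would have to walk `parts` itself.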
Re: [R-pkg-devel] multithreading in packages
On Sat, 9 Oct 2021, Ivan Krylov wrote:

On Thu, 7 Oct 2021 21:58:08 -0400 (EDT), Vladimir Dergachev wrote:

* My understanding from reading documentation and source code is that there is no dedicated support in R yet, but there are packages that use multithreading. Are there any plans for multithreading support in future R versions ?

Shared memory multithreading is hard to get right in a memory-safe language (e.g. R), but there's the parallel package, which is a part of base R and offers process-based parallelism, and may run your code on multiple machines at the same time. There's no communication _between_ these machines, though. (But I think there's an MPI package on CRAN.)

Well, the way I planned to use multithreading is to speed up processing of very large vectors, so one does not have to wait seconds for the command to return. The same could be done for many built-in R primitives.

* pthread or openmp ? I am particularly concerned about interaction with other packages. I have seen that using pthread and openmp libraries simultaneously can result in incorrectly pinned threads.

pthreads-based code could be harder to run on Windows (which is a first-class platform for R, expected to be supported by most packages). Gábor Csárdi pointed out that R is compiled with mingw on Windows and has pthread support - something I did not know either. OpenMP should be cross-platform, but Apple compilers have sometimes lacked it; that has likely been solved since I last heard about it. If your problem can be made embarrassingly parallel, you're welcome to use the parallel package.

I used parallel before, it is very nice, but R-level only. I am looking for something to speed up the response of individual package functions so they themselves can be used as part of more complicated code.

* control of maximum number of threads. One can default to the openmp environment variable, but these might vary between openmp implementations.
Moreover, CRAN-facing tests aren't allowed to consume more than 200% CPU, so it's a good idea to leave the number of workers in control of the user. According to a reference guide I got from openmp.org, OpenMP implementations are expected to understand omp_set_num_threads() and the OMP_NUM_THREADS environment variable.

Oh, this would never be run through CRAN tests, it is meant for data that is too big for CRAN. I seem to remember that the Intel compiler used a different environment variable, but it could be that this was fixed since the last time I used it.

best

Vladimir Dergachev

-- Best regards, Ivan
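The "default to the OpenMP environment variable" convention can be honored from the R side before any threaded C code runs; a hedged sketch (the variable name `n_threads` is just for illustration):

```r
# Use OMP_NUM_THREADS when the user set it; otherwise fall back to the
# number of detected cores, and never go below one thread.
n_threads <- suppressWarnings(as.integer(Sys.getenv("OMP_NUM_THREADS", unset = "")))
if (is.na(n_threads)) n_threads <- parallel::detectCores()
if (is.na(n_threads)) n_threads <- 1L   # detectCores() can return NA
n_threads <- max(1L, n_threads)
```

A package's C code could then be handed n_threads via .Call() and pass it on to omp_set_num_threads(), so both OpenMP and pthread code paths obey the same user setting.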
Re: [R-pkg-devel] multithreading in packages
On Sat, 9 Oct 2021, Gábor Csárdi wrote:

On Sat, Oct 9, 2021 at 8:52 AM Ivan Krylov wrote: [...]

* pthread or openmp ? I am particularly concerned about interaction with other packages. I have seen that using pthread and openmp libraries simultaneously can result in incorrectly pinned threads.

pthreads-based code could be harder to run on Windows (which is a first-class platform for R, expected to be supported by most packages).

R uses mingw on Windows, and mingw supports pthreads, so you don't need to do anything special on Windows. You don't even need a `Makevars`/`Makevars.win` or configure* file just for using pthreads.

Great, thank you !

Some CRAN packages do this, you can search here: https://github.com/search?l=C&p=5&q=org%3Acran+pthread_create&type=Code (Some of these are from Unix-specific code, but not all.)

Useful link ! I also did a search for cran+omp and this turned up some packages as well. Looks like both openmp and pthreads are used in packages that passed CRAN checks.

thanks

Vladimir Dergachev

Gabor [...]
Re: [R-pkg-devel] [Tagged] Re: multithreading in packages
On Sat, 9 Oct 2021, Jeff Newmiller wrote:

Keep in mind that by embedding this decision into your package you may be consuming a resource (cores) that may be more efficiently allocated by an application-level partitioning of available resources. I for one am not a fan of this kind of thinking, and it makes system requirements for your package more complex even if you allow me to disable it.

That's right, and this is why I was asking about any present or future plans for R support - if there was a way to find out how many threads R should use, I would use that.

So far, it looks like the most portable way is to use OpenMP and let the user set an appropriate environment variable if they want to restrict thread usage. I could use the same OpenMP variable for pthreads as well. This is pretty common on clusters anyway, with openmp environment variables set automatically to the number of cores the user requested.

I would probably also add a function to the package to report the number of threads being used. Not sure whether it would be a good idea to report this during package loading (and not sure what the right way to display a message during package load is either).

best

Vladimir Dergachev
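On the load-time message question: the conventional way is packageStartupMessage() inside .onAttach(), which users can silence with suppressPackageStartupMessages(). A sketch, where the reported value simply echoes OMP_NUM_THREADS:

```r
# Sketch of a package load hook reporting the thread setting; using
# packageStartupMessage() rather than message() lets users suppress it.
.onAttach <- function(libname, pkgname) {
  n <- Sys.getenv("OMP_NUM_THREADS", unset = "all available")
  packageStartupMessage(pkgname, ": using ", n, " OpenMP thread(s)")
}
```

R CMD check complains if plain message() or cat() is used in a load hook, so packageStartupMessage() is the safe choice here.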
Re: [R-pkg-devel] [Tagged] Re: multithreading in packages
On Sat, 9 Oct 2021, Ben Bolker wrote:

FWIW there is some machinery in the glmmTMB package for querying, setting, etc. the number of OpenMP threads. https://github.com/glmmTMB/glmmTMB/search?q=omp

Great, thank you !

Vladimir Dergachev

-- Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering Graduate chair, Mathematics & Statistics
Re: [R-pkg-devel] [Tagged] Re: multithreading in packages
On Sat, 9 Oct 2021, Dirk Eddelbuettel wrote:

On 9 October 2021 at 12:08, Ben Bolker wrote: | FWIW there is some machinery in the glmmTMB package for querying, | setting, etc. the number of OpenMP threads. | | https://github.com/glmmTMB/glmmTMB/search?q=omp

https://cloud.r-project.org/package=RhpcBLASctl

Very useful, thank you ! Tried it on my notebook, I can see OpenMP working.

thanks

Vladimir Dergachev

Dirk -- https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
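For reference, the RhpcBLASctl calls involved are brief; a sketch, guarded with requireNamespace() so it is a no-op when the package is not installed:

```r
# Query the core count, keep BLAS single-threaded, and give OpenMP the cores.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  cores <- RhpcBLASctl::get_num_cores()
  RhpcBLASctl::blas_set_num_threads(1)
  RhpcBLASctl::omp_set_num_threads(cores)
}
```

Capping BLAS at one thread while handing OpenMP the physical cores is one common arrangement; the right split depends on where the package spends its time.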
Re: [R-pkg-devel] [Tagged] Re: multithreading in packages
On Sat, 9 Oct 2021, Viechtbauer, Wolfgang (SP) wrote:

One thing I did not see mentioned in this thread (pun intended) so far: for what kind of computations is multithreading supposed to be used within the package being developed? If the computations involve a lot of linear/matrix algebra, then one could just use R with other linear algebra routines (e.g., OpenBLAS, ATLAS, MKL, BLIS) and get the performance benefits of multicore processing of those computations without having to change a single line of code in the package (although in my experience, most of the performance benefits come from switching to something like OpenBLAS and using it single-threaded).

This is meant for the RMVL package, which memory-maps MVL format files for direct access. The package also provides database functionality. The files I am interested in are large. For example, the Gaia DR3 dataset is 500GB+.

Plain linear algebra will likely not need multithreading - the computation will proceed at the speed of storage I/O (which is quite impressive nowadays). But it will be useful to multithread more involved code that builds or queries indices, and I was also thinking of some functions to assist with visualization - plot() and xyplot() were not meant for very long vectors. Ideally, one would be able to explore such large data sets interactively, and then do more interesting things on the cluster.

This aside, I am personally more in favor of explicitly parallelizing those things that are known to be embarrassingly parallelizable using packages like parallel, future, etc., since a package author should know best when these situations arise and can take the necessary steps to parallelize those computations -- but making the use of parallel processing in these cases an option, not a default.
I have seen way too many cases in HPC environments where jobs are being parallelized, the package is doing parallel processing, and multicore linear algebra routines are being used all simultaneously, which is just a disaster.

Finally, I don't think the HPC task view has been mentioned so far (not even by Dirk just now, who maintains it!): https://cran.r-project.org/web/views/HighPerformanceComputing.html

Best, Wolfgang

Thanks for the link ! I see there is an OpenCL package, very interesting.

best

Vladimir Dergachev
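On the visualization aside - plot() and xyplot() with very long vectors - one pragmatic workaround is to thin the data before plotting; a sketch, where `thin_index` is a made-up helper name:

```r
# Select at most max_points roughly evenly spaced indices from 1..n, so a
# plot of a huge vector draws thousands of points rather than millions.
thin_index <- function(n, max_points = 1e4) {
  if (n <= max_points) return(seq_len(n))
  unique(round(seq(1, n, length.out = max_points)))
}

x <- cumsum(rnorm(1e6))
idx <- thin_index(length(x))
plot(idx, x[idx], type = "l")
```

Simple thinning can miss narrow spikes; a production version would plot per-bin minima and maxima instead, but the index-selection idea is the same.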
Re: [R-pkg-devel] multithreading in packages
On Sat, 9 Oct 2021, Erin Hodgess wrote:

Have you thought about using C or C++, please?

Yes, indeed, the core of the package is written in C, with some C++ for sorting (which turned out to be rather interesting).

Beyond writing optimized C, there are two ways to speed up execution on a single computer - multithreading and vector instructions. Multithreading is easier here, because only one or two libraries are needed (libgomp or pthread) and because it is often hard to vectorize operations like sorting, hashing and the like. Also, to use vector instructions to full potential one typically needs a fair bit of black magic which is unlikely to pass CRAN tests. I am having enough trouble as it is getting a simple flexible array past address sanitizers.

Also, there is a package called pbdDMAT from Drew Schmidt at U of Tenn which might help.

Great, thanks for pointing this out ! Looks like pbdDMAT uses MPI. Also, it appears this package was removed from CRAN for failing to compile on Macs, which seems rather unfair - I don't know of any clusters running macOS.

Vladimir Dergachev
Re: [R-pkg-devel] How does one install a libtool generated libfoo.so.1 file into ./libs/?
The simplest thing to try is to compile the library statically and link it into your package. No extra files - no trouble. You can also try renaming the file from *.so.1 to *.so.

best

Vladimir Dergachev
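The static-linking suggestion usually comes down to handing the .a archive to the linker in src/Makevars; a hedged sketch, in which every path and the `foo` name are placeholders:

```make
# Hypothetical src/Makevars: link libfoo statically, so no libfoo.so.1
# ever needs to be installed under the package's libs/ directory.
PKG_CPPFLAGS = -I../inst/include
PKG_LIBS = ../foo-build/libfoo.a
```

If the library's build system only produces a shared object, configuring it with --disable-shared --enable-static (for autotools-based projects) typically yields the needed archive.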
Re: [R-pkg-devel] Python module dependency
Have you considered translating ctef into R ? This would remove the dependencies and make your package much more robust, and would make it much easier to pass CRAN checks. Looking at the ctef code, it is pure Python and there aren't many lines. And in my experience, one line of R is worth 10 lines of Python :) Also, ctef has a dependency on KMeans, so translating ctef into R will remove that too.

best

Vladimir Dergachev

On Fri, 1 Sep 2023, Hanyu Song wrote:

Hello, I am writing an R package that depends on a very uncommonly used Python module named "ctef" and I have several questions about it:

a. How shall I write examples for the functions that depend on the Python module? Shall I just do:

   #' @examplesIf reticulate::py_module_available('ctef')
   #' my_function_that_depends_on_ctef(arg1, arg2)

in case the CRAN testing platform does not have the module?

b. I read from the documentation of the R package "reticulate" that we should delay-load the Python modules, but it is not entirely clear to me how to do it. Are the following lines of code sufficient for that purpose? Do I need to create any virtual environment?

   #' global reference to ctef
   #'
   #' @description
   #' `ctef` will be initialized in .onLoad.
   ctef <- NULL

   #' Delay load ctef module
   #'
   #' @description
   #' `.onLoad` delays loading the ctef module (it will only be loaded when accessed via $).
   #'
   #' @param libname Library name
   #' @param pkgname Package name
   .onLoad <- function(libname, pkgname) {
     ctef <<- reticulate::import("ctef", delay_load = TRUE)
   }

c. How shall I import the module in my R code? For now I included the import function in my_function_that_depends_on_ctef; see below:

   my_function_that_depends_on_ctef <- function(X, k) {
     mod <- reticulate::import('ctef', delay_load = TRUE)
     input <- as.matrix(X)
     res <- mod$ctef$ctef(input, as.integer(k))
     return(res)
   }

Is this correct? There are not many R packages that depend on a Python module, so the resources are quite limited. Thank you for your help.
Best, Hanyu Song
Re: [R-pkg-devel] Package bioOED has been removed from CRAN just for personal reasons
On Wed, 1 Nov 2023, David Hugh-Jones wrote:

Aside from the package question, surely the other issue here is that Prof Ripley's email is extraordinarily rude. Any paid employee would be sacked for that. I appreciate R and CRAN are volunteer-run organisations, but I don't think that should be an excuse for this level of, frankly, toxicity. Why is he allowed to get away with it?

One thing to keep in mind is that doing volunteer public-facing work tends to expose people to all kinds of unreasonable requests. Those who endure often become more direct, and that's fine. And people in commercial companies can be very direct too. One thing that helps is to be extra polite to a person who is doing a lot of volunteer work, and who is likely way oversubscribed.

Focusing on practical matters, if you take a step back things look pretty good: your package has a dependency on a package that you have not written and that is maintained outside CRAN. It was bound to break sooner or later. However, the last time you updated bioOED was in 2019, and there was no need to do anything for more than 3 years. That's amazing ! And probably made possible by being a little bit too direct on occasion.

best

Vladimir Dergachev

David
Re: [R-pkg-devel] [r-package-devel] Win.Metafile and package check - "Found the platform-specific device:"
On Fri, 3 Nov 2023, wayne.w.jo...@shell.com wrote:

Dear R-Package-Devel,

As part of the GWSDAT package (https://github.com/waynegitshell/GWSDAT) we support the option to output plots in the WMF format (https://r-graphics.org/recipe-output-vector-wmf) if, and only if, the user is on Windows. However, when I run the package checks it complains about using a platform-specific function with the following message:

   Found the platform-specific device: 'win.metafile'
   dev.new() is the preferred way to open a new device, in the unlikely event one is needed.

In my opinion this is a false positive - and a similar issue has previously been reported here: https://stackoverflow.com/questions/70585796/unable-to-understand-1-note-in-devtoolscheck-caused-by-a-platform-specific-d

Any ideas on how I can modify the code and package submission to automatically pass the checks?

Two suggestions:

* let users specify the graphics device they want

* reading the man page for dev.new(), it accepts a bunch of options - there is probably a way to request the metafile device you want, but I could not find it in the documentation.

best

Vladimir Dergachev

Thanks, Wayne

Wayne Jones, Principal Data Scientist, Decarbonisation Data Science, Projects and Technology, Shell Research Limited, Shell Centre, York Road, London, SE1 7NA. Email: wayne.w.jo...@shell.com
Re: [R-pkg-devel] RFC: an interface to manage use of parallelism in packages
On Wed, 25 Oct 2023, Ivan Krylov wrote:

Summary: at the end of this message is a link to an R package implementing an interface for managing the use of execution units in R packages. As a package maintainer, would you agree to use something like this? Does it look sufficiently reasonable to become a part of R? Read on for why I made these particular interface choices.

My understanding of the problem stated by Simon Urbanek and Uwe Ligges [1,2] is that we need a way to set and distribute the CPU core allowance between multiple packages that could be using very different methods to achieve parallel execution on the local machine, including threads and child processes. We could have multiple well-meaning packages, each of them calling each other using a different parallelism technology: imagine parallel::makeCluster(getOption('mc.cores')) combined with parallel::mclapply(mc.cores = getOption('mc.cores')) and with an OpenMP program that also spawns getOption('mc.cores') threads. A parallel BLAS or custom multi-threading using std::thread could add more fuel to the fire.

Hi Ivan,

Generally, I like the idea. A few comments:

* From a package developer's point of view, I would prefer to have a clear idea of how many threads I could use. So having a core R function like "getMaxThreads()" or similar would be useful. What that function returns could be governed by a package. In fact, it might be a good idea to allow several packages implementing "thread governors" for different situations.

* It would make sense to think through whether we want (or not) to allow package developers to call omp_set_num_threads(), or whether this is done by R. This is hairier than you might think. Allowing it forces every package to call omp_set_num_threads() before an OMP block, because there is no way to know which package was called before. Not allowing packages to call omp_set_num_threads() might make it difficult to use all the threads, and would force R to initialize OpenMP on startup.
* Speaking of initialization of OpenMP, I have seen situations where spawning some regular pthread threads and then initializing OpenMP forces all pthread threads onto a single CPU. I think this is because OpenMP sets thread affinity for all the process's threads, but only distributes its own.

* This also raises the question of how affinity is managed. If you have called makeForkCluster() to create 10 R instances and each then uses 2 OpenMP threads, you do not want those occupying only 2 CPU execution threads instead of 20.

* From the user perspective, it might be useful to be able to limit the number of threads per package by using patterns or regular expressions. Often, the reason for limiting the number of threads is to reduce memory usage.

* Speaking of memory usage, glibc has parameters like MALLOC_ARENA_MAX that have a great impact on the memory usage of multithreaded programs. I usually set it to 1, but then I take extra care to make as few memory allocation calls as possible within individual threads.

best

Vladimir Dergachev
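The getMaxThreads() idea from the first comment can be prototyped in a few lines of R. Everything here - the option name, the fallback order - is an assumption about what such a governor might check, not an existing API:

```r
# Hypothetical thread governor: a user-set option wins, then the OpenMP
# environment variable, then the detected core count; floor of one thread.
getMaxThreads <- function() {
  n <- getOption("mc.cores",
                 suppressWarnings(as.integer(Sys.getenv("OMP_NUM_THREADS", unset = ""))))
  if (is.null(n) || is.na(n)) n <- parallel::detectCores()
  if (is.na(n)) n <- 1L
  max(1L, as.integer(n))
}
```

A package would call this once before spawning workers; swapping in a different governor would then only mean changing what this one function consults.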
Re: [R-pkg-devel] [r-package-devel] Win.Metafile and package check - "Found the platform-specific device:"
On Sat, 4 Nov 2023, wayne.w.jo...@shell.com wrote:

Hi Vladimir,

Thanks for the suggestions. I've considered both but I can't see a way of doing what I'm trying to achieve without explicitly adding a call to win.metafile in my code. To explain a little more... GWSDAT is a Shiny app, so we don't use the traditional R graphics device - so your suggestion #1 is not an option. Instead, the user is offered a range of different plot options in a Shiny list box, as illustrated here in this Windows-based example: https://user-images.githubusercontent.com/61183826/280452267-c4287f8d-73cc-42a3-881a-4643bdb31689.png

The list of options presented to the users in the list box is modified according to the platform - see the code here: https://github.com/WayneGitShell/GWSDAT/blob/master/R/server.R#L117-L118 So anyone using a non-Windows platform will never be offered the choice of "wmf" to begin with. For example, see the online Linux version, https://stats-glasgow.shinyapps.io/GWSDATV3-2/. You will see that this option doesn't exist in the list of choices - see https://user-images.githubusercontent.com/61183826/280453275-6da9b235-0387-47fd-b3a8-b0a949f0ec3e.png

Any suggestions on how I can modify this approach to make it automatically pass the CRAN checks?

I see. Ideally, the CRAN checks should adjust to your use case - but it is not obvious how. As you discovered in a later e-mail, the check just looks for the literal presence of win.metafile(), so calling it in some other way satisfies the check.

best

Vladimir Dergachev

Thanks, Wayne
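The run-time lookup hinted at above can be sketched as follows. Whether such indirection is something CRAN reviewers would welcome is a separate question, and the svg fallback is only an example:

```r
# Resolve the device function by name at run time, so the platform check
# decides which device opens and no literal win.metafile() call appears
# in the package source.
open_plot_device <- function(file) {
  dev_name <- if (.Platform$OS.type == "windows") "win.metafile" else "svg"
  dev_fun <- get(dev_name, envir = asNamespace("grDevices"))
  dev_fun(file)
}
```

The same pattern works for any set of per-platform devices: keep a character vector of allowed names, validate the user's choice against it, then fetch the function from the grDevices namespace.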
Re: [R-pkg-devel] Wrong mailing list: Could the 100 byte path length limit be lifted?
On Wed, 13 Dec 2023, McGrath, Justin M wrote: On Windows, packages will be in "C:\Users\[User Name]\Documents\R\win-library\[R version]\[Package Name]". With a 150 byte limit, that leaves 70 bytes for the user name, R version and package name. That seems more than sufficient. If people are downloading the source files, that also leaves plenty of space regardless of where they choose to extract the files. 70 bytes ?? My name is 18 characters long and there are plenty of people with longer names. I have also seen the practice on Windows systems of appending the name of the organization or department. Also, this restricts the length of the package name, which is arguably more important than internal package path names that the user never sees. That said, that Windows limitation is only for some programs, and the pertinent question is whether R and any software used by R has this limitation. I suspect the answer is no, but as all my systems are Linux I cannot check. Vladimir Dergachev __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Additional Issues: Intel
On Wed, 17 Jan 2024, Hugh Parsonage wrote: My package grattan fails the Intel[1] check with Error: segfault from C stack overflow I am unable to immediately see where in the test suite this error has occurred. I seek advice on how to fix this error. The only hunch I have is that the package uses C code and includes structs with arrays on the stack, which perhaps are excessive for the Intel check machine, but am far from confident that's the issue. The repository is at <https://github.com/HughParsonage/grattan/> Two possibilities to look into: * your structures on the stack are large. Don't do this ! Your code might run faster and would be easier to debug if you use regular memory allocation instead. Since R does a fair number of memory allocation calls itself, the extra overhead from your calls will not be that noticeable. * your structures are small, but you have a recursive function that recurses too deeply. In this case, the solution is to reimplement the recurrence without doing function calls (using a loop, for example). Some recurrences can be implemented without using any accumulating state. Others need it, and you can use heap memory for that. best Vladimir Dergachev [1] https://www.stats.ox.ac.uk/pub/bdr/Intel/grattan.out __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
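The two failure modes above can be sketched in plain C (an illustrative sketch, not code from the grattan package): a large workspace moved from the stack to the heap, and a recurrence rewritten as a loop so the stack depth stays constant.

```c
#include <stdlib.h>

/* Failure mode 1: a large array on the stack, e.g.
       double scratch[4000000];    -- ~32 MB, overflows a typical 8 MB C stack
   Fix: put the workspace on the heap instead. */
double *make_workspace(size_t n)
{
    return malloc(n * sizeof(double));   /* caller must free() it */
}

/* Failure mode 2: a deep recurrence - one stack frame per step,
   so a large enough n overflows the stack. */
long long sum_rec(long long n)
{
    return n <= 0 ? 0 : n + sum_rec(n - 1);
}

/* The same recurrence reimplemented as a loop: constant stack usage,
   works for any n. */
long long sum_iter(long long n)
{
    long long acc = 0;                   /* accumulating state in a local */
    for (long long i = 1; i <= n; i++)
        acc += i;
    return acc;
}
```

When the recurrence needs more accumulating state than a few scalars, that state can live in a heap buffer from make_workspace() rather than in ever-deeper stack frames.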
Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging
I use libunwind in my programs, works quite well, and simple to use. Happy to share the code if there is interest.. best Vladimir Dergachev On Mon, 4 Mar 2024, Ivan Krylov via R-package-devel wrote: On Sun, 3 Mar 2024 19:19:43 -0800 Kevin Ushey wrote: Would libSegFault be useful here? Glad to know it has been moved to <https://github.com/zatrazz/glibc-tools/tree/main/libSegFault> and not just removed altogether after the upstream commit <https://sourceware.org/git/?p=glibc.git;a=commit;h=65ccd641bacea33be23d51da737c2de7543d0f5e>. libSegFault is safer than, say, libsegfault [*] because it both supports SA_ONSTACK (for when a SIGSEGV is caused by stack overflow) and avoids functions like snprintf() (which depend on the locale code, which may have been the source of the crash). The only correctness problem that may still be unaddressed is potential memory allocations in backtrace() when it loads libgcc on first use. That should be easy to fix by calling backtrace() once in segfault_init(). Unfortunately, libSegFault is limited to glibc systems, so a different solution will be needed on Windows, macOS and Linux systems with the musl libc. Google-owned "backward" [**] tries to do most of this right, but (1) is designed to be compiled together with C++ programs, not injected into unrelated processes and (2) will exit the process if it survives raise(signum), which will interfere with both rJava (judging by the number of Java-related SIGSEGVs I saw while running R CMD check) and R's own stack overflow survival attempts. -- Best regards, Ivan [*] https://github.com/stass/libsegfault (Which doesn't compile out of the box on GNU/Linux due to missing pthread_np.h, although that should be easy to patch.) [**] https://github.com/bombela/backward-cpp __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging
Hi Ivan, Here is the piece of code I currently use:

void backtrace_dump(void)
{
unw_cursor_t cursor;
unw_context_t context;

unw_getcontext(&context);
unw_init_local(&cursor, &context);

while (unw_step(&cursor) > 0) {
	unw_word_t offset, pc;
	char fname[64];

	unw_get_reg(&cursor, UNW_REG_IP, &pc);
	fname[0] = '\0';
	(void) unw_get_proc_name(&cursor, fname, 64, &offset);
	fprintf(stderr, "0x%016lx : (%s+0x%lx)\n", pc-(long)backtrace_dump, fname, offset);
	}
}

To make it safe, one can simply replace fprintf() with a function that stores information into a buffer. Several things to point out:

* printing pc-(long)backtrace_dump works around address randomization, so that if you attach the debugger you can find the location again by using backtrace_dump+offset (it does not have to be backtrace_dump, any symbol will do)

* this works even if the symbols are stripped, in which case it finds an offset relative to the nearest available symbol - there are always some from the loader. Of course, in this case you should use the offsets and the debugger to find out what's wrong

* you can call backtrace_dump() from anywhere, it does not have to be a signal handler. I've taken to calling it when my programs detect some abnormal situation, so I can see the call chain.

* this should work as a package, but I am not sure whether the offsets between package symbols and R symbols would be static or not. For R it might be a good idea to also print a table of offsets between some R symbol and each loaded C package's R_init function (R_init_RMVL() in my case), at least initially.

* R ought to know where packages are loaded, so we might want to be clever and print out information on which package contains which function, or there might be identical R_init_RMVL() printouts.

best Vladimir Dergachev On Thu, 7 Mar 2024, Ivan Krylov wrote: On Tue, 5 Mar 2024 18:26:28 -0500 (EST) Vladimir Dergachev wrote: I use libunwind in my programs, works quite well, and simple to use. Happy to share the code if there is interest..
Do you mean that you use libunwind in signal handlers? An example on how to produce a backtrace without calling any async-signal-unsafe functions would indeed be greatly useful. Speaking of shared objects injected using LD_PRELOAD, I've experimented some more, and I think that none of them would work with R without additional adjustments. They install their signal handler very soon after the process starts up, and later, when R initialises, it installs its own signal handler, overwriting the previous one. For this scheme to work, either R would have to cooperate, remembering a pointer to the previous signal handler and calling it at some point (which sounds unsafe), or the injected shared object would have to override sigaction() and call R's signal handler from its own (which sounds extremely unsafe). Without that, if we want C-level backtraces, we either need to patch R to produce them (using backtrace() and limiting this to glibc systems or using libunwind and paying the dependency cost) or to use a debugger. -- Best regards, Ivan __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
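The "replace fprintf() with a function that stores information into a buffer" idea above can be sketched without libunwind: format the program counter by hand into a caller-supplied buffer and emit it with write(2), which is on the POSIX async-signal-safe list (unlike fprintf, which may allocate and touch locale state). The function names here (hex_format, emit_frame) are made up for illustration.

```c
#include <unistd.h>
#include <stdint.h>
#include <stddef.h>

/* Async-signal-safe: no malloc, no stdio, no locale. Formats v as "0x..."
   into buf; returns the number of characters written (0 if buf too small). */
size_t hex_format(uintptr_t v, char *buf, size_t buflen)
{
    const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t i = 0, n = 0;

    do {                              /* emit nibbles, least significant first */
        tmp[i++] = digits[v & 0xf];
        v >>= 4;
    } while (v && i < sizeof(tmp));

    if (buflen >= i + 3) {            /* "0x" + digits + NUL */
        buf[n++] = '0';
        buf[n++] = 'x';
        while (i > 0)
            buf[n++] = tmp[--i];      /* reverse into most-significant-first */
        buf[n] = '\0';
    }
    return n;
}

/* In a signal handler this would be called once per unwound frame,
   with pc taken from unw_get_reg(..., UNW_REG_IP, ...). */
void emit_frame(uintptr_t pc)
{
    char buf[32];
    size_t n = hex_format(pc, buf, sizeof buf);
    write(2, buf, n);                 /* write(2) is async-signal-safe */
    write(2, "\n", 1);
}
```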
Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging
On Tue, 12 Mar 2024, Ivan Krylov wrote: Vladimir, Thank you for the example and for sharing the ideas regarding symbol-relative offsets! On Thu, 7 Mar 2024 09:38:18 -0500 (EST) Vladimir Dergachev wrote: unw_get_reg(&cursor, UNW_REG_IP, &pc); Is it ever possible for unw_get_reg() to fail (return non-zero) for UNW_REG_IP? The documentation isn't being obvious about this. Then again, if the process is so damaged it cannot even read the instruction pointer from its own stack frame, any attempts at self-debugging must be doomed. Not sure. I think it just returns what is in it, you will get a false reading if the stack is corrupted. The way that I see it - some printout is better than none, and having signs that stack is badly corrupted is a useful debugging clue. * this should work as a package, but I am not sure whether the offsets between package symbols and R symbols would be static or not. Since package shared objects are mmap()ed into the address space and (at least on Linux with ASLR enabled) mmap()s are supposed to be made unpredictable, this offset ends up not being static. 
On Linux, R seems to be normally built as a position-independent executable, so no matter whether there is a libR.so, both the R base address and the package shared object base address are randomised:

$ cat ex.c
#include <stddef.h>
#include <R_ext/Print.h>
void addr_diff(void) {
	ptrdiff_t diff = (char*)&addr_diff - (char*)&Rprintf;
	Rprintf("self - Rprintf = %td\n", diff);
}
$ R CMD SHLIB ex.c
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -9900928
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -15561600
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 45537907472976
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 46527711447632

* R ought to know where packages are loaded, we might want to be clever and print out information on which package contains which function, or there might be identical R_init_RMVL() printouts. That's true. Information on all registered symbols is available from getLoadedDLLs(). Ok, so this is reasonably straightforward. best Vladimir Dergachev -- Best regards, Ivan __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] using portable simd instructions
I like assembler, and I do use SIMD intrinsics in some of my code (not R), but sparingly. The issue is not just portability between platforms, but also portability between processors - if you write your optimized code using AVX, it might not take advantage of newer AVX512 cpus. In many cases your compiler will do the right thing and optimize your code. I suggest: * write your code in plain C, test it with some long computation and use "perf top" on Linux to observe the code hotspots and which assembler instructions are being used. * if you see instructions like "addps" these are vectorized. If you see instructions like "addss" these are *not* vectorized. * if you see a few instructions as hotspots with arguments in parenthesis "vmovaps %xmm1,(%r8)" then you are likely limited by memory access. * If you are not limited by memory access and the compiler produces a lot of "addss" or similar that are hotspots, then you need to look at your code and make it more parallelizable. * How to make your C code more parallelizable: You want to make easy-to-interpret loops like for(i=start;i<end;i++) You can help the compiler by using the "restrict" keyword to indicate that arrays do not overlap, or (as a sledgehammer) "#pragma ivdep". But before using keywords check with "perf top" which code is actually a hotspot, as the compiler can generate good code without restrict keywords, by using multiple code paths. * You can create small temporary arrays to make your algorithm look more like the loops above. The small arrays should be at least 16 wide, because AVX512 has instructions that operate on 16 floats at a time. * To allow use of small arrays you can unroll your loops. Note that compilers do unrolling themselves, so doing it manually is only helpful if this makes the inner body of the loop more parallelizable. * You can debug why the compiler does not parallelize your code by turning on diagnostics.
For gcc the flag is "-fopt-info-vec-missed=vec_info.txt" * In very rare cases you use intrinsics. For me this is typically a situation when I need to find the value and the index of a maximum or minimum in an array - compilers do not optimize this well, at least for the many different ways of coding this in C that I tried many years ago. * If after all your work you got a factor of 2 speedup you are doing fine. If you want a larger speedup, change your algorithm. best Vladimir Dergachev On Wed, 27 Mar 2024, Dirk Eddelbuettel wrote: On 27 March 2024 at 08:48, jesse koops wrote: | Thank you, I was not aware of the easy way to search CRAN. I looked at | rcppsimdjson of course, but couldn't figure it out since it is done in | the simdjson library, if I interpret it correctly, not within the R | ecosystem, and I didn't know how that would change things. Writing R | Extensions assumes a lot of prior knowledge so I will have to work my | way up to there first. I think I have (at least) one other package doing something like this _in the library layer too_ as suggested by Tomas, namely crc32c as used by digest. You could study how crc32c [0] does this for x86_64 and arm64 to get hardware optimization. (This may be more specific cpu hardware optimization but at least the library and cmake files are small.) I decided as a teenager that assembler wasn't for me and haven't looked back, but I happily take advantage of it when bundled well. So strong second for the recommendation by Tomas to rely on this being done in an external and tested library. (Another interesting one there is highway [1]. Just packaging that would likely be an excellent contribution.) Dirk [0] repo: https://github.com/google/crc32c [1] repo: https://github.com/google/highway docs: https://google.github.io/highway/en/master/ | | On Tue, 26 Mar 2024 at 15:41, Dirk Eddelbuettel wrote: | > | > | > On 26 March 2024 at 10:53, jesse koops wrote: | > | How can I make this portable and CRAN-acceptable?
| > | > But writing (or borrowing ?) some hardware detection via either configure / | > autoconf or cmake. This is no different than other tasks decided at install-time. | > | > Start with 'Writing R Extensions', as always, and work your way up from | > there. And if memory serves there are already a few other packages with SIMD | > at CRAN so you can also try to take advantage of the search for a 'token' | > (here: 'SIMD') at the (unofficial) CRAN mirror at GitHub: | > | >https://github.com/search?q=org%3Acran%20SIMD&type=code | > | > Hth, Dirk | > | > -- | > dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org -- dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
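The vectorization advice in the thread above (simple counted loops, the restrict keyword, breaking dependency chains with several accumulators) can be sketched in plain C. Compiling with, say, gcc -O3 and then checking "perf top" or -fopt-info-vec-missed shows whether the compiler emitted packed (addps/vaddps) or scalar (addss) instructions; the function names are made up for illustration.

```c
#include <stddef.h>

/* 'restrict' promises the compiler that the arrays do not overlap, so it
   is free to emit packed SIMD instructions for this loop. */
void vec_add(float *restrict out, const float *restrict a,
             const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)      /* easy-to-interpret counted loop */
        out[i] = a[i] + b[i];
}

/* A naive sum has a loop-carried dependency through a single accumulator,
   which limits vectorization. Using several independent accumulators
   (a small manual unroll) restores parallelism. */
float vec_sum(const float *a, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {   /* four independent chains */
        acc0 += a[i];
        acc1 += a[i + 1];
        acc2 += a[i + 2];
        acc3 += a[i + 3];
    }
    float acc = acc0 + acc1 + acc2 + acc3;
    for (; i < n; i++)                  /* scalar tail */
        acc += a[i];
    return acc;
}
```

Note that reassociating floating-point sums this way changes rounding slightly; compilers only do it on their own under flags like -ffast-math.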
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
On Fri, 3 May 2024, Ivan Krylov via R-package-devel wrote: Dear Maciej Nasinski, On Fri, 3 May 2024 11:37:57 +0200 Maciej Nasinski wrote: I believe we must conduct a comprehensive review of all existing CRAN packages. Why now? R packages are already code. You don't need poisoned RDS files to wreak havoc using an R package. On the other hand, R data files contain R objects, which contain code. You don't need exploits to smuggle code inside an R object. I think the confusion arises because users expect "R data files" to only contain data, i.e. numbers, but they can contain any R object, including functions. I, personally, never use them out of concern that an accidentally saved function can override some functionality and be difficult to debug. And, of course, I never save R sessions. If you need to pass data, it is a good idea to use some common format like tab-separated CSV files with column names. One can also use MVL files (RMVL package). best Vladimir Dergachev __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
On Sat, 4 May 2024, Maciej Nasinski wrote: Thank you all for the discussion. Then, we should promote "code awareness" and count on the CRAN Team to continue their great work :) What do you think about promoting containers? Nowadays, containers are more accessible, with GitHub codespaces being more affordable (mostly free for students and the educational sector). I feel containers can help a little bit in making R work more secure, but, once more, only when used properly. I think it is not a good idea to focus on one use case. Some people will find containers more convenient, some won't. If you want security, I am sure containers are not the right approach - get a separate physical computer instead. From a convenience point of view containers are only ok as long as you don't need to interface with outside software; then it gets tricky, as the security layer keeping things containerized starts interfering with getting work done. (Prime example: the firefox snap on ubuntu) One situation where containers can be helpful is distribution of commercial applications. Containers allow you to freeze library versions, so your app can still run with an old C library or a specific version of Python. You can then _hope_ that containers will have fewer compatibility issues, or at least you can sell containers to your management on this idea. But this is not really a good thing for an open source project like R. best Vladimir Dergachev KR Maciej Nasinski University of Warsaw On Sat, 4 May 2024 at 07:17, Vladimir Dergachev wrote: On Fri, 3 May 2024, Ivan Krylov via R-package-devel wrote: > Dear Maciej Nasinski, > > On Fri, 3 May 2024 11:37:57 +0200 > Maciej Nasinski wrote: > >> I believe we must conduct a comprehensive review of all existing CRAN >> packages. > > Why now? R packages are already code. You don't need poisoned RDS files > to wreak havoc using an R package. > > On the other hand, R data files contain R objects, which contain code.
> You don't need exploits to smuggle code inside an R object. > I think the confusion arises because users expect "R data files" to only contain data, i.e. numbers, but they can contain any R object, including functions. I, personally, never use them out of concern that accidentally saved function can override some functionality and be difficult to debug. And, of course, I never save R sessions. If you need to pass data it is a good idea to use some common format like tab-separated CSV files with column names. One can also use MVL files (RMVL package). best Vladimir Dergachev __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
On Sat, 4 May 2024, Maciej Nasinski wrote: Hey Vladimir, Thank you for your answer. GitHub codespaces are "a separate computer" and are free for students and the educational sector. Hi Maciej, What I was suggesting is that instead of encapsulating the application in a container that runs on the same physical hardware as other containers, you would be more secure to use a dedicated computer for the application. best Vladimir Dergachev The GitHub codespaces are a cloud service that can be created anytime, with a specific setup behind it (Dockerfile, settings.json, renv.lock, ...). The machines GitHub codespaces offer are quite decent (4core 16GB RAM 32GB Memory). You can destroy and recreate it anytime you want to. You run GitHub codespaces from a web browser, but as Ivan stated, you may need a decent computer to handle them, even if all calculations are done on the cloud. I use GitHub codespaces for all my University projects with my friends. It is great that I do not have to explain many things nowadays to older stuff as many things are automatic on GitHub codespaces. KR Maciej Nasinski University of Warsaw On Sat, 4 May 2024 at 18:53, Vladimir Dergachev wrote: On Sat, 4 May 2024, Maciej Nasinski wrote: > Thank you all for the discussion.Then, we should promote "code awareness" and count on the CRAN Team to continue their great work:) > > What do you think about promoting containers? > Nowadays, containers are more accessible, with GitHub codespaces being more affordable (mostly free for students and the educational sector). > I feel containers can help a little bit in making the R work more secure, but once more when used properly. I think it is not a good idea to focus on one use case. Some people will find containers more convenient some don't. If you want security, I am sure containers are not the right approach - get a separate physical computer instead. 
>From a convenience point of view containers are only ok as long as you don't need to interface with outside software, then it gets tricky as the security keeping things containerized starts interfering with getting work done. (Prime example: firefox snap on ubuntu) One situation where containers can be helpful is distribution of commercial applications. Containers allow you to freeze library versions, so your app can still run with old C library or a specific version of Python. You can then _hope_ that containers will have fewer compatibility issues, or at least you can sell containers to your management on this idea. But this is not really a good thing for an open source project like R. best Vladimir Dergachev > > KR > Maciej Nasinski > University of Warsaw > > On Sat, 4 May 2024 at 07:17, Vladimir Dergachev wrote: > > > On Fri, 3 May 2024, Ivan Krylov via R-package-devel wrote: > > > Dear Maciej Nasinski, > > > > On Fri, 3 May 2024 11:37:57 +0200 > > Maciej Nasinski wrote: > > > >> I believe we must conduct a comprehensive review of all existing CRAN > >> packages. > > > > Why now? R packages are already code. You don't need poisoned RDS files > > to wreak havoc using an R package. > > > > On the other hand, R data files contain R objects, which contain code. > > You don't need exploits to smuggle code inside an R object. > > > > I think the confusion arises because users expect "R data files" to only > contain data, i.e. numbers, but they can contain any R object, including > functions. > > I, personally, never use them out of concern that accidentally saved > function can override some functionality and be difficult to debug. And, > of course, I never save R sessions. > > If you need to pass data it is a good idea to use some common format like > tab-separated CSV files with column names. One can also use MVL files > (RMVL package). 
> > best > > Vladimir Dergachev > > > __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
[R-pkg-devel] SETLENGTH()
I noticed a note on RMVL package check page for development version of R: Found non-API call to R: ‘SETLENGTH’ Is this something that is work-in-progress for the development version, or has SETLENGTH() been deprecated ? What should I use instead ? thank you very much Vladimir Dergachev __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] [External] SETLENGTH()
On Sat, 4 May 2024, luke-tier...@uiowa.edu wrote: On Sat, 4 May 2024, Vladimir Dergachev wrote: I noticed a note on RMVL package check page for development version of R: Found non-API call to R: 'SETLENGTH' Is this something that is work-in-progress for the development version, or has SETLENGTH() been deprecated ? What should I use instead ? SETLENGTH has never been part of the API. It is not safe to use except in a very, very limited set of circumstances. Using it in other settings will confuse the memory manager, leading at least to mis-calculation of memory use information and possibly to segfaults. For most uses I have seen, copying to a new vector of the right size is the only safe option. The one context where something along these lines might be OK is for growable vectors. This concept is emphatically not in the API at this point, and the way it is currently implemented in base is not robust enough to become an API (even though some packages have used it). It is possible that a proper API for this will be added; at that point SETLENGTH will be removed from the accessible entry points on platforms that allow this. So if you are getting a note about SETLENGTH, either stop using it or be prepared to make some changes at fairly short notice. [Similar considerations apply to SET_TRUELENGTH. In most but not all cases using it is less dangerous, but you should still look for other options if you want your code to continue to work.] Great, thank you for the explanation ! I will rewrite the code to not use SETLENGTH(). My use case was to allocate a vector of some size N_max and then repeatedly populate it with a variable number of elements. Since the vector was protected during the loop, I would have expected to save on memory allocation calls.
best Vladimir Dergachev Best, luke thank you very much Vladimir Dergachev __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Actuarial Science Fax: 319-335-3017 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
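Luke's "copying to a new vector of the right size" advice can be illustrated with a plain-C analog (a sketch, not R API code: in an actual package the equivalent would be allocating a fresh SEXP of the final length with Rf_allocVector() and copying the valid elements into it, rather than shrinking the N_max buffer in place the way SETLENGTH does).

```c
#include <stdlib.h>
#include <string.h>

/* Plain-C analog of "copy to a new vector of the right size": the scratch
   buffer was allocated at some maximal size, only the first n elements
   are valid, and the result is a fresh allocation of exactly n elements.
   The memory manager never sees a buffer whose recorded size is wrong. */
double *shrink_copy(const double *scratch, size_t n)
{
    double *out = malloc(n * sizeof *out);
    if (out != NULL)
        memcpy(out, scratch, n * sizeof *out);  /* copy only the valid part */
    return out;                                  /* caller frees */
}
```

The cost is one extra allocation and copy per result, which (as noted earlier in the digest for stack vs heap allocation) is usually small next to R's own allocation traffic.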
Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement
On Wed, 8 May 2024, Josiah Parry wrote: Yes, prqlr is a great Rust-based package! My other Rust based packages that are on CRAN are based, in part on prqlr. If there are many packages based on Rust that require common code, would it make sense to make a single "rust" compatibility package that they can depend on ? best Vladimir Dergachev __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Fast Matrix Serialization in R?
On Thu, 9 May 2024, Sameh Abdulah wrote: Hi, I need to serialize and save a 20K x 20K matrix as a binary file. This process is significantly slower in R compared to Python (4X slower). I'm not sure about the best approach to optimize the below code. Is it possible to parallelize the serialization function to enhance performance? Parallelization should not help - a single CPU thread should be able to saturate your disk or your network, assuming you have a typical computer. The problem is possibly the conversion to text; writing it as binary should be much faster. To add to other suggestions, you might want to try my package "RMVL" - aside from fast writes, it also gives you the ability to share data between ultimate users of the package. best Vladimir Dergachev PS Example:

library("RMVL")
M <- mvl_open("test1.mvl", append=TRUE, create=TRUE)
n <- 20000
cat("Generating matrix ... ")
INI.TIME <- proc.time()
A <- matrix(runif(n * n), ncol = n)
END_GEN.TIME <- proc.time()
mvl_write(M, A, name="A")
mvl_close(M)
END_SER.TIME <- proc.time()

# Use in another script:
library("RMVL")
M2 <- mvl_open("test1.mvl")
print(M2$A[1:10, 1:10])

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
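The "conversion to text" point above is easy to see in C: a text writer performs one formatting call per element, while a binary writer hands the whole buffer to the OS in a single call. This is an illustrative sketch (function names are made up; real code should also record matrix dimensions and worry about byte order for portability).

```c
#include <stdio.h>

/* Binary: one bulk fwrite of the raw doubles. For a 20000x20000 matrix
   this is a single ~3.2 GB copy with no per-element work. */
size_t write_binary(const char *path, const double *m, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return 0;
    size_t written = fwrite(m, sizeof(double), n, f);
    fclose(f);
    return written;   /* number of doubles written */
}

/* Text: one fprintf per element - 4e8 format conversions for the same
   matrix, which is where the time goes. */
size_t write_text(const char *path, const double *m, size_t n)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
        fprintf(f, "%.17g\n", m[i]);  /* "%.17g" round-trips a double */
    fclose(f);
    return n;
}
```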
Re: [R-pkg-devel] Altrep header, MSVC, and STRUCT_SUBTYPES macro
On Wed, 15 May 2024, David Cortes wrote: I'm seeing some issues using R Altrep classes when compiling a package with the MSVC compiler on windows. While CRAN doesn't build windows binaries with this compiler, some packages such as Arrow and LightGBM have had some success in building their R packages with MSVC outside of CRAN, in order to enable functionalities that MinGW doesn't support. Out of curiosity - which functionalities are those ? One suggestion would be to isolate the MSVC-specific code in a library and then build a package linking to that - this might turn out to be more portable. thank you Vladimir Dergachev __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] handling documentation build tools
On Tue, 21 May 2024, Boylan, Ross via R-package-devel wrote: Thanks for the pointer. You may have been thrown off by some goofs I made in the intro, which said I would like to build the automatically, with requiring either users or repositories to have the tools. The intended meaning, with corrections in **, was I would like to build the *custom documentation* automatically, with*out* requiring either users or repositories to have the tools. So I want to build the document only locally, as you suggest, but am not sure how to accomplish that. I usually just create a Makefile. It can be something like this:

all: documentation.pdf

documentation.pdf: documentation.lyx
	lyx --export pdf4 documentation.lyx

Then every time before you do R CMD build, you must run make in the directory with the Makefile. best Vladimir Dergachev Regarding the trick, I'm puzzled by what it gains. It seems like a complicated way to get the core pdf copied to inst/doc. Also, my main concern was how to automate production of the "core" pdf, using the language of the blog post. Ross From: Dirk Eddelbuettel Sent: Tuesday, May 21, 2024 2:15 PM To: Boylan, Ross Cc: r-package-devel@r-project.org Subject: Re: [R-pkg-devel] handling documentation build tools As lyx is not listed in 'Writing R Extensions', the one (authoritative) manual describing how to build packages for R, I would not assume it to be present on every CRAN machine building packages. Also note that several users recently had to ask here how to deal with less common fonts for style files for (pdf)latex. So I would recommend 'localising' the pdf creation to your own machine, and to ship the resulting pdf. You can have pre-made pdfs as the core of a vignette, a trick I quite like to make package building simpler and more robust.
See https://www.r-bloggers.com/2019/01/add-a-static-pdf-vignette-to-an-r-package/ for details. Cheers, Dirk -- dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] warning: explicit assigning values of variable of type ....
On Thu, 6 Jun 2024, Iris Simmons wrote: Unless I'm misunderstanding, you're trying to pass a value by name to a function. That is not a thing in C nor C++. However, if you want to name the arguments, you can do so with comments: /* print = */ print I would recommend against using comments in this fashion, because while it tells you what you meant to do, the compiler does not know it. If you made an error, the comment will make it harder to find. If you happen to have a function with a lot of arguments of similar types and putting arguments in order is a concern, you can instead convert them to a struct and pass the struct instead:

typedef struct {
	int a;
	int b;
} INPUT_TYPE1;

void myfunc(INPUT_TYPE1 x);

And somewhere:

{
INPUT_TYPE1 x;

x.a = 3;
x.b = 4;

myfunc(x);
}

I don't know whether modern compilers are smart enough to optimize this in the same way as passing an argument list. If this is a concern, probably some code restructuring is a good idea. best Vladimir Dergachev On Thu, Jun 6, 2024, 19:16 Søren Højsgaard via R-package-devel < r-package-devel@r-project.org> wrote: Dear all, From CRAN maintainers I receive: Flavor: r-devel-linux-x86_64-debian-gcc Check: whether package can be installed, Result: WARNING Found the following significant warnings: grips_fit_ips.cpp:149:45: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign] grips_fit_ips.cpp:213:16: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign] grips_fit_ips.cpp:254:10: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign] grips_fit_ips.cpp:254:21: warning: explicitly assigning value of variable of type 'double' to itself [-Wself-assign] The first warning pertains to the line: conips_inner_(S, K, elst0, clist0, print=print); print on lhs of "=" is the formal name and print on rhs of "=" the name of a variable. Does the compiler think I assign an integer to itself?
> Like if I write
>
>    int a=7; a=a;
>
> Can anyone help me throw light on this?
>
> Thanks in advance
> Søren

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Options "reset" when options(opts)
On Thu, 11 Jul 2024, David Hugh-Jones wrote:

This surprised me, even though it shouldn't have done. (My false internal model of the world was that oo <- options(); … options(oo) would overwrite the entire options list with the old values.) I wonder if it would be worth pointing out explicitly in ?options.

Arguably, it would be nice to have a parameter like "reset", so that one can call options(oo, reset=TRUE) and any options not explicitly passed in oo are set to NULL. That way there are two modes of operation: bulk setting of a subset of options with reset=FALSE, and restoring the full option set with reset=TRUE.

best

Vladimir Dergachev

Writing: wyclif.substack.com
Book: www.wyclifsdust.com

On Thu, 11 Jul 2024 at 08:03, Greg Jefferis wrote:

Dear John,

You need to collect the return value when setting options. This will include an explicit NULL value for an option that was previously NULL.

Best,
Greg Jefferis.

options(digits.secs = NULL)
noset2 = function() {
  opts <- options(digits.secs = 3)
  on.exit(options(opts))
  print(opts)
}
getOption("digits.secs")
NULL
noset2()
$digits.secs
NULL
getOption("digits.secs")
NULL

Gregory Jefferis
Division of Neurobiology
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge, CB2 0QH, UK
http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis
http://jefferislab.org
https://www.zoo.cam.ac.uk/research/groups/connectomics

On 11 Jul 2024, at 06:08, John Muschelli wrote:

When setting options in a function, I have always used the following:

opts <- options()
on.exit(options(opts), add = TRUE)

and assumed it "reset" options to what they were prior to running the function. But for some options that are set to NULL, it does not seem to reset them. Specifically, I have found digits.secs to be set after the simple example below. Is this expected behavior/documented?
Overall, this specific example (the one I encountered in the wild) is not that harmful, but I wanted to ask before I set a fix for this in our work.

noset = function() {
  opts = options()
  print(opts$digits.secs)
  on.exit(options(opts))
  options(digits.secs = 3)
}
getOption("digits.secs")
#> NULL
noset()
#> NULL
getOption("digits.secs")
#> [1] 3

John Muschelli, PhD
Associate Research Professor
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] How to get arbitrary precise inputs from R for an Rcpp package?
There are existing extended-precision packages, Ryacas and Rmpfr; you might want to take a look at them and at how they represent numbers with higher precision than a double.

best

Vladimir Dergachev

On Fri, 19 Jul 2024, Khue Tran wrote:

Hi,

I am trying to create an Rcpp package that involves arbitrarily precise calculations. The function to calculate e^x below with 100 digits of precision works well with integers, but for decimals, since the input is a double, the result differs a lot from the arbitrarily precise result I got on Wolfram. I understand the results are different since 0.1 cannot be represented precisely in binary with limited bits. It is possible to enter 1 then 10 and get the multiprecision division of these two integers to obtain a more precise 0.1 in C++, but this method won't work on a large scale. Thus, I am looking for a general solution to get more precise inputs.

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] How to get arbitrary precise inputs from R for an Rcpp package?
Do you need to run eigen() on an arbitrary matrix or a symmetric one?

best

Vladimir Dergachev

On Fri, 19 Jul 2024, Khue Tran wrote:

Thank you Simon! This is very helpful! Regarding eigen, I found in the Boost library the following example of an arbitrary-precision matrix solver: https://github.com/boostorg/multiprecision/blob/develop/example/eigen_example.cpp. I am not sure if the precision is fully preserved throughout the process, but this example motivated me to try coding with the Boost library.

Best,
Khue Tran

On Fri, Jul 19, 2024 at 9:50 AM Simon Urbanek wrote:

Khue,

On 19/07/2024, at 11:32 AM, Khue Tran wrote:

Thank you for the suggestion, Denes, Vladimir, and Dirk. I have indeed looked into Rmpfr, and while the package can interface GNU MPFR with R smoothly, as of right now it doesn't have all the functions I need (i.e. eigen for the mpfr class), and when one inputs decimals, say 0.1 to mpfr(), the precision is still limited by R's default double precision.

Don't use doubles, use decimal fractions:

Rmpfr::mpfr(gmp::as.bigq(1,10), 512)
1 'mpfr' number of precision 512 bits
[1] 0.1002

As for eigen() - I'm not aware of an arbitrary-precision solver, so I think the inputs are your least problem - most tools out there use LAPACK, which doesn't support arbitrary precision, so your input precision is likely irrelevant in this case.

Cheers,
Simon

Thank you for the note, Dirk. I will keep in mind to send any future questions regarding Rcpp to the Rcpp-devel mailing list. I understand that the type used in the Boost library for precision is not one of the types supported by SEXP, so it will be more complicated to map between the cpp code and R. Given Rmpfr doesn't provide all the necessary mpfr calculations (and embarking on interfacing Eigen with Rmpfr is not a small task), does taking input as strings seem like the best option for me to get precise inputs?
Sincerely,
Khue

On Fri, Jul 19, 2024 at 8:29 AM Dirk Eddelbuettel wrote:

Hi Khue,

On 19 July 2024 at 06:29, Khue Tran wrote:
| I am currently trying to get precise inputs by taking strings instead of
| numbers then writing a function to decompose the string into a rational
| with the denominator in the form of 10^(-n) where n is the number of
| decimal places. I am not sure if this is the only way or if there is a
| better method out there that I do not know of, so if you can think of a
| general way to get precise inputs from users, it will be greatly
| appreciated!

That is one possible way. The constraint really is that the .Call() interface we use for all [1] extensions to R only knows SEXP types, which map to a small set of known types: double, int, string, bool, ... The type used by the Boost library you are using is not among them, so you have to add code to map back and forth. Rcpp makes that easier; it is still far from automatic.

R has packages such as Rmpfr interfacing GNU MPFR based on GMP. Maybe that is good enough?

Also note that Rcpp has a dedicated (low volume and friendly) mailing list where questions such as this one may be better suited.

Cheers, Dirk

[1] A slight generalisation. There are others but they are less common / not recommended.

-- dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] How to get arbitrary precise inputs from R for an Rcpp package?
On Fri, 19 Jul 2024, Khue Tran wrote:

I will need to run eigen() on a symmetric matrix, but I want to get arbitrarily precise eigenvalues since we will need those eigenvalues for our further calculations. Does that make sense?

For a symmetric matrix there is a nice algorithm that is simple to implement and converges fairly fast. You find the off-diagonal entry that is largest in absolute value, then construct a 2x2 rotation that zeros it: A --> R A R^-1, where R is a rotation matrix. R has 1's on the main diagonal except in the rows and columns of the large off-diagonal entry. Repeat the procedure until the off-diagonal entries are as small as you want. Because you pick the largest entry every time, the algorithm is fairly stable. The only disadvantage is that it plays havoc with sparse matrices by creating too many new non-zero entries, so for those you need a different method.

best

Vladimir Dergachev

Best,
Khue Tran

On Fri, Jul 19, 2024 at 12:14 PM Vladimir Dergachev wrote:

Do you need to run eigen() on an arbitrary matrix or a symmetric one?

best

Vladimir Dergachev
Re: [R-pkg-devel] New package with C++ code causes R abort in RStudio, not in R console.
Hi Luc,

On Tue, 12 Nov 2024, Luc De Wilde wrote:

Dear Vladimir, thank you for your reply. The model syntax is not simple though, and the parser needs to look at the meaning in SEM terms to accept or reject certain things.

What do you mean by "SEM terms"? I have not yet seen a grammar that bison could not handle; it has mechanisms to deal with exceptions to pure LR syntax.

Moreover, this is only a first step, and later other calculations need to be done in C++, which is why I find it important to know exactly why the code works in the R console but not in RStudio, and of course what can be done to make it work in RStudio also.

My thought was that perhaps some sort of memory issue occurs because of the hand-written parser. For example, one possibility is that stack sizes in R and RStudio could be different, so if you are parsing something recursively it might work in one and not the other. The parsers generated by flex and bison are designed to handle arbitrary-length inputs.

best

Vladimir Dergachev

Kind regards,
Luc De Wilde

From: Vladimir Dergachev
Sent: Tuesday, November 12, 2024 18:15
To: Luc De Wilde
CC: r-package-devel@r-project.org ; Yves Rosseel
Subject: Re: [R-pkg-devel] New package with C++ code causes R abort in RStudio, not in R console.

Hi Luc,

The standard tools for writing parsers are "flex" and "bison" - they generate code automatically and so can save you a lot of effort. For a language with simple syntax you can get away with just using "flex". Here are some examples:

Flex: https://westes.github.io/flex/manual/Simple-Examples.html#Simple-Examples
Bison: https://www.gnu.org/software/bison/manual/bison.html#Infix-Calc

best

Vladimir Dergachev

On Tue, 12 Nov 2024, Luc De Wilde wrote:

Dear R package developers,

I'm helping with the development of the lavaan package (see https://lavaan.ugent.be/) and currently writing a C++ version of the parser of the model syntax in lavaan.
The package with C++ code is in https://github.com/lucdw/lavaanC. When testing with a bunch of models, there is one model that causes an abort of the R session in RStudio (on Windows), but in the R console or in a batch job it causes no errors. The model is the following:

model <- '
F1 =~ "a b"*X1
F2 =~ a * X1 + 3*X2 # dat is hier een beetje commentaar
# efa block 2
efa("efa2")*f3 + efa("efa2")*f4 =~ y1 + y2 + y3 + y1:y3
f4 := 3.14159 * F2
F1 ~ start(0.76)*F2 + a*F2
a == (b + f3)^2
b1 > exp(b2 + b3)
'

and the translation can be tested - after installing lavaanC - with

lavaanC::lav_parse_model_string_c(model)

As mentioned, this causes an abort of the R session when executed in RStudio on Windows (10 or 11), but passes without problem in the R console or a batch job. Because many users are using RStudio, I'd like to tackle this problem, but I don't know how to pinpoint its cause. I hope some of you have an idea how to handle this problem ...

All the best,
Luc De Wilde

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] New package with C++ code causes R abort in RStudio, not in R console.
Hi Luc,

The standard tools for writing parsers are "flex" and "bison" - they generate code automatically and so can save you a lot of effort. For a language with simple syntax you can get away with just using "flex". Here are some examples:

Flex: https://westes.github.io/flex/manual/Simple-Examples.html#Simple-Examples
Bison: https://www.gnu.org/software/bison/manual/bison.html#Infix-Calc

best

Vladimir Dergachev

On Tue, 12 Nov 2024, Luc De Wilde wrote:

Dear R package developers,

I'm helping with the development of the lavaan package (see https://lavaan.ugent.be/) and currently writing a C++ version of the parser of the model syntax in lavaan. The package with C++ code is in https://github.com/lucdw/lavaanC. When testing with a bunch of models, there is one model that causes an abort of the R session in RStudio (on Windows), but in the R console or in a batch job it causes no errors. The model is the following:

model <- '
F1 =~ "a b"*X1
F2 =~ a * X1 + 3*X2 # dat is hier een beetje commentaar
# efa block 2
efa("efa2")*f3 + efa("efa2")*f4 =~ y1 + y2 + y3 + y1:y3
f4 := 3.14159 * F2
F1 ~ start(0.76)*F2 + a*F2
a == (b + f3)^2
b1 > exp(b2 + b3)
'

and the translation can be tested - after installing lavaanC - with

lavaanC::lav_parse_model_string_c(model)

As mentioned, this causes an abort of the R session when executed in RStudio on Windows (10 or 11), but passes without problem in the R console or a batch job. Because many users are using RStudio, I'd like to tackle this problem, but I don't know how to pinpoint its cause. I hope some of you have an idea how to handle this problem ...

All the best,
Luc De Wilde

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Removing packages files
Hi Lluís,

Just wanted to add to the discussion that it would be good to consider users that are disconnected or behind a firewall and are installing the package from a file. An option to point the package to a separately downloaded file would be useful.

best

Vladimir Dergachev

On Thu, 2 Jan 2025, Lluís Revilla wrote:

Hi list,

I am developing a package that will download some data, and I'd like to store it locally so as not to recalculate it often. The CRAN policy requires tools::R_user_dir to be used and "the contents are actively managed (including removing outdated material)", or using TMPDIR, but "such usage should be cleaned up". When loading a package there is .onLoad or .onAttach to fill or check those files and other settings required by the package. Is there something for when a package is removed? I found some related functions like .Last or reg.finalizer, and setHook or packageEvent, but they are about closing a session or don't have a specific event for when packages are uninstalled via remove.packages(). I appreciate any feedback, thanks in advance.

Best wishes and a happy new year,
Lluís

__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel