[R-pkg-devel] multithreading in packages

2021-10-08 Thread Vladimir Dergachev



 I am considering adding multithreading support in my package, and would 
appreciate any suggestions/comments/opinions on what is the right way to 
do this.


  * My understanding from reading the documentation and source code is that 
there is no dedicated support in R yet, but there are packages that use 
multithreading. Are there any plans for multithreading support in future R 
versions?


  * pthread or OpenMP? I am particularly concerned about interactions with 
other packages. I have seen that using the pthread and OpenMP libraries 
simultaneously can result in incorrectly pinned threads.


  * control of the maximum number of threads. One can default to the OpenMP 
environment variable, but these might vary between OpenMP implementations.


thank you very much

Vladimir Dergachev

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] [External] Formula modeling

2021-10-08 Thread Vladimir Dergachev




On Fri, 8 Oct 2021, pikappa.de...@gmail.com wrote:


Hi,

The different environments can potentially be an issue in the future. I was not 
aware of the vector construction notation, and I think this is what I was 
mainly looking for.

I could provide two initialization methods. The first will use the ugly vector 
notation, which binds the whole model to a particular environment. The second 
can be more user-friendly and use the comma-separated list of formulas. 
Essentially, the second will prepare the vector formula and call the first 
initialization method.

The (|) operator comment makes sense, and I would also want to avoid this to 
the extent that it is feasible.  So, I am currently thinking something along 
the line:

c(d, s, p | subject | time) ~ c(p + x + y, p + w + y, z + y)


From the perspective of someone who does not use formulas outside of 
xyplot() and glm(), this is a bit hard to parse visually. One could 
imagine mistakenly reading s as corresponding to x, rather than to p + w + y.


I wonder if there is a way to write something along the lines of

~c( d ~ p + x + y,
    s ~ p + w + y,
    p ~ z + y | subject | time
  )

A quick experiment with R shows that this is treated like a formula, so ~c 
becomes a way to group formulas.


best

Vladimir Dergachev



This is very similar to how lme4::lmer() uses the bar to separate 
design-matrix expressions from grouping factors. In fact, the subject 
and time variables are needed for subsetting prices in various operations 
required for the model matrix.

Thanks for the suggestions; they are very helpful!

Best,
Pantelis

-----Original Message-----
From: Duncan Murdoch 
Sent: Friday, October 8, 2021 2:04 AM
To: Richard M. Heiberger ; pikappa.de...@gmail.com
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] [External] Formula modeling

On 07/10/2021 5:58 p.m., Duncan Murdoch wrote:

I don't work with models like this, but I would find it more natural
to express the multiple formulas in a list:

list(d ~ p + x + y, s ~ p + w + y, p ~ z + y)

I'd really have no idea how either of the proposals below should be parsed.


There's a disadvantage to this proposal.  I'd assume that "p" means the same in 
all 3 formulas, but with the notation I give, it could refer to
3 unrelated variables, because each of the formulas would have its own 
environment, and they could all be different.  I guess you could make it a 
requirement that they all use the same environment, but that's likely going to 
be confusing to users, who won't know what it means.

Another possibility that wouldn't have this problem (but in my opinion is kind 
of ugly) is to use R vector construction notation:

  c(d, s, p) ~ c(p + x + y, p + w + y, z + y)

Duncan Murdoch



Of course, if people working with models like this are used to working
with notation like yours, that would be a strong argument to use your
notation.

Duncan Murdoch

On 07/10/2021 5:51 p.m., Richard M. Heiberger wrote:

I am responding to a subset of what you asked.  There are packages
which use multiple formulas in their argument sequence.


What you have as a single formula with | as a separator,

q | p | subject | time | rho ~ p + x + y | p + w + y | z + y

I think would be better as a comma-separated list of formulas

q , p , subject , time , rho ~ p + x + y , p + w + y , z + y

because in R notation | is usually an operator, not a separator.

lattice uses formulas and the | is used as a conditioning operator.

nlme and lme4 can have multiple formulas in the same calling sequence.

lme4 is newer.  From its ?lme4-package help page: ‘lme4’ covers approximately 
the same ground as the earlier ‘nlme’ package.

lme4 is probably the model you are looking for when designing the package.


On Oct 07, 2021, at 17:20, pikappa.de...@gmail.com wrote:

Dear R-package-devel subscribers,



My question concerns a package design issue relating to the usage of
formulas.



I am interested in describing via formulas systems of the form:



d = p + x + y

s = p + w + y

p = z + y

q = min(d,s).



The context in which I am working is that of market models with,
primarily, panel data. In the above system, one may think of the
first equation as demand, the second as supply, and the third as an
equation (co-)determining prices. The fourth equation is implicitly
used by the estimation method, and it does not need to be specified
when programming the R formula. If you need more information about the system, 
you may check the package diseq.
Currently, I am using constructors to build market model objects. In
a constructor call, I pass [i] the right-hand sides of the first
three equations as strings, [ii] an argument indicating whether the
equations of the system have correlated shocks, [iii] the
identifiers of the used dataset (one for the subjects of the panel
and one for time), and [iv] the quantity
(q) and price (p) variables. These four arguments contain all the
necessary information.

Re: [R-pkg-devel] multithreading in packages

2021-10-09 Thread Vladimir Dergachev




On Sat, 9 Oct 2021, Ivan Krylov wrote:


On Thu, 7 Oct 2021 21:58:08 -0400 (EDT)
Vladimir Dergachev  wrote:


   * My understanding from reading documentation and source code is
that there is no dedicated support in R yet, but there are packages
that use multithreading. Are there any plans for multithreading
support in future R versions ?


Shared memory multithreading is hard to get right in a memory-safe
language (e.g. R), but there's the parallel package, which is a part of
base R, which offers process-based parallelism and may run your code on
multiple machines at the same time. There's no communication _between_
these machines, though. (But I think there's an MPI package on CRAN.)


Well, the way I planned to use multithreading is to speed up processing of 
very large vectors, so one does not have to wait seconds for a command 
to return. The same could be done for many built-in R primitives.





   * pthread or openmp ? I am particularly concerned about
interaction with other packages. I have seen that using pthread and
openmp libraries simultaneously can result in incorrectly pinned
threads.


pthreads-based code could be harder to run on Windows (which is a
first-class platform for R, expected to be supported by most packages).


Gábor Csárdi pointed out that R is compiled with mingw on Windows and 
has pthread support - something I did not know either.



OpenMP should be cross-platform, but Apple's compilers are sometimes 
lacking; the issue may well have been solved in the latest Apple toolchain 
since I last heard about it. If your problem can be made embarrassingly 
parallel, you're welcome to use the parallel package.


I have used parallel before, and it is very nice, but it is R-level only. 
I am looking for something to speed up the response of individual package 
functions, so that they themselves can be used as part of more complicated code.





   * control of maximum number of threads. One can default to openmp
environment variable, but these might vary between openmp
implementations.


Moreover, CRAN-facing tests aren't allowed to consume more than 200%
CPU, so it's a good idea to leave the number of workers in control of
the user. According to a reference guide I got from openmp.org, OpenMP
implementations are expected to understand omp_set_num_threads() and
the OMP_NUM_THREADS environment variable.


Oh, this would never be run through CRAN tests, it is meant for data that 
is too big for CRAN.


I seem to remember that the Intel compiler used a different environment 
variable, but that may have been fixed since I last used it.


best

Vladimir Dergachev



--
Best regards,
Ivan




Re: [R-pkg-devel] multithreading in packages

2021-10-09 Thread Vladimir Dergachev




On Sat, 9 Oct 2021, Gábor Csárdi wrote:


On Sat, Oct 9, 2021 at 8:52 AM Ivan Krylov  wrote:
[...]

   * pthread or openmp ? I am particularly concerned about
interaction with other packages. I have seen that using pthread and
openmp libraries simultaneously can result in incorrectly pinned
threads.


pthreads-based code could be harder to run on Windows (which is a
first-class platform for R, expected to be supported by most packages).


R uses MinGW on Windows, and MinGW supports pthreads, so you don't need
to do anything special on Windows. You don't even need a
`Makevars`/`Makevars.win` or configure* file just for using pthreads.


Great, thank you !



Some CRAN packages do this, you can search here:
https://github.com/search?l=C&p=5&q=org%3Acran+pthread_create&type=Code
(Some of these are from Unix-specific code, but not all.)


Useful link! I also did a search for cran+omp, and that turned up some 
packages as well.


Looks like both openmp and pthreads are used in packages that passed CRAN 
checks.


thanks

Vladimir Dergachev



Gabor

[...]




Re: [R-pkg-devel] [Tagged] Re: multithreading in packages

2021-10-09 Thread Vladimir Dergachev




On Sat, 9 Oct 2021, Jeff Newmiller wrote:

Keep in mind that by embedding this decision into your package you may 
be consuming a resource (cores) that may be more efficiently allocated 
by an application-level partitioning of available resources. I for one 
am not a fan of this kind of thinking, and it makes system requirements 
for your package more complex even if you allow me to disable it.


That's right, and this is why I was asking about any present or future 
plans for R support - if there was a way to find out how many threads R 
should use, I would use that.


So far, it looks like the most portable way is to use OpenMP and let the 
user set an appropriate environment variable if they want to restrict 
thread usage. I could use the same OpenMP variable for pthreads as well.


This is pretty common on clusters anyway, with OpenMP environment 
variables set automatically to the number of cores the user requested.


I would probably also add a function to the package to report the number 
of threads being used. Not sure whether it would be a good idea to report 
this during package loading (and not sure what is the right way to display 
a message during package load either).


best

Vladimir Dergachev



Re: [R-pkg-devel] [Tagged] Re: multithreading in packages

2021-10-09 Thread Vladimir Dergachev




On Sat, 9 Oct 2021, Ben Bolker wrote:

 FWIW there is some machinery in the glmmTMB package for querying, setting, 
etc. the number of OpenMP threads.


https://github.com/glmmTMB/glmmTMB/search?q=omp


Great, thank you !

Vladimir Dergachev



On 10/9/21 11:45 AM, Vladimir Dergachev wrote:



On Sat, 9 Oct 2021, Jeff Newmiller wrote:

Keep in mind that by embedding this decision into your package you may be 
consuming a resource (cores) that may be more efficiently allocated by an 
application-level partitioning of available resources. I for one am not a 
fan of this kind of thinking, and it makes system requirements for your 
package more complex even if you allow me to disable it.


That's right, and this is why I was asking about any present or future 
plans for R support - if there was a way to find out how many threads R 
should use, I would use that.


So far, it looks like the most portable way is to use OpenMP and let the 
user set an appropriate environment variable if they want to restrict 
thread usage. I could use the same OpenMP variable for pthreads as well.


This is pretty common on clusters anyway, with OpenMP environment variables 
set automatically to the number of cores the user requested.


I would probably also add a function to the package to report the number of 
threads being used. Not sure whether it would be a good idea to report this 
during package loading (and not sure what is the right way to display a 
message during package load either).


best

Vladimir Dergachev



--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics






Re: [R-pkg-devel] [Tagged] Re: multithreading in packages

2021-10-09 Thread Vladimir Dergachev




On Sat, 9 Oct 2021, Dirk Eddelbuettel wrote:



On 9 October 2021 at 12:08, Ben Bolker wrote:
|FWIW there is some machinery in the glmmTMB package for querying,
| setting, etc. the number of OpenMP threads.
|
| https://github.com/glmmTMB/glmmTMB/search?q=omp

https://cloud.r-project.org/package=RhpcBLASctl


Very useful, thank you ! Tried it on my notebook, I can see OpenMP 
working.


thanks

Vladimir Dergachev



Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org






Re: [R-pkg-devel] [Tagged] Re: multithreading in packages

2021-10-09 Thread Vladimir Dergachev




On Sat, 9 Oct 2021, Viechtbauer, Wolfgang (SP) wrote:


One thing I did not see mentioned in this thread (pun intended) so far:

For what kind of computations is multithreading supposed to be used within the 
package being developed? If the computations involve a lot of linear/matrix 
algebra, then one could just build R against other linear algebra routines (e.g., 
OpenBLAS, ATLAS, MKL, BLIS) and get the performance benefits of multicore 
processing for those computations without having to change a single line of code 
in the package (although in my experience, most of the performance benefit 
comes from switching to something like OpenBLAS and using it single-threaded).


This is meant for the RMVL package, which memory maps MVL format files for 
direct access. The package also provides database functionality.


The files I am interested in are large. For example, the Gaia DR3 dataset 
is 500GB+.


Plain linear algebra will likely not need multithreading - the computation 
will proceed at the speed of storage I/O (which is quite impressive 
nowadays).


But it will be useful to multithread more involved code that builds or 
queries indices, and I was also thinking of some functions to assist with 
visualization - plot() and xyplot() were not meant for very long vectors.


Ideally, one would be able to explore such large data sets interactively.
And then do more interesting things on the cluster.



This aside, I am personally more in favor of explicitly parallelizing those 
things that are known to be embarrassingly parallelizable using packages like 
parallel, future, etc. since a package author should know best when these 
situations arise and can take the necessary steps to parallelize those 
computations -- but making the use of parallel processing in these cases an 
option, not a default. I have seen way too many cases in HPC environments where 
jobs are being parallelized, the package is doing parallel processing, and 
multicore linear algebra routines are being used all simultaneously, which is 
just a disaster.

Finally, I don't think the HPC task view has been mentioned so far:

https://cran.r-project.org/web/views/HighPerformanceComputing.html


Thanks for the link !

I see there is an OpenCL package, very interesting.

best

Vladimir Dergachev



(not even by Dirk just now, who maintains it!)

Best,
Wolfgang


-----Original Message-----
From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On Behalf Of Dirk Eddelbuettel
Sent: Saturday, 09 October, 2021 18:33
To: Ben Bolker
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] [Tagged] Re: multithreading in packages


On 9 October 2021 at 12:08, Ben Bolker wrote:
|FWIW there is some machinery in the glmmTMB package for querying,
| setting, etc. the number of OpenMP threads.
|
| https://github.com/glmmTMB/glmmTMB/search?q=omp

https://cloud.r-project.org/package=RhpcBLASctl

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org








Re: [R-pkg-devel] multithreading in packages

2021-10-09 Thread Vladimir Dergachev




On Sat, 9 Oct 2021, Erin Hodgess wrote:


Have you thought about using C or C++?


Yes, indeed, the core of the package is written in C, with some C++ for 
sorting (which turned out to be rather interesting).


Beyond writing optimized C, there are two ways to speed up execution on a 
single computer - multithreading and vector instructions.


Multithreading is easier here, because only one or two libraries are 
needed (libgomp or pthread) and because it is often hard to vectorize 
operations like sorting, hashing and the like.


Also, to use vector instructions to their full potential one typically needs a 
fair bit of black magic that is unlikely to pass CRAN tests. I am having 
enough trouble as it is getting a simple flexible array past the address 
sanitizers.



Also, there is a package called pbdDMAT from Drew Schmidt at the University 
of Tennessee which might help.


Great, thanks for pointing this out! Looks like pbdDMAT uses MPI.

Also, it appears this package was removed from CRAN for failing to compile 
on Macs, which seems rather unfair - I don't know of any clusters running 
macOS.


Vladimir Dergachev


Re: [R-pkg-devel] How does one install a libtool generated libfoo.so.1 file into ./libs/?

2021-10-19 Thread Vladimir Dergachev



The simplest thing to try is to compile the library statically and link it 
into your package. No extra files - no trouble.


You can also try renaming the file from *.so.1 to *.so.

best

Vladimir Dergachev



Re: [R-pkg-devel] Python module dependency

2023-09-01 Thread Vladimir Dergachev



Have you considered translating ctef into R?

This would remove the dependencies and make your package much more robust.
And would make it much easier to pass CRAN checks.

Looking at the ctef code, it is pure Python and there aren't many lines. And in 
my experience, one line of R is worth 10 lines of Python :)


Also, ctef has a dependency on KMeans, so translating ctef into R will 
remove that too.


best

Vladimir Dergachev

On Fri, 1 Sep 2023, Hanyu Song wrote:


Hello,

I am writing an R package that depends on a very uncommonly used Python module named 
"ctef" and I have several questions about it:

a. How shall I write examples for the functions that depend on the Python 
module? Shall I just do:
#' @examplesIf reticulate::py_module_available('ctef')
#' my_function_that_depends_on_ctef(arg1, arg2)
in case the CRAN testing platform does not have the module?

b. I read from the documentation of the R package "reticulate" that we should 
delay load the Python modules, but it is not entirely clear to me how to do it.

Are the following lines of code sufficient for that purpose? Do I need to 
create any virtual environment?

#' Global reference to ctef
#'
#' @description
#' `ctef` will be initialized in .onLoad.
#'
ctef <- NULL

#' Delay load ctef module
#'
#' @description
#' `.onLoad` delays loading the ctef module (it will only be loaded when
#' accessed via $).
#'
#' @param libname Library name
#' @param pkgname Package name
.onLoad <- function(libname, pkgname) {
  ctef <<- reticulate::import("ctef", delay_load = TRUE)
}

c. How shall I import the module in my R code? For now I included the import 
function in my_function_that_depends_on_ctef; see below:
my_function_that_depends_on_ctef <- function(X, k) {
  mod <- reticulate::import('ctef', delay_load = TRUE)
  input <- as.matrix(X)
  res <- mod$ctef$ctef(input, as.integer(k))
  return(res)
}

Is this correct?

There are not many R packages that depend on a Python module, so the resources 
are quite limited. Thank you for your help.


Best,
Hanyu Song








Re: [R-pkg-devel] Package bioOED has been removed from CRAN just for personal reasons

2023-11-03 Thread Vladimir Dergachev




On Wed, 1 Nov 2023, David Hugh-Jones wrote:


Aside from the package question, surely the other issue here is that Prof
Ripley’s email is extraordinarily rude. Any paid employee would be sacked
for that. I appreciate R and CRAN are volunteer-run organisations, but I
don’t think that should be an excuse for this level of, frankly, toxicity.
Why is he allowed to get away with it?


So one thing to keep in mind is that doing volunteer public-facing work tends 
to expose people to all kinds of unreasonable requests.

Those who endure often become more direct, and that's fine. And people 
in commercial companies can be very direct too.


One thing that helps is to be extra-polite to a person who is doing a lot 
of volunteer work, and who is likely way oversubscribed.


Focusing on practical matters, if you take a step back things look 
pretty good:


Your package has a dependency on a package that you have not written and
that is maintained outside CRAN. It was bound to break sooner or later.

However, the last time you updated bioOED was in 2019, and there was no 
need to do anything for more than 3 years. That's amazing! And it was probably 
made possible by being a little too direct on occasion.


best

Vladimir Dergachev



David




Re: [R-pkg-devel] [r-package-devel] Win.Metafile and package check - "Found the platform-specific device:"

2023-11-03 Thread Vladimir Dergachev




On Fri, 3 Nov 2023, wayne.w.jo...@shell.com wrote:


Dear R-Package-Devel,

As part of the GWSDAT package (https://github.com/waynegitshell/GWSDAT) we support 
the option to output plots in the WMF 
(https://r-graphics.org/recipe-output-vector-wmf) format if, and only if, the 
user is on Windows. However, when I run the package checks, R complains 
about using a platform-specific function with the following message:
 Found the platform-specific device:
   'win.metafile'
 dev.new() is the preferred way to open a new device, in the unlikely
 event one is needed.

In my opinion this is a false positive - and a similar issue has previously 
been reported here: 
https://stackoverflow.com/questions/70585796/unable-to-understand-1-note-in-devtoolscheck-caused-by-a-platform-specific-d

Any ideas on how I modify the code and package submission to automatically pass 
the checks?


Two suggestions:

  * let users specify the graphics device they want

  * reading the manpage for dev.new(), it accepts a bunch of options - there is 
probably a way to request the metafile device you want, but I could not 
find it in the documentation.


best

Vladimir Dergachev



Thanks,

Wayne




Wayne Jones
Principal Data Scientist
Decarbonisation Data Science

Tel: +44 (0) 207 934 4330
Projects and Technology, Shell Research Limited, Shell Centre, York Road, 
London, SE1 7NA
Email: wayne.w.jo...@shell.com<mailto:wayne.w.jo...@shell.com>
Intranet: 
Shell.ai<https://eu001-sp.shell.com/sites/AAFAA6690/Shell.ai/homepage.aspx>
Internet: www.shell.ai<http://www.shell.ai/>








Re: [R-pkg-devel] RFC: an interface to manage use of parallelism in packages

2023-11-03 Thread Vladimir Dergachev




On Wed, 25 Oct 2023, Ivan Krylov wrote:


Summary: at the end of this message is a link to an R package
implementing an interface for managing the use of execution units in R
packages. As a package maintainer, would you agree to use something
like this? Does it look sufficiently reasonable to become a part of R?
Read on for why I made these particular interface choices.

My understanding of the problem stated by Simon Urbanek and Uwe Ligges
[1,2] is that we need a way to set and distribute the CPU core
allowance between multiple packages that could be using very different
methods to achieve parallel execution on the local machine, including
threads and child processes. We could have multiple well-meaning
packages, each of them calling each other using a different parallelism
technology: imagine parallel::makeCluster(getOption('mc.cores'))
combined with parallel::mclapply(mc.cores = getOption('mc.cores')) and
with an OpenMP program that also spawns getOption('mc.cores') threads.
A parallel BLAS or custom multi-threading using std::thread could add
more fuel to the fire.



Hi Ivan,

  Generally, I like the idea. A few comments:

  * from a package developer's point of view, I would prefer to have a clear 
idea of how many threads I could use. So having a core R function like 
"getMaxThreads()" or similar would be useful. What that function returns 
could be governed by a package.


  In fact, it might be a good idea to allow several packages to implement 
"thread governors" for different situations.


  * it would make sense to think through whether we want (or not) to allow 
package developers to call omp_set_num_threads() or whether this is done 
by R.


  This is hairier than you might think. Allowing it forces every package 
to call omp_set_num_threads() before an OMP block, because there is no way 
to know which package was called before.


  Not allowing calls to omp_set_num_threads() might make it difficult to 
use all the threads, and would force R to initialize OpenMP on startup.


 * Speaking of initialization of OpenMP, I have seen situations where 
spawning some regular pthread threads and then initializing OpenMP forces 
all pthread threads to a single CPU.


  I think this is because OpenMP sets thread affinity for all the process 
threads, but only distributes its own.


 * This also raises the question of how affinity is managed. If you have 
called makeForkCluster() to create 10 R instances and each then uses 2 
OpenMP threads, you do not want those occupying only 2 CPU execution 
threads instead of 20.


 * From the user perspective, it might be useful to be able to limit 
number of threads per package by using patterns or regular expressions.
Often, the reason for limiting number of threads is to reduce memory 
usage.


 * Speaking of memory usage, glibc has parameters like MALLOC_ARENA_MAX 
that have great impact on memory usage of multithreaded programs. I 
usually set it to 1, but then I take extra care to make as few memory 
allocation calls as possible within individual threads.


best

Vladimir Dergachev



Re: [R-pkg-devel] [r-package-devel] Win.Metafile and package check - "Found the platform-specific device:"

2023-11-05 Thread Vladimir Dergachev




On Sat, 4 Nov 2023, wayne.w.jo...@shell.com wrote:


Hi Vladimir,

Thanks for the suggestions. I've considered both but I can't see a way of doing 
what I'm trying to achieve without explicitly adding a call to win.metafile in 
my code.

To explain a little more...

GWSDAT is a Shiny app, so we don't use the traditional R graphics device, and 
your suggestion #1 is not an option. Instead, the user is offered a range of 
different plot options in a Shiny list box, as illustrated in this Windows-based 
example:
https://user-images.githubusercontent.com/61183826/280452267-c4287f8d-73cc-42a3-881a-4643bdb31689.png

The list of options presented to the users in the list box is modified 
according to the platform - see code here:  
https://github.com/WayneGitShell/GWSDAT/blob/master/R/server.R#L117-L118

So anyone using a non-Windows platform will never be offered the choice of 
"wmf" to begin with. For example, see the online Linux version, 
https://stats-glasgow.shinyapps.io/GWSDATV3-2/.  You will see that this option doesn't 
exist in the list of choices - see 
https://user-images.githubusercontent.com/61183826/280453275-6da9b235-0387-47fd-b3a8-b0a949f0ec3e.png

Any suggestions on how I can modify this approach to make it automatically pass 
the CRAN checks?


I see. Ideally, the CRAN checks should adjust to your use case - but it is 
not obvious how.


As you discovered in a later e-mail, the check just looks for the presence 
of win.metafile(), so calling it in some other way satisfies the check.


best

Vladimir Dergachev



Thanks,

Wayne

-----Original Message-----
From: Vladimir Dergachev 
Sent: 03 November 2023 20:03
To: Jones, Wayne R GSUK-PTX/D/S 
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] [r-package-devel] Win.Metafile and package check - "Found 
the platform-specific device:"


On Fri, 3 Nov 2023, wayne.w.jo...@shell.com wrote:


Dear R-Package-Devel,

As part of GWSDAT package (https://github.com/waynegitshell/GWSDAT) we support 
the option to output plots to a WMF 
(https://r-graphics.org/recipe-output-vector-wmf) format if, and only if,  the 
user is on Windows. However, when I run the package checks on here it complains 
about using a platform specific function with the following message:

 Found the platform-specific device:
   'win.metafile'
 dev.new() is the preferred way to open a new device, in the unlikely
event one is needed.

In my opinion this is a false positive - and a similar issue has
previously been reported here:
https://stackoverflow.com/questions/70585796/unable-to-understand-1-note-in-devtoolscheck-caused-by-a-platform-specific-d

Any ideas on how I modify the code and package submission to automatically pass 
the checks?


Two suggestions:

  * let users specify the graphics device they want

  * reading the manpage for dev.new(), it accepts a bunch of options - there is 
probably a way to request the metafile device you want. But I could not find 
that in the documentation.

best

Vladimir Dergachev



Thanks,

Wayne



--
Wayne Jones
Principal Data Scientist
Decarbonisation Data Science

Tel: +44 (0) 207 934 4330
Projects and Technology, Shell Research Limited, Shell Centre, York
Road, London, SE1 7NA
Email: wayne.w.jo...@shell.com
Intranet: Shell.ai
Internet: http://www.shell.ai/

Re: [R-pkg-devel] Wrong mailing list: Could the 100 byte path length limit be lifted?

2023-12-14 Thread Vladimir Dergachev




On Wed, 13 Dec 2023, McGrath, Justin M wrote:


On Windows, packages will be in "C:\Users\[User Name]\Documents\R\win-library\[R 
version]\[Package Name]".

With a 150 byte limit, that leaves 70 bytes for the user name, R version 
and package name. That seems more than sufficient. If people are 
downloading the source files, that also leaves plenty of space 
regardless where they choose to extract the files.




70 bytes ?? My name is 18 characters long and there are plenty of people 
with longer names. I have also seen the practice on Windows systems of 
appending the name of the organization or department.


Also, this restricts the length of the package name, which is arguably more 
important than internal package path names that the user never sees.


That said, that Windows limitation only applies to some programs, and the 
pertinent question is whether R, or any software used by R, has this 
limitation. I suspect the answer is no, but as all my systems are Linux 
I cannot check.

Vladimir Dergachev

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Additional Issues: Intel

2024-01-16 Thread Vladimir Dergachev




On Wed, 17 Jan 2024, Hugh Parsonage wrote:


My package grattan fails the Intel[1] check with

 Error: segfault from C stack overflow

I am unable to immediately see where in the test suite this error has
occurred.  I seek advice on how to fix this error.  The only hunch I
have is that the package uses C code and includes structs with arrays
on the stack, which perhaps are excessive for the Intel check machine,
but am far from confident that's the issue.  The repository is at
<https://github.com/HughParsonage/grattan/>


Two possibilities to look into:

   * your structures on the stack are large. Don't do this! Your code 
might run faster and would be easier to debug if you use regular memory 
allocation instead. Since R does a fair number of memory allocation calls 
itself, the extra overhead from your calls will not be that noticeable.


   * your structures are small, but you have a recursive function that is 
called too often. In this case, the solution is to reimplement the 
recurrence without doing function calls (using a loop, for example). Some 
recurrences can be implemented without using any accumulating state. 
Others need it and you can use heap memory for that.
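
To illustrate the first point with a hedged sketch (plain C, names and sizes 
invented for illustration): replacing a large automatic array with a heap 
allocation removes the stack-overflow risk at the cost of one malloc()/free() 
pair, which is cheap next to R's own allocation traffic.

```c
#include <assert.h>
#include <stdlib.h>

/* Risky: "double work[1 << 21];" inside the function would put ~16 MB
 * on the stack and can overflow it. Allocating the same scratch space
 * on the heap is safe and easier to debug. */
double sum_of_squares(const double *x, size_t n)
{
	double *work = malloc(n * sizeof(*work)); /* heap, not stack */
	double s = 0.0;

	if (work == NULL) return -1.0; /* allocation failure */

	for (size_t i = 0; i < n; i++) {
		work[i] = x[i] * x[i];
		s += work[i];
	}
	free(work);
	return s;
}
```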


best

Vladimir Dergachev



[1]https://www.stats.ox.ac.uk/pub/bdr/Intel/grattan.out

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel





Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-05 Thread Vladimir Dergachev



I use libunwind in my programs, works quite well, and simple to use.

Happy to share the code if there is interest.

best

Vladimir Dergachev

On Mon, 4 Mar 2024, Ivan Krylov via R-package-devel wrote:


On Sun, 3 Mar 2024 19:19:43 -0800
Kevin Ushey  wrote:


Would libSegFault be useful here?


Glad to know it has been moved to
<https://github.com/zatrazz/glibc-tools/tree/main/libSegFault> and not
just removed altogether after the upstream commit
<https://sourceware.org/git/?p=glibc.git;a=commit;h=65ccd641bacea33be23d51da737c2de7543d0f5e>.

libSegFault is safer than, say, libsegfault [*] because it both
supports SA_ONSTACK (for when a SIGSEGV is caused by stack overflow)
and avoids functions like snprintf() (which depend on the locale code,
which may have been the source of the crash). The only correctness
problem that may still be unaddressed is potential memory allocations
in backtrace() when it loads libgcc on first use. That should be easy
to fix by calling backtrace() once in segfault_init(). Unfortunately,
libSegFault is limited to glibc systems, so a different solution will
be needed on Windows, macOS and Linux systems with the musl libc.

Google-owned "backward" [**] tries to do most of this right, but (1) is
designed to be compiled together with C++ programs, not injected into
unrelated processes and (2) will exit the process if it survives
raise(signum), which will interfere with both rJava (judging by the
number of Java-related SIGSEGVs I saw while running R CMD check) and R's
own stack overflow survival attempts.

--
Best regards,
Ivan

[*] https://github.com/stass/libsegfault
(Which doesn't compile out of the box on GNU/Linux due to missing
pthread_np.h, although that should be easy to patch.)

[**] https://github.com/bombela/backward-cpp

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel





Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-07 Thread Vladimir Dergachev



Hi Ivan,

Here is the piece of code I currently use:

#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

void backtrace_dump(void)
{
unw_cursor_t    cursor;
unw_context_t   context;

unw_getcontext(&context);
unw_init_local(&cursor, &context);

while (unw_step(&cursor) > 0)
	{
	unw_word_t  offset, pc;
	char        fname[64];

	unw_get_reg(&cursor, UNW_REG_IP, &pc);

	fname[0] = '\0';
	(void) unw_get_proc_name(&cursor, fname, 64, &offset);

	fprintf(stderr, "0x%016lx : (%s+0x%lx)\n",
		pc - (long)backtrace_dump, fname, offset);
	}
}

To make it safe, one can simply replace fprintf() with a function that 
stores information into a buffer.
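
As a sketch of that replacement (names hypothetical, not from the original 
code): an async-signal-safe formatter that appends strings and hex numbers to 
a fixed static buffer, avoiding snprintf(), locale code, and memory 
allocation.

```c
#include <assert.h>
#include <string.h>

static char bt_buf[4096]; /* filled inside the handler, printed later */
static size_t bt_len = 0;

/* Append a string, truncating if the buffer would overflow. */
static void bt_puts(const char *s)
{
	size_t n = strlen(s);
	if (bt_len + n >= sizeof(bt_buf))
		n = sizeof(bt_buf) - 1 - bt_len;
	memcpy(bt_buf + bt_len, s, n);
	bt_len += n;
	bt_buf[bt_len] = '\0';
}

/* Append an unsigned long in hex without any printf-family call. */
static void bt_puthex(unsigned long v)
{
	char tmp[2 * sizeof(v) + 1];
	int i = (int)sizeof(tmp) - 1;

	tmp[i] = '\0';
	do {
		tmp[--i] = "0123456789abcdef"[v & 0xf];
		v >>= 4;
	} while (v != 0);
	bt_puts(tmp + i);
}
```

After the handler runs, bt_buf can be written out with write(2), which is 
async-signal-safe.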


Several things to point out:

  * printing pc-(long)backtrace_dump works around address randomization, 
so that if you attach the debugger you can find the location again by 
using backtrace_dump+0 (it does not have to be backtrace_dump, any 
symbol will do)


  * this works even if the symbols are stripped, in which case it finds an 
offset relative to the nearest available symbol - there are always some 
from the loader. Of course, in this case you should use the offsets and 
the debugger to find out what's wrong


  * you can call backtrace_dump() from anywhere, does not have to be a 
signal handler. I've taken to calling it when my programs detect some 
abnormal situation, so I can see the call chain.


  * this should work as a package, but I am not sure whether the offsets 
between package symbols and R symbols would be static or not. For R it 
might be a good idea to also print a table of offsets between some R 
symbol and each loaded C package's R_init_*() entry point, at least initially.


  * R ought to know where packages are loaded, so we might want to be clever 
and print out information on which package contains which function, or 
there might be identical R_init_RMVL() printouts.


best

Vladimir Dergachev

On Thu, 7 Mar 2024, Ivan Krylov wrote:


On Tue, 5 Mar 2024 18:26:28 -0500 (EST)
Vladimir Dergachev  wrote:


I use libunwind in my programs, works quite well, and simple to use.

Happy to share the code if there is interest..


Do you mean that you use libunwind in signal handlers? An example on
how to produce a backtrace without calling any async-signal-unsafe
functions would indeed be greatly useful.

Speaking of shared objects injected using LD_PRELOAD, I've experimented
some more, and I think that none of them would work with R without
additional adjustments. They install their signal handler very soon
after the process starts up, and later, when R initialises, it
installs its own signal handler, overwriting the previous one. For this
scheme to work, either R would have to cooperate, remembering a pointer
to the previous signal handler and calling it at some point (which
sounds unsafe), or the injected shared object would have to override
sigaction() and call R's signal handler from its own (which sounds
extremely unsafe).

Without that, if we want C-level backtraces, we either need to patch R
to produce them (using backtrace() and limiting this to glibc systems
or using libunwind and paying the dependency cost) or to use a debugger.

--
Best regards,
Ivan



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-11 Thread Vladimir Dergachev




On Tue, 12 Mar 2024, Ivan Krylov wrote:


Vladimir,

Thank you for the example and for sharing the ideas regarding
symbol-relative offsets!

On Thu, 7 Mar 2024 09:38:18 -0500 (EST)
Vladimir Dergachev  wrote:


 unw_get_reg(&cursor, UNW_REG_IP, &pc);


Is it ever possible for unw_get_reg() to fail (return non-zero) for
UNW_REG_IP? The documentation isn't being obvious about this. Then
again, if the process is so damaged it cannot even read the instruction
pointer from its own stack frame, any attempts at self-debugging must
be doomed.


Not sure. I think it just returns what is in it; you will get a false 
reading if the stack is corrupted. The way that I see it, some printout 
is better than none, and having signs that the stack is badly corrupted is a 
useful debugging clue.





   * this should work as a package, but I am not sure whether the
offsets between package symbols and R symbols would be static or not.


Since package shared objects are mmap()ed into the address space and
(at least on Linux with ASLR enabled) mmap()s are supposed to be made
unpredictable, this offset ends up not being static. On Linux, R seems
to be normally built as a position-independent executable, so no matter
whether there is a libR.so, both the R base address and the package
shared object base address are randomised:

$ cat ex.c
#include <stddef.h>
#include <R_ext/Print.h>
void addr_diff(void) {
ptrdiff_t diff = (char*)&addr_diff - (char*)&Rprintf;
Rprintf("self - Rprintf = %td\n", diff);
}
$ R CMD SHLIB ex.c
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -9900928
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -15561600
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 45537907472976
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 46527711447632


   * R ought to know where packages are loaded, we might want to be
clever and print out information on which package contains which
function, or there might be identical R_init_RMVL() printouts.


That's true. Information on all registered symbols is available from
getLoadedDLLs().


Ok, so this is reasonably straightforward.

best

Vladimir Dergachev



--
Best regards,
Ivan



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] using portable simd instructions

2024-03-27 Thread Vladimir Dergachev



I like assembler, and I do use SIMD intrinsics in some of my code (not 
R), but sparingly.


The issue is more than portability between platforms, but also portability 
between processors - if you write your optimized code using AVX, it might 
not take advantage of newer AVX512 cpus.


In many cases your compiler will do the right thing and optimize your 
code.


I suggest:

   * write your code in plain C, test it with some long computation and 
use "perf top" on Linux to observe the code hotspots and which assembler 
instructions are being used.


   * if you see instructions like "addps" these are vectorized. If you see 
instructions like "addss" these are *not* vectorized.


   * if you see a few instructions as hotspots with arguments in 
parentheses, like "vmovaps %xmm1,(%r8)", then you are likely limited by memory 
access.


   * If you are not limited by memory access and the compiler produces a 
lot of "addss" or similar that are hotspots, then you need to look at your 
code and make it more parallelizable.


   * How to make your C code more parallelizable:

   You want to make easy-to-interpret loops like

 for(i = start; i < stop; i++) a[i] = b[i] + c[i];

   You can help the compiler by using the "restrict" keyword to indicate that 
arrays do not overlap, or (as a sledgehammer) "#pragma ivdep". But before 
using keywords check with "perf top" which code is actually a hotspot, as 
the compiler can generate good code without restrict keywords, by using 
multiple code paths.


   * You can create small temporary arrays to make your algorithm look 
more like the loops above. The small arrays should be at least 16 wide, 
because AVX512 has instructions that operate on 16 floats at a time.


* To allow use of small arrays you can unroll your loops. Note that 
compilers do unrolling themselves, so doing it manually is only helpful if 
this makes the inner body of the loop more parallelizable.


* You can debug why the compiler does not parallelize your code by 
turning on diagnostics. For gcc the flag is "-fopt-info-vec-missed=vec_info.txt"


* In very rare cases you can use intrinsics. For me this is typically a 
situation when I need to find the value and the index of a maximum or 
minimum in an array - compilers do not optimize this well, at least for 
the many different ways of coding this in C that I tried many years ago.


* If after all your work you got a factor of 2 speedup you are doing 
fine. If you want larger speedup change your algorithm.
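
As a hedged sketch of the loop shape and the restrict hint described above 
(the function name is invented; compile with something like 
gcc -O3 -fopt-info-vec-missed=vec_info.txt to see what the vectorizer did):

```c
#include <assert.h>
#include <stddef.h>

/* One index, a simple body, and "restrict" promising the compiler that
 * the arrays never overlap: this is the shape that gets vectorized into
 * packed instructions (addps/vaddps) instead of scalar addss. */
void vec_add(float *restrict a, const float *restrict b,
             const float *restrict c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		a[i] = b[i] + c[i];
}
```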


best

Vladimir Dergachev

On Wed, 27 Mar 2024, Dirk Eddelbuettel wrote:



On 27 March 2024 at 08:48, jesse koops wrote:
| Thank you, I was not aware of the easy way to search CRAN. I looked at
| rcppsimdjson of course, but couldn't figure it out since it is done in
| the simdjson library, if I interpret it correctly, not within the R
| ecosystem and I didn't know how that would change things. Writing R
| extensions assumes a lot of  prior knowledge so I will have to work my
| way up to there first.

I think I have (at least) one other package doing something like this _in the
library layer too_ as suggested by Tomas, namely crc32c as used by digest.
You could study how crc32c [0] does this for x86_64 and arm64 to get hardware
optimization. (This may be more specific cpu hardware optimization but at
least the library and cmake files are small.)

I decided as a teenager that assembler wasn't for me and haven't looked back,
but I happily take advantage of it when bundled well. So strong second for
the recommendation by Tomas to rely on this being done in an external and
tested library.

(Another interesting one there is highway [1]. Just packaging that would
likely be an excellent contribution.)

Dirk

[0] repo: https://github.com/google/crc32c
[1] repo: https://github.com/google/highway
   docs: https://google.github.io/highway/en/master/


|
| Op di 26 mrt 2024 om 15:41 schreef Dirk Eddelbuettel :
| >
| >
| > On 26 March 2024 at 10:53, jesse koops wrote:
| > | How can I make this portable and CRAN-acceptable?
| >
| > By writing (or borrowing ?) some hardware detection via either configure /
| > autoconf or cmake. This is no different than other tasks decided at 
install-time.
| >
| > Start with 'Writing R Extensions', as always, and work your way up from
| > there. And if memory serves there are already a few other packages with SIMD
| > at CRAN so you can also try to take advantage of the search for a 'token'
| > (here: 'SIMD') at the (unofficial) CRAN mirror at GitHub:
| >
| >https://github.com/search?q=org%3Acran%20SIMD&type=code
| >
| > Hth, Dirk
| >
| > --
| > dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel





Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Vladimir Dergachev




On Fri, 3 May 2024, Ivan Krylov via R-package-devel wrote:


Dear Maciej Nasinski,

On Fri, 3 May 2024 11:37:57 +0200
Maciej Nasinski  wrote:


I believe we must conduct a comprehensive review of all existing CRAN
packages.


Why now? R packages are already code. You don't need poisoned RDS files
to wreak havoc using an R package.

On the other hand, R data files contain R objects, which contain code.
You don't need exploits to smuggle code inside an R object.



I think the confusion arises because users expect "R data files" to only 
contain data, i.e. numbers, but they can contain any R object, including 
functions.


I, personally, never use them out of concern that an accidentally saved 
function can override some functionality and be difficult to debug. And, 
of course, I never save R sessions.


If you need to pass data it is a good idea to use some common format like 
tab-separated CSV files with column names. One can also use MVL files 
(RMVL package).


best

Vladimir Dergachev

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-04 Thread Vladimir Dergachev




On Sat, 4 May 2024, Maciej Nasinski wrote:


Thank you all for the discussion. Then, we should promote "code awareness" and 
count on the CRAN Team to continue their great work :)

What do you think about promoting containers?
Nowadays, containers are more accessible, with GitHub codespaces being more 
affordable (mostly free for students and the educational sector).
I feel containers can help a little bit in making the R work more secure, but 
once more when used properly.


I think it is not a good idea to focus on one use case. Some people will 
find containers more convenient, some won't.


If you want security, I am sure containers are not the right approach - 
get a separate physical computer instead.


From a convenience point of view, containers are only ok as long as you 
don't need to interface with outside software; after that it gets tricky, as 
the security keeping things containerized starts interfering with getting work 
done. (Prime example: firefox snap on ubuntu)


One situation where containers can be helpful is distribution of 
commercial applications. Containers allow you to freeze library versions, 
so your app can still run with old C library or a specific version of 
Python. You can then _hope_ that containers will have fewer compatibility 
issues, or at least you can sell containers to your management on this 
idea.


But this is not really a good thing for an open source project like R.

best

Vladimir Dergachev



KR
Maciej Nasinski
University of Warsaw

On Sat, 4 May 2024 at 07:17, Vladimir Dergachev  wrote:


  On Fri, 3 May 2024, Ivan Krylov via R-package-devel wrote:

  > Dear Maciej Nasinski,
  >
  > On Fri, 3 May 2024 11:37:57 +0200
  > Maciej Nasinski  wrote:
  >
  >> I believe we must conduct a comprehensive review of all existing CRAN
  >> packages.
  >
  > Why now? R packages are already code. You don't need poisoned RDS files
  > to wreak havoc using an R package.
  >
  > On the other hand, R data files contain R objects, which contain code.
  > You don't need exploits to smuggle code inside an R object.
  >

  I think the confusion arises because users expect "R data files" to only
  contain data, i.e. numbers, but they can contain any R object, including
  functions.

  I, personally, never use them out of concern that accidentally saved
  function can override some functionality and be difficult to debug. And,
  of course, I never save R sessions.

  If you need to pass data it is a good idea to use some common format like
  tab-separated CSV files with column names. One can also use MVL files
  (RMVL package).

  best

  Vladimir Dergachev




__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-04 Thread Vladimir Dergachev




On Sat, 4 May 2024, Maciej Nasinski wrote:


Hey Vladimir,
Thank you for your answer.
GitHub codespaces are "a separate computer" and are free for students and the 
educational sector.


Hi Maciej,

   What I was suggesting is that instead of encapsulating the application 
in a container that runs on the same physical hardware as other 
containers, you would be more secure using a dedicated computer for the 
application.


   best

Vladimir Dergachev



The GitHub codespaces are a cloud service that can be created anytime, with a 
specific setup behind it (Dockerfile, settings.json, renv.lock,  ...).
The machines GitHub codespaces offer are quite decent (4core 16GB RAM 32GB 
Memory). 
You can destroy and recreate it anytime you want to.
You run GitHub codespaces from a web browser, but as Ivan stated, you may need 
a decent computer to handle them, even if all calculations are done on the 
cloud.
I use GitHub codespaces for all my University projects with my friends.  It is 
great that I do not have to explain many things nowadays to older stuff as many 
things are automatic on GitHub
codespaces.

KR
Maciej Nasinski
University of Warsaw

On Sat, 4 May 2024 at 18:53, Vladimir Dergachev  wrote:


  On Sat, 4 May 2024, Maciej Nasinski wrote:

  > Thank you all for the discussion.Then, we should promote "code 
awareness" and count on the CRAN Team to continue their great work:)
  >
  > What do you think about promoting containers?
  > Nowadays, containers are more accessible, with GitHub codespaces being 
more affordable (mostly free for students and the educational sector).
  > I feel containers can help a little bit in making the R work more 
secure, but once more when used properly.

  I think it is not a good idea to focus on one use case. Some people will
  find containers more convenient some don't.

  If you want security, I am sure containers are not the right approach -
  get a separate physical computer instead.

  From a convenience point of view containers are only ok as long as you
  don't need to interface with outside software, then it gets tricky as the
  security keeping things containerized starts interfering with getting work
  done. (Prime example: firefox snap on ubuntu)

  One situation where containers can be helpful is distribution of
  commercial applications. Containers allow you to freeze library versions,
  so your app can still run with old C library or a specific version of
  Python. You can then _hope_ that containers will have fewer compatibility
  issues, or at least you can sell containers to your management on this
  idea.

  But this is not really a good thing for an open source project like R.

  best

  Vladimir Dergachev

  >
  > KR
  > Maciej Nasinski
  > University of Warsaw
      >
  > On Sat, 4 May 2024 at 07:17, Vladimir Dergachev 
 wrote:
  >
  >
  >       On Fri, 3 May 2024, Ivan Krylov via R-package-devel wrote:
  >
  >       > Dear Maciej Nasinski,
  >       >
  >       > On Fri, 3 May 2024 11:37:57 +0200
  >       > Maciej Nasinski  wrote:
  >       >
  >       >> I believe we must conduct a comprehensive review of all 
existing CRAN
  >       >> packages.
  >       >
  >       > Why now? R packages are already code. You don't need poisoned 
RDS files
  >       > to wreak havoc using an R package.
  >       >
  >       > On the other hand, R data files contain R objects, which 
contain code.
  >       > You don't need exploits to smuggle code inside an R object.
  >       >
  >
  >       I think the confusion arises because users expect "R data files" 
to only
  >       contain data, i.e. numbers, but they can contain any R object, 
including
  >       functions.
  >
  >       I, personally, never use them out of concern that accidentally 
saved
  >       function can override some functionality and be difficult to 
debug. And,
  >       of course, I never save R sessions.
  >
  >       If you need to pass data it is a good idea to use some common 
format like
  >       tab-separated CSV files with column names. One can also use MVL 
files
  >       (RMVL package).
  >
  >       best
  >
  >       Vladimir Dergachev
  >
  >
  >




__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] SETLENGTH()

2024-05-04 Thread Vladimir Dergachev


I noticed a note on the RMVL package check page for the development version of R:

Found non-API call to R: ‘SETLENGTH’

Is this something that is work-in-progress for the development version, or 
has SETLENGTH() been deprecated ? What should I use instead ?


thank you very much

Vladimir Dergachev
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] [External] SETLENGTH()

2024-05-04 Thread Vladimir Dergachev




On Sat, 4 May 2024, luke-tier...@uiowa.edu wrote:


On Sat, 4 May 2024, Vladimir Dergachev wrote:



I noticed a note on RMVL package check page for development version of R:

Found non-API call to R: ?SETLENGTH?

Is this something that is work-in-progress for the development version, or
has SETLENGTH() been deprecated ? What should I use instead ?


SETLENGTH has never been part of the API. It is not safe to use except
in a very, very limited set of circumstances. Using it in other
settings will confuse the memory manager, leading at least to
mis-calculation of memory use information and possibly to
segfaults. For most uses I have seen, copying to a new vector of the
right size is the only safe option.

The one context where something along these lines might be OK is for
growable vectors. This concept is emphatically not in the API at this
point, and the way it is currently implemented in base is not robust
enough to become an API (even though some packages have used it). It
is possible that a proper API for this will be added; at that point
SETLENGTH will be removed from the accessible entry points on
platforms that allow this.

So if you are getting a note about SETLENGTH, either stop using it or
be prepared to make some changes at fairly short notice.

[Similar considerations apply to SET_TRUELENGTH. In most but not all
cases using it is less dangerous, but you should still look for other
options if you want your code to continue to work.]


Great, thank you for the explanation ! I will rewrite the code to not use 
SETLENGTH().


My use case was to allocate a vector of some size N_max and then 
repeatedly populate it with a variable number of elements. Since the vector 
was protected during the loop, I would have expected to save on memory 
allocation calls.
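
In plain C terms, the safe pattern described above - copying to a new vector 
of the right size instead of shrinking in place - looks roughly like this 
(a sketch with invented names, not the R API; in package code the equivalent 
is allocating a fresh SEXP of the final length and copying into it before 
returning):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Given a length-N_max scratch buffer of which only n_used entries are
 * valid, return a right-sized copy instead of truncating in place. */
double *shrink_copy(const double *buf, size_t n_used)
{
	double *out = malloc(n_used * sizeof(*out));

	if (out != NULL)
		memcpy(out, buf, n_used * sizeof(*out));
	return out;
}
```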


best

Vladimir Dergachev



Best,

luke



thank you very much

Vladimir Dergachev
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
  Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Vladimir Dergachev




On Wed, 8 May 2024, Josiah Parry wrote:




Yes, prqlr is a great Rust-based package! My other Rust based packages that
are on CRAN are based, in part on prqlr.




If there are many packages based on Rust that require common code, would 
it make sense to make a single "rust" compatibility package that they can 
depend on ?


best

Vladimir Dergachev

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Fast Matrix Serialization in R?

2024-05-09 Thread Vladimir Dergachev




On Thu, 9 May 2024, Sameh Abdulah wrote:


Hi,

I need to serialize and save a 20K x 20K matrix as a binary file. This process 
is significantly slower in R compared to Python (4X slower).

I'm not sure about the best approach to optimize the below code. Is it possible 
to parallelize the serialization function to enhance performance?


Parallelization should not help - a single CPU thread should be able to 
saturate your disk or your network, assuming you have a typical computer.


The problem is possibly the conversion to text; writing it as binary 
should be much faster.
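
As a hedged sketch of the binary-versus-text point (the file name and 
function are invented): at the C level a binary dump is a single fwrite() of 
the double buffer, with no per-number text formatting - the kind of raw 
write that R-side functions such as writeBin() approximate.

```c
#include <assert.h>
#include <stdio.h>

/* Write n doubles in one binary fwrite() call - no text conversion.
 * Returns the number of elements actually written. */
size_t dump_binary(const char *path, const double *x, size_t n)
{
	FILE *f = fopen(path, "wb");
	size_t written;

	if (f == NULL) return 0;
	written = fwrite(x, sizeof(*x), n, f);
	fclose(f);
	return written;
}
```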


To add to other suggestions, you might want to try my package "RMVL" - 
aside from fast writes, it also gives you the ability to share data between 
ultimate users of the package.


best

Vladimir Dergachev

PS Example:

library("RMVL")

M<-mvl_open("test1.mvl", append=TRUE, create=TRUE)

n <- 20000
m <- 20000
cat("Generating matrices ... ")
INI.TIME <- proc.time()
A <- matrix(runif(n * m), ncol = m)
END_GEN.TIME <- proc.time()

mvl_write(M, A, name="A")

mvl_close(M)

END_SER.TIME <- proc.time()


# Use in another script:

library("RMVL")

M2<-mvl_open("test1.mvl")

print(M2$A[1:10, 1:10])

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Altrep header, MSVC, and STRUCT_SUBTYPES macro

2024-05-15 Thread Vladimir Dergachev




On Wed, 15 May 2024, David Cortes wrote:


I'm seeing some issues using R Altrep classes when compiling a package
with the MSVC compiler on windows.

While CRAN doesn't build windows binaries with this compiler, some
packages such as Arrow and LightGBM have had some success in building
their R packages with MSVC outside of CRAN, in order to enable
functionalities that MinGW doesn't support.


Out of curiosity - which functionalities are those ?

One suggestion would be to isolate MSVC-specific code in a library and then 
build a package linking to that - this might turn out to be more portable.


thank you

Vladimir Dergachev

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] handling documentation build tools

2024-05-21 Thread Vladimir Dergachev




On Tue, 21 May 2024, Boylan, Ross via R-package-devel wrote:


Thanks for the pointer.  You may have been thrown off by some goofs I made in 
the intro, which said

I would like to build the automatically, with requiring either users or 
repositories to have the tools.


The intended meaning, with corrections in **, was

I would like to build the  *custom documentation* automatically, with*out* 
requiring either users or repositories to have the tools.


So I want to build the document only locally, as you suggest, but am not sure 
how to accomplish that.


I usually just create a Makefile.

It can be something like this:


all: documentation.pdf

documentation.pdf: documentation.lyx
	lyx --export pdf4 documentation.lyx

Then, every time before you run R CMD build, run make in the directory with 
the Makefile.


best

Vladimir Dergachev



Regarding the trick, I'm puzzled by what it gains.  It seems like a complicated 
way to get the core pdf copied to inst/doc.

Also, my main concern was how to automate production of the "core" pdf, using 
the language of the blog post.

Ross



From: Dirk Eddelbuettel 
Sent: Tuesday, May 21, 2024 2:15 PM
To: Boylan, Ross
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] handling documentation build tools



As lyx is not listed in 'Writing R Extensions', the one (authoritative) manual
describing how to build packages for R, I would not assume it to be present
on every CRAN machine building packages. Also note that several users recently
had to ask here how to deal with less common fonts for style files for
(pdf)latex.

So I would recommend 'localising' the pdf creation to your own machine, and
to ship the resulting pdf. You can have pre-made pdfs as the core of a vignette,
a trick I quite like to make package building simpler and more robust.  See
https://www.r-bloggers.com/2019/01/add-a-static-pdf-vignette-to-an-r-package/
for details.

Cheers, Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [R-pkg-devel] warning: explicit assigning values of variable of type ....

2024-06-07 Thread Vladimir Dergachev




On Thu, 6 Jun 2024, Iris Simmons wrote:


Unless I'm misunderstanding, you're trying to pass a value by name to a
function. That is not a thing in C or C++. However, if you want to name the
arguments, you can do so with comments:

/* print = */ print


I would recommend against using comments in this fashion because while the 
comment tells you what you meant to do, the compiler does not check it. If you 
make an error, the comment will make it harder to find.


If you happen to have a function with a lot of arguments of similar types 
and putting arguments in order is a concern, you can instead convert them 
to a struct and pass a struct instead:


typedef struct {
   int a;
   int b;
} INPUT_TYPE1;

void myfunc(INPUT_TYPE1 x);

And somewhere:
{
INPUT_TYPE1 x;

x.a=3;
x.b=4;

myfunc(x);
}

I don't know whether modern compilers are smart enough to optimize this in 
the same way as passing an argument list. If this is a concern, some code 
restructuring is probably a good idea.


best

Vladimir Dergachev



On Thu, Jun 6, 2024, 19:16 Søren Højsgaard via R-package-devel <
r-package-devel@r-project.org> wrote:


Dear all,

From CRAN maintainers I receive:


Flavor: r-devel-linux-x86_64-debian-gcc
Check: whether package can be installed, Result: WARNING
  Found the following significant warnings:
grips_fit_ips.cpp:149:45: warning: explicitly assigning value of
variable of type 'int' to itself [-Wself-assign]
grips_fit_ips.cpp:213:16: warning: explicitly assigning value of
variable of type 'int' to itself [-Wself-assign]
grips_fit_ips.cpp:254:10: warning: explicitly assigning value of
variable of type 'int' to itself [-Wself-assign]
grips_fit_ips.cpp:254:21: warning: explicitly assigning value of
variable of type 'double' to itself [-Wself-assign]



The first warning pertains to the line:

conips_inner_(S, K, elst0, clist0, print=print);

print on lhs of "=" is the formal name and print on rhs of "=" the name of
a variable. Does the compiler think I assign an integer
to itself? Like if I write

int a=7;
a=a;

Can anyone help me throw light on this?

Thanks in advance
Søren


Re: [R-pkg-devel] Options "reset" when options(opts)

2024-07-11 Thread Vladimir Dergachev




On Thu, 11 Jul 2024, David Hugh-Jones wrote:


This surprised me, even though it shouldn’t have done. (My false internal
model of the world was that oo <- options(); … options(oo) would overwrite
the entire options list with the old values.) I wonder if it would be worth
pointing out explicitly in ?options.


Arguably, it would be nice to have a parameter like "reset", so 
that one can call


options(oo, reset=TRUE)

and any options not explicitly passed by oo are set to NULL.

This way there are two modes of operation: bulk setting of a subset of 
options with reset=FALSE, and restoring the full options set with reset=TRUE.


best

Vladimir Dergachev



Writing: wyclif.substack.com
Book: www.wyclifsdust.com


On Thu, 11 Jul 2024 at 08:03, Greg Jefferis  wrote:


Dear John,

You need to collect the return value when setting options. This will
include an explicit NULL value for an option that was previously NULL.

Best,

Greg Jefferis.

options(digits.secs = NULL)

noset2 = function() {
  opts <- options(digits.secs = 3)
  on.exit(options(opts))
  print(opts)
}


getOption("digits.secs")

NULL


noset2()

$digits.secs
NULL


getOption("digits.secs")

NULL

Gregory Jefferis
Division of Neurobiology
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge, CB2 OQH, UK

http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis
http://jefferislab.org
https://www.zoo.cam.ac.uk/research/groups/connectomics



On 11 Jul 2024, at 06:08, John Muschelli  wrote:

When setting options in a function, I have always used the following:
 opts <- options()
 on.exit(options(opts), add = TRUE)
and assumed it "reset" options to what they were prior to running the
function.  But for some options that are set to NULL, it does not seem to
reset them.  Specifically, I have found digits.secs to be set after this
simple example below.  Is this expected behavior/documented?  Overall, this
specific example (the one I encountered in the wild) is not that harmful,
but I wanted to ask before I set a fix for this in our work

noset = function() {
 opts = options()
 print(opts$digits.secs)
 on.exit(options(opts))
 options(digits.secs = 3)
}
getOption("digits.secs")
#> NULL
noset()
#> NULL
getOption("digits.secs")
#> [1] 3


John Muschelli, PhD
Associate Research Professor
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health



Re: [R-pkg-devel] How to get arbitrary precise inputs from R for an Rcpp package?

2024-07-18 Thread Vladimir Dergachev



I see there are existing extended-precision packages, Ryacas and Rmpfr; 
you might want to take a look at them and their representation of numbers 
with higher precision than a double.


best

Vladimir Dergachev

On Fri, 19 Jul 2024, Khue Tran wrote:


Hi,

I am trying to create an Rcpp package that involves arbitrarily precise
calculations. The function to calculate e^x below with 100 digits of precision
works well with integers, but for decimals, since the input is a double,
the result differs a lot from the arbitrarily precise result I got from
Wolfram.

I understand the results are different since 0.1 cannot be represented
precisely in binary with limited bits. It is possible to enter 1 then 10
and get the multiprecision division of these two integers to attain a more
precise 0.1 in C++, but this method won't work on a large scale. Thus, I am
looking for a general solution to get more precise inputs?




Re: [R-pkg-devel] How to get arbitrary precise inputs from R for an Rcpp package?

2024-07-18 Thread Vladimir Dergachev



Do you need to run eigen() on an arbitrary matrix or a symmetric one?

best

Vladimir Dergachev

On Fri, 19 Jul 2024, Khue Tran wrote:


Thank you Simon! This is very helpful! Regarding eigen, I found in the
Boost library the following example for arbitrary precision matrix solver:
https://github.com/boostorg/multiprecision/blob/develop/example/eigen_example.cpp.
I am not sure if the precision is fully preserved throughout the process,
but this example motivated me to try coding with the Boost library.

Best,
Khue Tran

On Fri, Jul 19, 2024 at 9:50 AM Simon Urbanek 
wrote:


Khue,



On 19/07/2024, at 11:32 AM, Khue Tran  wrote:

Thank you for the suggestion, Denes, Vladimir, and Dirk. I have indeed
looked into Rmpfr and while the package can interface GNU MPFR with R
smoothly, as of right now, it doesn't have all the functions I need (ie.
eigen for mpfr class) and when one input decimals, say 0.1 to mpfr(), the
precision is still limited by R's default double precision.




Don't use doubles, use decimal fractions:


Rmpfr::mpfr(gmp::as.bigq(1,10), 512)

1 'mpfr' number of precision  512   bits
[1]
0.1002

As for eigen() - I'm not aware of an arbitrary precision solver, so I
think the inputs are your least problem - most tools out there use LAPACK
which doesn't support arbitrary precision so your input precision is likely
irrelevant in this case.

Cheers,
Simon




Thank you for the note, Dirk. I will keep in mind to send any future
questions regarding Rcpp to the Rcpp-devel mailing list. I understand that
the type used in the Boost library for precision is not one of the types
supported by SEXP, so it will be more complicated to map between the cpp
codes and R. Given Rmpfr doesn't provide all necessary mpfr calculations
(and embarking on interfacing Eigen with Rmpfr is not a small task), does
taking input as strings seem like the best option for me to get precise
inputs?

Sincerely,
Khue

On Fri, Jul 19, 2024 at 8:29 AM Dirk Eddelbuettel 

wrote:




Hi Khue,

On 19 July 2024 at 06:29, Khue Tran wrote:
| I am currently trying to get precise inputs by taking strings instead of
| numbers then writing a function to decompose the string into a rational
| with the denominator in the form of 10^(-n) where n is the number of
| decimal places. I am not sure if this is the only way or if there is a
| better method out there that I do not know of, so if you can think of a
| general way to get precise inputs from users, it will be greatly
| appreciated!

That is one possible way. The constraint really is that the .Call() interface
we use for all [1] extensions to R only knows SEXP types, which map to a
small set of known types: double, int, string, bool, ...  The type used by
the Boost library you are using is not among them, so you have to add code to
map back and forth. Rcpp makes that easier; it is still far from automatic.


R has packages such as Rmpfr interfacing GNU MPFR based on GMP. Maybe that is
good enough?  Also note that Rcpp has a dedicated (low volume and friendly)
mailing list where questions such as this one may be better suited.

Cheers, Dirk

[1] A slight generalisation. There are others but they are less common / not
recommended.

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org





Re: [R-pkg-devel] How to get arbitrary precise inputs from R for an Rcpp package?

2024-07-18 Thread Vladimir Dergachev




On Fri, 19 Jul 2024, Khue Tran wrote:


I will need to run eigen() on a symmetric matrix, but I want to get arbitrarily 
precise eigenvalues since we will need those eigenvalues for our further 
calculations. Does that make sense?


For a symmetric matrix there is a nice algorithm that is simple to 
implement and that converges fairly fast.


What you do is find the largest in absolute value off-diagonal entry.

Then you construct a 2x2 rotation matrix that zeros it: A --> R A R^-1, where 
R is a rotation matrix. The rotation matrix has 1's on the main diagonal 
except for the rows and columns of the large off-diagonal entry.


Repeat the procedure until the off-diagonal entries are as small as you 
want.


Because you pick the largest entry every time, the algorithm is fairly 
stable.


The only disadvantage is that this algorithm plays havoc with sparse 
matrices by creating too many new non-zero entries, so for those you need 
to use a different method.


best

Vladimir Dergachev



Re: [R-pkg-devel] New package with C++ code causes R abort in RStudio, not in R console.

2024-11-12 Thread Vladimir Dergachev



Hi Luc,

On Tue, 12 Nov 2024, Luc De Wilde wrote:


Dear Vladimir,

thank you for your reply.

The model syntax is not simple though and the parser needs to look at the 
meaning in SEM terms to accept or reject certain things.


What do you mean by "SEM terms"? I have not yet seen a grammar that could 
not be handled by bison; it has mechanisms to deal with exceptions to pure 
LR syntax.




Moreover, this is only a first step and later other calculations need to 
be done in C++, which is why I find it important to know exactly why the 
code works in R console but not in RStudio, and of course what can be 
done to make it work in RStudio also.


My thought was that perhaps some sort of memory issue occurs because of 
the hand-written parser.


For example, one possibility is that stack sizes in R and RStudio could be 
different, so if you are parsing something recursively it might work in 
one and not the other.


The parsers generated by flex and bison are designed to handle arbitrary 
length inputs.


best

Vladimir Dergachev



Kind regards,


Luc De Wilde


Van: Vladimir Dergachev 
Verzonden: dinsdag 12 november 2024 18:15
Aan: Luc De Wilde 
CC: r-package-devel@r-project.org ; Yves Rosseel 

Onderwerp: Re: [R-pkg-devel] New package with C++ code causes R abort in 
RStudio, not in R console.


Hi Luc,

  The standard tools for writing parsers are "flex" and "bison" - they
generate code automatically and so can save you a lot of effort. For
language with simple syntax you can get away with just using "flex".

  Here are some examples:

Flex:

https://westes.github.io/flex/manual/Simple-Examples.html#Simple-Examples

Bison:

https://www.gnu.org/software/bison/manual/bison.html#Infix-Calc

best

Vladimir Dergachev

On Tue, 12 Nov 2024, Luc De Wilde wrote:


Dear R package developers,

I'm helping with the development of the lavaan package (see 
https://lavaan.ugent.be/) and currently writing a C++ version of the parser of 
the model syntax in lavaan. The package with C++ code is in  
https://github.com/lucdw/lavaanC.

When testing with a bunch of models, there is one model that causes an abort of 
the R session in RStudio (on Windows), but in the R console or in a batch job 
it causes no errors. The model is the following :
model <- '
F1 =~ "a b"*X1
F2 =~ a * X1 + 3*X2 # dat is hier een beetje commentaar
# efa block 2
efa("efa2")*f3 +
efa("efa2")*f4 =~ y1 + y2 + y3 + y1:y3
f4 := 3.14159 * F2
F1 ~ start(0.76)*F2 + a*F2
a == (b + f3)^2
b1 > exp(b2 + b3) '

and the translation can be tested - after installing lavaanC - with

lavaanC::lav_parse_model_string_c(model)

As mentioned, this causes an abort of the R session when executed in RStudio on 
Windows (10 or 11), but passes without problem in the R console or a batch job.

Because many users are using RStudio, I'd like to tackle this problem, but 
don't know how to pinpoint the cause. I hope some of you have an 
idea how to handle this problem ...

All the best,


Luc De Wilde



Re: [R-pkg-devel] New package with C++ code causes R abort in RStudio, not in R console.

2024-11-12 Thread Vladimir Dergachev



Hi Luc,

  The standard tools for writing parsers are "flex" and "bison" - they 
generate code automatically and so can save you a lot of effort. For 
language with simple syntax you can get away with just using "flex".


  Here are some examples:

Flex:

https://westes.github.io/flex/manual/Simple-Examples.html#Simple-Examples

Bison:

https://www.gnu.org/software/bison/manual/bison.html#Infix-Calc

best

Vladimir Dergachev

On Tue, 12 Nov 2024, Luc De Wilde wrote:


Dear R package developers,

I'm helping with the development of the lavaan package (see 
https://lavaan.ugent.be/) and currently writing a C++ version of the parser of 
the model syntax in lavaan. The package with C++ code is in  
https://github.com/lucdw/lavaanC. 

When testing with a bunch of models, there is one model that causes an abort of 
the R session in RStudio (on Windows), but in the R console or in a batch job 
it causes no errors. The model is the following : 
model <- ' 
    F1 =~ "a b"*X1
    F2 =~ a * X1 + 3*X2 # dat is hier een beetje commentaar
    # efa block 2
    efa("efa2")*f3 +
    efa("efa2")*f4 =~ y1 + y2 + y3 + y1:y3
    f4 := 3.14159 * F2
    F1 ~ start(0.76)*F2 + a*F2
    a == (b + f3)^2
    b1 > exp(b2 + b3) '

and the translation can be tested - after installing lavaanC - with 

lavaanC::lav_parse_model_string_c(model)

As mentioned, this causes an abort of the R session when executed in RStudio on 
Windows (10 or 11), but passes without problem in the R console or a batch job.

Because many users are using RStudio, I'd like to tackle this problem, but 
don't know how to pinpoint the cause. I hope some of you have an 
idea how to handle this problem ...

All the best,


Luc De Wilde



Re: [R-pkg-devel] Removing packages files

2025-01-08 Thread Vladimir Dergachev



Hi Lluís,

  Just wanted to add to the discussion that it would be good to consider 
users who are offline or behind a firewall and are installing the 
package from a file.


  An option to point the package to a separately downloaded file would be
useful.

best

Vladimir Dergachev

On Thu, 2 Jan 2025, Lluís Revilla wrote:


Hi list,

I am developing a package that will download some data, and I'd like
to store it locally to not recalculate it often.
The CRAN policy requires tools::R_user_dir to be used and "the
contents are actively managed (including removing outdated material)"
or using TMPDIR but "such usage should be cleaned up".

When loading a package there is .onLoad or .onAttach to fill or check
those files and other settings required for a package. Is there
something for when a package is removed?

I found some related functions like .Last or reg.finalizer and setHook
or packageEvent, but they are about closing a session or don't have a
specific event for when packages are uninstalled via remove.packages(). I
appreciate any feedback; thanks in advance.

Best wishes and a happy new year,

Lluís
