Re: [Rd] stats::fft produces inconsistent results

2021-10-21 Thread GILLIBERT, Andre
> Haha, thanks : ) I guess I will probably be grouchy too if seeing so many 
> people making the same mistakes again and again. It just happened to be me.


Fortunately, you did not get offended. :)


It is nice to have a large community of developers for R packages, even if, 
sometimes, buggy packages annoy R developers because any small change in 
R may "break" them even though they were actually broken from the beginning.


> Indeed, I found myself often confused about when to PROTECT and when not.



A (relatively) quick explanation.

There are several "pools" of data objects that have different rules. The most 
common "pool" is the pool of garbage collectable R objects, that can be 
allocated with allocVector and is passed from R to C code and vice versa. 
Another pool is the malloc/free pool, that works with explicit 
allocation/deallocation. R does not modify the malloc/free implementation in 
any way, and memory leaks may happen. Operating systems may have other pools of 
memory (e.g. mmap'ed memory) that are not handled by R either. There is also a 
transient storage (R_alloc/vmaxset/vmaxget) that is automatically freed when 
returning from C to R, and should be used for temporary storage but not for 
objects returned to R code.



The PROTECT system is needed for garbage collectable objects.

The garbage collector may trigger whenever an R internal function is called, 
typically when some memory is internally allocated.

The garbage collector frees objects that are referenced neither directly nor 
indirectly from R code or from the PROTECT stack.

The PROTECT stack is used by C code to make sure objects that are not yet (or 
will never be) referenced by R code, are not destroyed when the garbage 
collector runs.



The functions allocating new R objects, such as allocVector(), but also 
coerceVector() and duplicate(), return unprotected objects that may be destroyed 
the next time an internal R function is called, unless they are explicitly 
PROTECT'ed beforehand. Indeed, such objects would have no reference from R code 
and so would be deleted.


The PROTECT stack must be balanced on a call from R to a C function. There must 
be as many UNPROTECT'ions as PROTECT'ions.

The typical C code PROTECTs any object allocated as soon as it is allocated 
(e.g. call to allocVector or coerceVector). It UNPROTECTs temporary objects to 
"free" them (the actual memory release may be delayed to the next garbage 
collection). It UNPROTECTs the object it returns to R code. Indeed, in pure C 
code, there will be no garbage collection between the time the object is 
UNPROTECTed and the time R grabs the object. You must be very careful if you 
are using C++, because destructors must not call any R internal function that 
may trigger a garbage collection.
The arguments to the C code do not have to be PROTECT'ed, unless they are 
re-allocated. For instance, it is frequent to call coerceVector on arguments 
and re-assign the result to the C variable that represents the argument. The 
new object must be PROTECT'ed.


Actually, you do not need to *directly* PROTECT all objects that are allocated 
in the C function, but you must make sure that all objects are *indirectly* 
PROTECT'ed. For instance, you may allocate a VECSXP (a "list" in R) and fill 
the slots with newly allocated objects. You only need to PROTECT the VECSXP, 
since its slots are indirectly protected.
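To make the idea of indirect protection concrete, here is a toy model in plain C. It is only a sketch: ToyObj, toy_protect, toy_unprotect and toy_gc_survivors are hypothetical names invented for this illustration, not R's API, and the model ignores references held by R code. It only shows that anything reachable from an entry on the protection stack survives a collection.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SLOTS 4
#define TOY_STACK_MAX 16

/* Toy stand-in for an R object: a mark bit plus optional child slots. */
typedef struct ToyObj {
    int marked;
    size_t nslots;
    struct ToyObj *slots[TOY_MAX_SLOTS];
} ToyObj;

/* Toy protection stack (hypothetical; R's real stack is internal to R). */
static ToyObj *toy_stack[TOY_STACK_MAX];
static size_t toy_top = 0;

/* Push an object on the toy protection stack. */
ToyObj *toy_protect(ToyObj *obj)
{
    assert(toy_top < TOY_STACK_MAX);
    toy_stack[toy_top++] = obj;
    return obj;
}

/* Pop the n most recently protected objects. */
void toy_unprotect(size_t n)
{
    assert(n <= toy_top);
    toy_top -= n;
}

/* Mark an object and everything reachable through its slots. */
static void toy_mark(ToyObj *obj)
{
    if (obj == NULL || obj->marked)
        return;
    obj->marked = 1;
    for (size_t i = 0; i < obj->nslots; i++)
        toy_mark(obj->slots[i]);
}

/* Run the mark phase over `heap` and return how many objects survive. */
size_t toy_gc_survivors(ToyObj *heap[], size_t n)
{
    size_t survivors = 0;
    for (size_t i = 0; i < n; i++)
        heap[i]->marked = 0;
    for (size_t i = 0; i < toy_top; i++)
        toy_mark(toy_stack[i]);
    for (size_t i = 0; i < n; i++)
        survivors += (size_t)heap[i]->marked;
    return survivors;
}
```

If a "list" object with two filled slots is protected and a fourth, unrelated object is not, the mark phase keeps the list and both slot objects and drops the unrelated one. After toy_unprotect(1), nothing survives.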


If you have any doubt, it is not a bug to over-PROTECT objects. It may slightly 
slow down garbage collection and use space on the PROTECTion stack, but that is 
rarely a big deal. You should only avoid that when that would lead to thousands 
or millions of protections.


As I said, the PROTECT stack must be balanced between the entry and exit of the 
C code. This is not a problem for 99% of functions that free all the memory 
they use internally except the object that is returned. Sometimes, some 
"background" memory, hidden from R code, may have to stay allocated for longer. 
A call to R_PreserveObject protects the object, even after the C code returns 
to R, until R_ReleaseObject is called. Without an explicit call to 
R_ReleaseObject, memory is leaked!


There is another mechanism in R that must be known. If you call any R function 
from C code, or any internal R function that may fail with an error, or any 
internal R function that can be stopped by the user (see R_CheckUserInterrupt), 
then, R may call a longjmp to exit all the C code. This is very much 
incompatible with C++ exceptions or constructors/destructors. Rcpp can avoid, 
to some extent, that problem.
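The effect of such a jump on manually managed memory can be sketched in plain C with setjmp/longjmp. This is a toy model only: toy_toplevel stands in for R's top-level context, toy_failing_call stands in for an R API call that raises an error, and all toy_* names are hypothetical. In real package code you would not catch the jump yourself like this; you would rather rely on the transient storage (R_alloc) or on R_UnwindProtect().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf toy_toplevel;      /* stands in for R's top-level context */
static int toy_live_buffers = 0;  /* malloc'ed buffers not yet freed */

/* malloc wrapper that counts live buffers. */
static void *toy_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        toy_live_buffers++;
    return p;
}

/* free wrapper that updates the live-buffer count. */
static void toy_free(void *p)
{
    if (p != NULL) {
        free(p);
        toy_live_buffers--;
    }
}

/* Stands in for an R API call that fails: it jumps and never returns. */
static void toy_failing_call(void)
{
    longjmp(toy_toplevel, 1);
}

/* Allocates, then calls something that jumps away before the free. */
static void toy_leaky_worker(void)
{
    double *buf = toy_malloc(100 * sizeof *buf);
    assert(buf != NULL);
    toy_failing_call();
    toy_free(buf);  /* never reached: the jump skips it */
}

/* Returns the number of buffers leaked by the unguarded worker. */
int toy_run_leaky(void)
{
    int before = toy_live_buffers;
    if (setjmp(toy_toplevel) == 0)
        toy_leaky_worker();
    return toy_live_buffers - before;
}

/* Same work, but the jump is caught locally so the buffer can be freed. */
int toy_run_guarded(void)
{
    int before = toy_live_buffers;
    double *buf = toy_malloc(100 * sizeof *buf);
    assert(buf != NULL);
    if (setjmp(toy_toplevel) == 0)
        toy_failing_call();
    toy_free(buf);  /* reached on normal return and after the jump */
    return toy_live_buffers - before;
}
```

In this sketch toy_run_leaky() reports one leaked buffer and toy_run_guarded() reports none, because the cleanup code sits at a point that the jump returns to instead of a point that it skips.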


With C code, this means that some malloc'ed memory or allocated resources (file 
descriptors, sockets, etc.) may be leaked unless something is done to prevent 
that. All PROTECT'ed objects are automatically unprotected, so there is no 
problem with memory leaks of garbage collectable objects. There is an 
R_UnwindProtect() mechanism to free temporary resources (e.g. a socket you 
allocated) when a 

Re: [Rd] BUG?: R CMD check with --as-cran *disables* checks for unused imports otherwise performed

2021-10-21 Thread Jeffrey Dick
FWIW, I also encountered this issue and posted on R-pkg-devel about it,
with no resolution at the time (May 2020). See "Dependencies NOTE lost with
--as-cran" (
https://stat.ethz.ch/pipermail/r-package-devel/2020q2/005467.html)

On Wed, Oct 20, 2021 at 11:55 PM Henrik Bengtsson <
henrik.bengts...@gmail.com> wrote:

> ISSUE:
>
> Using 'R CMD check' with --as-cran sets
> _R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_=TRUE, whereas the
> default is FALSE, which you get if you don't add --as-cran.
> I would expect --as-cran to check more things and be more conservative
> than without.  So, is this behavior a mistake?  Could it be a thinko
> around the negating "IGNORE", and the behavior is meant to be vice
> versa?
>
> Example:
>
> $ R CMD check QDNAseq_1.29.4.tar.gz
> ...
> * using R version 4.1.1 (2021-08-10)
> * using platform: x86_64-pc-linux-gnu (64-bit)
> ...
> * checking dependencies in R code ... NOTE
> Namespace in Imports field not imported from: ‘future’
>   All declared Imports should be used.
>
> whereas, if I run with --as-cran, I don't get that NOTE;
>
> $ R CMD check --as-cran QDNAseq_1.29.4.tar.gz
> ...
> * checking dependencies in R code ... OK
>
>
> TROUBLESHOOTING:
>
> In src/library/tools/R/check.R [1], the following is set if --as-cran is
> used:
>
>   Sys.setenv("_R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_" = "TRUE")
>
> whereas, if not set, the default is [2]:
>
>   ignore_unused_imports <-
>     config_val_to_logical(Sys.getenv("_R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_",
>                                      "FALSE"))
>
> [1]
> https://github.com/wch/r-source/blob/b50e3f755674cbb697a4a7395b766647a5cfeea2/src/library/tools/R/check.R#L6335
> [2]
> https://github.com/wch/r-source/blob/b50e3f755674cbb697a4a7395b766647a5cfeea2/src/library/tools/R/QC.R#L5954-L5956
>
> /Henrik
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] Fwd: Using existing envars in Renviron on friendly Windows

2021-10-21 Thread Martin Maechler
> Michał Bojanowski 
> on Wed, 20 Oct 2021 16:31:08 +0200 writes:

> Hello Tomas,
> Yes, that's accurate although rather terse, which is perhaps the
> reason why I did not realize it applies to my case.

> How about adding something in the direction of:

> 1. Continuing the cited paragraph with:
> In particular, on Windows it may be necessary to quote references to
> existing environment variables, especially those containing file paths
> (which include backslashes). For example: `"${WINVAR}"`.

> 2. Add an example (not run):

> # On Windows do quote references to variables containing paths, e.g.:
> # If APPDATA=C:\Users\foobar\AppData\Roaming
> # to point to a library tree inside APPDATA in .Renviron use
> R_LIBS_USER="${APPDATA}"/R-library

> Incidentally the last example is on backslashes too.


> What do you think?

I agree that adding an example really helps a lot in such cases,
in my experience, notably if it's precise enough to be used +/- directly.
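To illustrate why the quoted form keeps the backslashes, here is a small sketch in C of the rule cited from the documentation (outermost quotes stripped, backslashes removed except inside quotes). It is a simplified model under my own assumptions, not the code in Renviron.c: the function name toy_process_value is hypothetical, it assumes ${VAR} references have already been expanded, and it ignores escaped quotes and line continuations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of the documented rule for .Renviron values:
 * strip quote characters that open or close a quoted section and
 * drop backslashes that occur outside quotes. Input is assumed to
 * be the value after ${VAR} expansion. */
void toy_process_value(const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL && outsize > 0);
    size_t len = strlen(in);
    size_t n = 0;
    char quote = '\0';   /* the quote character we are inside, if any */

    for (size_t i = 0; i < len && n + 1 < outsize; i++) {
        char c = in[i];
        if (quote == '\0' && (c == '"' || c == '\'')) {
            quote = c;       /* opening quote: strip it */
            continue;
        }
        if (quote != '\0' && c == quote) {
            quote = '\0';    /* closing quote: strip it */
            continue;
        }
        if (quote == '\0' && c == '\\')
            continue;        /* backslash outside quotes: dropped */
        out[n++] = c;
    }
    out[n] = '\0';
}
```

With the expanded value C:\Users\foobar\AppData\Roaming/R-library this model yields C:UsersfoobarAppDataRoaming/R-library, the same kind of mangled path reported earlier in the thread. With the quoted form "C:\Users\foobar\AppData\Roaming"/R-library it yields C:\Users\foobar\AppData\Roaming/R-library.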



> On Mon, Oct 18, 2021 at 5:02 PM Tomas Kalibera wrote:
>> 
>> 
>> On 10/15/21 6:44 PM, Michał Bojanowski wrote:
>> > Perhaps a small update to ?.Renviron would be in order to mention that...
>> 
>> Would you have a more specific suggestion how to update the
>> documentation? Please note that it already says
>> 
>> "‘value’ is then processed in a similar way to a Unix shell: in
>> particular the outermost level of (single or double) quotes is stripped,
>> and backslashes are removed except inside quotes."
>> 
>> Thanks,
>> Tomas
>> 
>> > On Fri, Oct 15, 2021 at 6:43 PM Michał Bojanowski wrote:
>> >> Indeed quoting works! Kevin suggested the same, but he didn't reply to the list.
>> >> Thank you all!
>> >> Michal
>> >>
>> >> On Fri, Oct 15, 2021 at 6:40 PM Ivan Krylov wrote:
>> >>> Sorry for the noise! I wasn't supposed to send my previous message.
>> >>>
>> >>> On Fri, 15 Oct 2021 16:44:28 +0200
>> >>> Michał Bojanowski wrote:
>> >>>
>> >>>> AVAR=${APPDATA}/foo/bar
>> >>>>
>> >>>> Which is a documented way of referring to existing environment
>> >>>> variables. Now, with that in R I'm getting:
>> >>>>
>> >>>> Sys.getenv("APPDATA")    # That works OK
>> >>>> [1] "C:\\Users\\mbojanowski\\AppData\\Roaming"
>> >>>>
>> >>>> so OK, but:
>> >>>>
>> >>>> Sys.getenv("AVAR")
>> >>>> [1] "C:UsersmbojanowskiAppDataRoaming/foo/bar"
>> >>>
>> >>> Hmm, a function called by readRenviron does seem to remove backslashes,
>> >>> but not if they are encountered inside quotes:
>> >>>
>> >>> https://github.com/r-devel/r-svn/blob/3f8b75857fb1397f9f3ceab6c75554e1a5386adc/src/main/Renviron.c#L149
>> >>>
>> >>> Would AVAR="${APPDATA}"/foo/bar work?
>> >>>
>> >>> --
>> >>> Best regards,
>> >>> Ivan


Re: [Rd] stats::fft produces inconsistent results

2021-10-21 Thread Dipterix Wang
Thank you for such a detailed and plain explanation. It is much clearer to me now 
w.r.t. the R internal memory management and how PROTECT should be used. 

Also after diving into the documentation of FFTW3 library, I think I found why 
the data was centered.

https://www.fftw.org/fftw3_doc/Planner-Flags.html

Basically 
1. FFTW3 modifies the input data by default 
2. One has to initialize the data after planning the FFT (except for some special 
situations). This "subtle" detail is buried in their documentation and is very 
hard to debug once a mistake is made. 

The second one actually causes CRAN package fftwtools to produce inconsistent 
results on osx (https://github.com/krahim/fftwtools/issues/15)

Best,
Dipterix


Re: [Rd] stats::fft produces inconsistent results

2021-10-21 Thread Ben Bolker

  Nice!
