Re: [Rd] stats::fft produces inconsistent results

Ben Bolker Thu, 21 Oct 2021 13:55:08 -0700

  Nice!

On 10/21/21 4:26 PM, Dipterix Wang wrote:

Thank you for such detailed and plain explanation. It is much clearer tome now w.r.t. the R internal memory management and how PROTECT should beused.
Also after diving into the documentation of FFTW3 library, I think Ifound why the data was centered.
https://www.fftw.org/fftw3_doc/Planner-Flags.html<https://www.fftw.org/fftw3_doc/Planner-Flags.html>
Basically
1. FFTW3 modifies the input data by default
2. one has to initialize the data after planning fft (except for somespecial situations). This “subtle” detail is buried in theirdocumentation and is very hard to debug once a mistake is made.
The second one actually causes CRAN package fftwtools to produceinconsistent results on osx(https://github.com/krahim/fftwtools/issues/15<https://github.com/krahim/fftwtools/issues/15>)
Best,
Dipterix
On Oct 21, 2021, at 6:32 AM, GILLIBERT, Andre<[email protected] <mailto:[email protected]>>wrote:
> Haha, thanks : ) I guess I will probably be grouchy too if seeing so many 
people making the same mistakes again and again. It just happened to be me.

Fortunately, you did not get offensed. :)
This is nice to have a large community of developers for R packages,even if, sometimes, buggy packages are annoying R developers becauseany small change in R may "break" them even though they were actuallybroken from the begining.
>Indeed, I found myself often confused about when to PROTECT and when not.

A (relatively) quick explanation.
There are several “pools” of data objects that have different rules.The most common “pool” is the pool of garbage collectable R objects,that can be allocated with allocVector and is passed from R to C codeand vice versa. Another pool is the malloc/free pool, that works withexplicit allocation/deallocation. R does not modify the malloc/freeimplementation in any way, and memory leaks may happen. Operatingsystems may have other pools of memory (e.g. mmap'ed memory) that arenot handled by R either. There is also a transient storage(R_alloc/vmaxset/vmaxget) that is automatically freed when returningfrom C to R, and should be used for temporary storage but not forobjects returned to R code.
The PROTECT system is needed for garbage collectable objects.
The garbage collector may trigger whenever a R internal function iscalled. Typically, when some memory is internally allocated.The garbage collector frees objects that are neither referenceddirectly nor indirectly from R code and from the PROTECT stack.The PROTECT stack is used by C code to make sure objects that are notyet (or will never be) referenced by R code, are not destroyed whenthe garbage collector runs.
The functions allocating new R objects, such as allocVector(), butalso coerceVector(), duplicate(),return unprotected objects, that maybe destroyed the next time an internal R function is called, unless itis explicitly PROTECT'ed before. Indeed, such objects would have noreference from R code and so, would be deleted.
The PROTECT stack must be balanced on a call from R to a C function.There must be as many UNPROTECT'ions than PROTECT'ions.
The typical C code PROTECTs any object allocated as soon as it isallocated (e.g. call to allocVector or coerceVector). It UNPROTECTstemporary objects to "free" them (the actual memory release may bedelayed to the next garbage collection). It UNPROTECTs the object itreturns to R code. Indeed, in pure C code, there will be no garbagecollection between the time the object is UNPROTECTed and the time Rgrabs the object. You must be very careful if you are using C++,because destructors must not call any R internal function that maytrigger a garbage collection.The arguments to the C code, do not have to be PROTECT'ed, unless theyare re-allocated. For instance, it is frequent to call coerceVector orarguments and re-assign them to the C variable that represents theargument. The new object must be PROTECT'ed.
Actually, you do not need to *directly* PROTECT all objects that areallocated in the C function, but you must make sure that all objectsare *indirectly* PROTECT'ed. For instance, you may allocate a VECSXP(a "list" in R) and fill the slots with newly allocated objects. Youonly need to PROTECT the VECSXP, since its slots are indirectly protected.
If you have any doubt, it is not a bug to over-PROTECT objects. It mayslightly slow down garbage collection and use space on the PROTECTionstack, but that is rarely a big deal. You should only avoid that whenthat would lead to thousands or millions of protections.
As I said, the PROTECT stack must be balanced between the entry andexit of the C code. This is not a problem for 99% of functions thatfree all the memory they use internally except the object that isreturned. Sometimes, some "background" memory, hidden to R code, mayhave to be allocated for more time. A call to R_PreserveObjectprotects the object, even after the C code returns to R, untilR_ReleaseObject is called. Without an explicit call toR_ReleaseObject, memory is leaked!
There is another mechanism in R that must be known. If you call any Rfunction from C code, or any internal R function that may fail with anerror, or any internal R function that can be stopped by the user (seeR_CheckUserInterrupt), then, R may call a longjmp to exit all the Ccode. This is very much incompatible with C++ exceptions orconstructors/destructors. Rcpp can avoid, to some extent, that problem.
With C code, this means that some malloc'ed memory or allocatedresources (file descriptors, sockets, etc.) may be leaked unlesssomething is done to prevent that. All PROTECT'ed objects areautomatically unprotected, so there is no problem with memory leak ofgarbage collectable objects. There is a R_UnwindProtect() mechanism tofree temporary resources (e.g. a socket you allocated) when a longjmpis triggered. Non-memory resources (e.g. a socket) returned to Rshould use theR_MakeExternalPtr() mechanism to make sure that, whenthe memory is freed by the garbage collector, the resource is also freed.
"Writing R extensions" contains more extensive documentation, but Ihope that my quick description of the system will make it easier tounderstand the extensive documentation.
--
Sincèrement
André GILLIBERT
*De :*Dipterix Wang <[email protected]<mailto:[email protected]>>
*Envoyé :*mercredi 20 octobre 2021 22:01
*À :*Martin Maechler <[email protected]<mailto:[email protected]>>; GILLIBERT, Andre<[email protected]<mailto:[email protected]>>;[email protected]<mailto:[email protected]>
*Cc :*[email protected] <mailto:[email protected]>
*Objet :*Re: [Rd] stats::fft produces inconsistent results
ATTENTION:Cet e-mail provient d’une adresse mail extérieure au CHU deRouen. Ne cliquez pas sur les liens ou n'ouvrez pas les pièces jointesà moins de connaître l'expéditeur et de savoir que le contenu est sûr.En cas de doute, transférer le mail à « DSI, Sécurité » pour analyse.Merci de votre vigilance
Wow, you guys are amazing!

            as part of its pipeline, ravetools::mvfftw computes the
            mean of the
            input vector **and then centers it to a mean of zero**
            (intentionally or
            accidentally?)



            because variables are passed to compiled code by reference
            (someone
            can feel free to correct my terminology), this means that
            the original
            vector in R now has a mean of zero
I didn’t center the input vector in my code. The data was fed “as-is”into FFTW3. My guess is FFTW3 internally center the data. It could bethat FFTW3 library behave differently on different platforms, whichcould explain the reproducibility issue.
            /"Indeed, R vectors are passed "by reference" to C code,
            but the semantic must be "by value", i.e. the C function
            must NOT change the contents of the vector, except in very
            specific cases.”/
CRAN has already had fftw and fftwtools, the issue is the data I’mtargeting at are at GB-level, copying the vectors can be memoryinefficient or even use up memories. The strategy of ravetools is toimport signals from local files, fft, then directly write to disk. Soonly one reference will be used and modifying in-place is on purpose.In fact, and the fft functions I created are not intended to be useddirectly by users.
However, I should've been very cautious when using these functions.This is my fault. I’ll check the whole package to make sure only onereference is used or otherwise the vectors will be copied.
            /This can be tested by the MAYBE_REFERENCED() macro in
            Rinternals.h./

Nice to learn! I’ll add it to my code.

            Properly using R vectors in C code is tricky. You have to
            understand.

            1) When you are allowed or not to modify vectors

            2) When to PROTECT()vectors

            3) How the garbage collector works and when it can trigger
            (answer : basically, when you call any internal R function)

            Chapter 5 of "Writing R Extensions" documentation is quite
            extensive:

            
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C<https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C>

Indeed, I found myself often confused about when to PROTECT and when not.

                ... but then ravetools  is not even a CRAN package, so
                why should you dare to use it for anything serious ?
Haha, thanks : ) I guess I will probably be grouchy too if seeing somany people making the same mistakes again and again. It just happenedto be me.But it’s good to be rigorous. Sooner or later I'll have to face theseproblems. It’s better to make mistakes before having made many.
Thanks y’all!

Best,
- Dipterix Wang



    On Oct 20, 2021, at 5:32 AM, Martin Maechler
    <[email protected]<mailto:[email protected]>> wrote:

                        Martin Maechler
                           on Wed, 20 Oct 2021 11:26:21 +0200 writes:


    [............]


        Thank you, André , that's very good.



        Just to state the obvious conclusion:



        If Ben's suggestion is correct (and André has explained *how*
        that could happen) this would mean  a
        SEVERE BUG in package ravetools's  mvfftw() function.



        and it would have been (yet another) case of gaining speed by
        killing correctness...



        ... but then ravetools  is not even a CRAN package, so why
        should you dare to use it for anything serious ?



        ... yes, being grouchy ..


    which I should rather not be.

    Dipterix Wang *did* say initially that he is currently
    developing ravetools so it's very reasonabl this is not yet a
    CRAN package..

    Best,
    Martin


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stats::fft produces inconsistent results

Reply via email to