Re: [Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
2018-03-27 6:02 GMT+02:00 :
> This has nothing to do with printing or dispatch per se. It is the
> result of an internal register (R_ReturnedValue) being protected. It
> gets rewritten whenever there is a jump, e.g. by an explicit return
> call. So a simplified example is
>
> new_foo <- function() {
>   e <- new.env()
>   reg.finalizer(e, function(e) message("Finalizer called"))
>   e
> }
>
> bar <- function(x) return(x)
>
> bar(new_foo())
> gc() # still in .Last.value
> gc() # nothing
>
> UseMethod essentially does a return call so you see the effect there.

Understood. Thanks for the explanation, Luke.

> The R_ReturnedValue register could probably be safely cleared in more
> places but it isn't clear exactly where. As things stand it will be
> cleared on the next use of a non-local transfer of control, and those
> happen frequently enough that I'm not convinced this is worth
> addressing, at least not at this point in the release cycle.

I barely know the R internals, and I'm sure there's a good reason
behind this change (R 3.2.3 does not show this behaviour), but IMHO
it's, at the very least, confusing. When .Last.value is cleared, that
object loses the last reference, and I'd expect it to be eligible for
gc.

In my case, I was using an object that internally generates a bunch of
data. I discovered this because I was benchmarking the execution, and
I was running out of memory because the memory wasn't being freed as it
was supposed to. So I spent half of the day on this because I thought
I had a memory leak. :-\ (Not blaming anyone here, of course; just
making a case to show that this may be worth addressing at some
point). :-)

Regards,
Iñaki

> Best,
>
> luke

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
On 03/27/2018 09:51 AM, Iñaki Úcar wrote:
> I barely know the R internals, and I'm sure there's a good reason
> behind this change (R 3.2.3 does not show this behaviour), but IMHO
> it's, at the very least, confusing. When .Last.value is cleared, that
> object loses the last reference, and I'd expect it to be eligible for
> gc.
>
> In my case, I was using an object that internally generates a bunch of
> data. I discovered this because I was benchmarking the execution, and
> I was running out of memory because the memory wasn't being freed as it
> was supposed to. So I spent half of the day on this because I thought
> I had a memory leak. :-\ (Not blaming anyone here, of course; just
> making a case to show that this may be worth addressing at some
> point). :-)

From the perspective of the R user/programmer/package developer, please do
not make any assumptions on when finalizers will be run, only that they
indeed won't be run when the object is still alive. Similarly, it is not
good to make any assumptions that "gc()" will actually run a collection (or
a particular type of collection, or that it will run immediately, etc.).
Such guarantees would restrict the design space and potential optimizations
on the R internals side too much, and for this reason are typically not
given in other managed languages either. I've seen R examples where most of
the time was wasted tracing live objects because an explicit "gc()" was run
in a tight loop. Note that in Java, for instance, an explicit call to gc()
was eventually turned into a hint only.

Once you start debugging when objects are collected, you are debugging R
internals, and surprises/changes between svn versions/etc. should be
expected, as well as changes in behavior caused very indirectly by code
changes somewhere else. I work on R internals and spend most of my time
debugging; that is unfortunately normal when you work on a language
runtime. Indeed, the runtime should try not to keep references to objects
for too long, but it remains to be seen whether and at what cost this could
be fixed with R_ReturnedValue.

Best
Tomas
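One way to follow the advice above is to release large resources explicitly and treat the finalizer only as a safety net. The following is a minimal sketch under that assumption; `new_resource`, `release` and `with_resource` are hypothetical names introduced here for illustration and are not part of R or of any package mentioned in this thread.

```r
# Hypothetical helper: frees the payload held in an environment.
# Safe to call more than once.
release <- function(res) {
  if (!res$released) {
    rm("data", envir = res)   # drop the large payload deterministically
    res$released <- TRUE
  }
  invisible(res)
}

# Hypothetical constructor: an environment holding a large vector.
new_resource <- function() {
  res <- new.env()
  res$data <- numeric(1e6)
  res$released <- FALSE
  # Safety net only: when (or whether) this runs is up to the collector.
  reg.finalizer(res, release, onexit = TRUE)
  res
}

# Hypothetical wrapper: the payload is released when f() returns or fails,
# without relying on gc() or on finalizer timing.
with_resource <- function(f) {
  res <- new_resource()
  on.exit(release(res), add = TRUE)
  f(res)
}

n <- with_resource(function(res) length(res$data))

r <- new_resource()
release(r)
```

With this structure, memory held in `data` becomes unreachable as soon as `release()` runs, independently of when the environment itself is collected.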
Re: [Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
2018-03-27 11:11 GMT+02:00 Tomas Kalibera :
> From the perspective of the R user/programmer/package developer, please do
> not make any assumptions on when finalizers will be run, only that they
> indeed won't be run when the object is still alive. Similarly, it is not
> good to make any assumptions that "gc()" will actually run a collection (or
> a particular type of collection, or that it will run immediately, etc.).
> Such guarantees would restrict the design space and potential optimizations
> on the R internals side too much, and for this reason are typically not
> given in other managed languages either. I've seen R examples where most of
> the time was wasted tracing live objects because an explicit "gc()" was run
> in a tight loop. Note that in Java, for instance, an explicit call to gc()
> was eventually turned into a hint only.
>
> Once you start debugging when objects are collected, you are debugging R
> internals, and surprises/changes between svn versions/etc. should be
> expected, as well as changes in behavior caused very indirectly by code
> changes somewhere else. I work on R internals and spend most of my time
> debugging; that is unfortunately normal when you work on a language
> runtime. Indeed, the runtime should try not to keep references to objects
> for too long, but it remains to be seen whether and at what cost this could
> be fixed with R_ReturnedValue.

To be precise, I was not debugging *when* objects were collected, I
was debugging *whether* objects were collected. And for that, I
necessarily need some hint about the *when*.

But I think that's another discussion. My point is that, as an R user
and package developer, I expect consistency, and currently

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) message("Finalizer called"))
  e
}

bar <- function(x) return(x)

bar(new_foo())
gc() # still in .Last.value
gc() # nothing

behaves differently than

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) message("Finalizer called"))
  e
}

bar <- function(x) x

bar(new_foo())
gc() # still in .Last.value
gc() # Finalizer called!

And such a difference is not explained (AFAIK) in the documentation.
At least the help page for 'return' does not make me think that I
should not expect exactly the same behaviour whether or not I write an
explicit 'return'.

Regards,
Iñaki
Re: [Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
On 03/27/2018 11:53 AM, Iñaki Úcar wrote:
> To be precise, I was not debugging *when* objects were collected, I
> was debugging *whether* objects were collected. And for that, I
> necessarily need some hint about the *when*.

They would be collected eventually if you were running a non-trivial
program (because there would be a jump inside).

> But I think that's another discussion. My point is that, as an R user
> and package developer, I expect consistency, and currently
>
> new_foo <- function() {
>   e <- new.env()
>   reg.finalizer(e, function(e) message("Finalizer called"))
>   e
> }
>
> bar <- function(x) return(x)
>
> bar(new_foo())
> gc() # still in .Last.value
> gc() # nothing
>
> behaves differently than
>
> new_foo <- function() {
>   e <- new.env()
>   reg.finalizer(e, function(e) message("Finalizer called"))
>   e
> }
>
> bar <- function(x) x
>
> bar(new_foo())
> gc() # still in .Last.value
> gc() # Finalizer called!
>
> And such a difference is not explained (AFAIK) in the documentation.
> At least the help page for 'return' does not make me think that I
> should not expect exactly the same behaviour whether or not I write an
> explicit 'return'.

As an R user and package developer, you should have consistency in
_documented_ behavior. If not, it is a bug and has to be fixed either in
the documentation or in the code. You should never depend on undocumented
behavior, because that can change at any time. You cannot expect that
different versions of R would behave exactly the same, not even the svn
versions. That is not possible, and would not be possible even if we did
not change any code in the R implementation, because even the OS, C
compiler, hardware, and third-party libraries have their specified and
unspecified behavior.

Best
Tomas
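The distinction between documented and undocumented behaviour can be made concrete with the two definitions of `bar` from the thread. The sketch below renames them `bar_explicit` and `bar_implicit` (names introduced here only for illustration) and checks the part that is documented, namely the value returned. It deliberately makes no claim about when the returned object is collected.

```r
# The two variants discussed in the thread, under illustrative names.
bar_explicit <- function(x) return(x)
bar_implicit <- function(x) x

e <- new.env()

# Documented behaviour: both forms return the same object.
same_value <- identical(bar_explicit(e), bar_implicit(e))
same_value
```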
Re: [Rd] Typo in src/extra/tzone/registryTZ.c
Thanks! Fixed in R-devel,
Tomas

On 03/26/2018 03:22 PM, Korpela Mikko (MML) wrote:
> I stumbled upon a typo in a time zone name: Irtutsk should be
> Irkutsk. A patch is attached. I also checked that this is the only bug
> of its kind in this file, i.e., all the other Olson time zones
> occurring in the file can also be found in Unicode Common Locale Data
> Repository.
>
> - Mikko Korpela
>
> Index: src/extra/tzone/registryTZ.c
> ===
> --- src/extra/tzone/registryTZ.c (revision 74465)
> +++ src/extra/tzone/registryTZ.c (working copy)
> @@ -303,7 +303,7 @@
>      { L"Russia Time Zone 4", "Asia/Yekaterinburg" },
>      { L"Russia Time Zone 5", "Asia/Novosibirsk" },
>      { L"Russia Time Zone 6", "Asia/Krasnoyarsk" },
> -    { L"Russia Time Zone 7", "Asia/Irtutsk" },
> +    { L"Russia Time Zone 7", "Asia/Irkutsk" },
>      { L"Russia Time Zone 8", "Asia/Yakutsk" },
>      { L"Russia Time Zone 9", "Asia/Magadan" },
>      { L"Russia Time Zone 10", "Asia/Srednekolymsk" },
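A check of this kind can also be approximated from within R. The sketch below is an assumption-laden illustration, not the method used in the report: it compares a few names against `OlsonNames()`, which lists the zones known to the local time zone database, rather than against the Unicode CLDR. The `candidates` vector is a hand-picked sample and not the full contents of registryTZ.c.

```r
# Sample of names to validate; "Asia/Irtutsk" is the misspelling from the patch.
candidates <- c("Asia/Krasnoyarsk", "Asia/Irtutsk", "Asia/Irkutsk", "Asia/Yakutsk")

# Zones known to the time zone database available to this R installation.
known <- OlsonNames()

# Names that the local database does not recognise.
unknown <- setdiff(candidates, known)
unknown
```

The result depends on the installed time zone database, so names other than the misspelled one may also be reported on systems with incomplete zone data.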
Re: [Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
I have committed a change to R-devel that addresses this. To be on the
safe side I need to run some more extensive tests before deciding if
this can be ported to the release branch for R 3.5.0. Should know in a
day or two.

Best,

luke

On Tue, 27 Mar 2018, luke-tier...@uiowa.edu wrote:
> This has nothing to do with printing or dispatch per se. It is the
> result of an internal register (R_ReturnedValue) being protected. It
> gets rewritten whenever there is a jump, e.g. by an explicit return
> call. So a simplified example is
>
> new_foo <- function() {
>   e <- new.env()
>   reg.finalizer(e, function(e) message("Finalizer called"))
>   e
> }
>
> bar <- function(x) return(x)
>
> bar(new_foo())
> gc() # still in .Last.value
> gc() # nothing
>
> UseMethod essentially does a return call so you see the effect there.
>
> The R_ReturnedValue register could probably be safely cleared in more
> places but it isn't clear exactly where. As things stand it will be
> cleared on the next use of a non-local transfer of control, and those
> happen frequently enough that I'm not convinced this is worth
> addressing, at least not at this point in the release cycle.
>
> Best,
>
> luke
>
> On Mon, 26 Mar 2018, Iñaki Úcar wrote:
>> Hi,
>>
>> I initially opened an issue in the R6 repo because my issue was with
>> an R6 object. But Winston (thanks!) further simplified my example, and
>> it turns out that the issue (whether a feature or a bug is yet to be
>> seen) had to do with S3 dispatching.
>>
>> The following example, by Winston, depicts the issue:
>>
>> print.foo <- function(x, ...) {
>>   cat("print.foo called\n")
>>   invisible(x)
>> }
>>
>> new_foo <- function() {
>>   e <- new.env()
>>   reg.finalizer(e, function(e) message("Finalizer called"))
>>   class(e) <- "foo"
>>   e
>> }
>>
>> new_foo()
>> gc() # still in .Last.value
>> gc() # nothing
>>
>> I would expect the second call to gc() to free 'e', but it does
>> not. However, if we now call *any* S3 method, then the object can
>> finally be gc'ed:
>>
>> print(1)
>> gc() # Finalizer called
>>
>> So the hypothesis is that there is some kind of caching (?) mechanism
>> going on. Intended behaviour or not, this is something that was
>> introduced between R 3.2.3 and 3.3.2 (the first succeeds; from the
>> second on, the example fails as described above).
>>
>> Regards,
>> Iñaki
>>
>> PS: Further discussion and examples in
>> https://github.com/r-lib/R6/issues/140

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu
[Rd] as.pairlist does not convert call objects
Dear all,

It seems that as.pairlist does not convert call objects, producing
results like the following:

> is.pairlist(as.pairlist(quote(x + y)))
[1] FALSE

Should this behavior be expected?

Thanks,
Jialin

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed

Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] magrittr_1.5
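A possible workaround, sketched here without settling whether the reported behaviour is intended, is to convert the call to a generic list first and only then to a pairlist. The variable names `cl`, `pl` and `cl2` are introduced for illustration.

```r
cl <- quote(x + y)

# as.list() on a call yields its components (function name and arguments)
# as a generic list, which as.pairlist() then converts.
pl <- as.pairlist(as.list(cl))
is.pairlist(pl)

# Round trip: rebuild the call from the pairlist to confirm nothing was lost.
cl2 <- as.call(as.list(pl))
identical(cl, cl2)
```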