[Rd] Mitigating Stalls Caused by Call Deparse on Error
When large calls cause errors R may stall for extended periods. This is particularly likely to happen with `do.call`, as in this example with a 24 second stall: x <- runif(1e7) system.time(do.call(paste0, list(abs, x))) # intentional error ## Error in (function (..., collapse = NULL) : ## cannot coerce type 'builtin' to vector of type 'character' ## Calls: system.time -> do.call -> ## Timing stopped at: 23.81 0.149 24.04 str(.Traceback) ## Dotted pair list of 3 ## $ : chr [1:2500488] "(function (..., collapse = NULL) " ".Internal(paste0(list(...), collapse)))(.Primitive(\"abs\"), c(0.718117154669017, " "0.494785501621664, 0.1453434410505, 0.635028422810137, 0.0353180423844606, " "0.688418723642826, 0.889682895969599, 0.728154224809259, 0.292572240810841, " ... ## $ : chr "do.call(paste0, list(abs, x))" ## $ : chr "system.time(do.call(paste0, list(abs, x)))" The first time I noticed this I thought my session had frozen/crashed as the standard interrupt ^C does not work during the deparse. The stall happens when on error the call stack is deparsed prior to being saved to `.Traceback`. The deparsing is done by `deparse1m` in native code, with the value of `getOption('deparse.max.lines')` which defaults to all lines. Since there is little value to seeing millions of lines of deparsed objects in `traceback()`, a simple work-around is to change the `deparse.max.lines` value: options(deparse.max.lines=1) system.time(do.call(paste0, list(abs, x))) ## Error in (function (..., collapse = NULL) : ## cannot coerce type 'builtin' to vector of type 'character' ## Calls: system.time -> do.call -> ## Timing stopped at: 0 0 0 Unfortunately this will affect all `deparse` calls, and it seems undesirable to pre-emptively enable it just for calls that might cause large deparses on error. An alternative is to store the actual calls instead of their deparsed character equivalents in `.Traceback`. This defers the deparsing to when `traceback()` is used. As per `?traceback`, it should be relatively safe to modify `.Traceback` in this way: > It is undocumented where .Traceback is stored nor that it is > visible, and this is subject to change. Deferring the deparsing to `traceback()` will give us the opportunity to use a different `max.lines` setting as we do here with the patch applied: system.time(do.call(paste0, list(abs, x))) ## Error in (function (..., collapse = NULL) : ## cannot coerce type 'builtin' to vector of type 'character' ## Timing stopped at: 0.028 0 0.029 system.time(traceback(max.lines=3)) ## 3: (function (..., collapse = NULL) ## .Internal(paste0(list(...), collapse)))(.Primitive("abs"), c(0.535468587651849, ## 0.0540027911774814, 0.732930393889546, 0.565360915614292, 0.544816034380347, ## ... ## 2: do.call(paste0, list(abs, x)) ## 1: system.time(do.call(paste0, list(abs, x))) ## user system elapsed ## 0.000 0.000 0.003 More generally, it might be better to have a different smaller default value for the lines to deparse when calls are _displayed_ as parts of lists, as is the case with `traceback()`, or in `print(sys.calls())` and similar. I attach a patch that does this. I have run some basic tests and `make check-devel` passes. I can file an issue on bugzilla if that is a better place to have this conversation (assuming there is interest in it). Best, Brodie PS: for some reason my mail client is refusing to attach the patch so I paste it starting on the next line. Index: src/gnuwin32/Rdll.hide === --- src/gnuwin32/Rdll.hide (revision 76827) +++ src/gnuwin32/Rdll.hide (working copy) @@ -94,6 +94,7 @@ R_GetMaxNSize R_GetMaxVSize R_GetTraceback + R_GetTracebackParsed R_GetVarLocSymbol R_GetVarLocValue R_HandlerStack Index: src/include/Defn.h === --- src/include/Defn.h (revision 76827) +++ src/include/Defn.h (working copy) @@ -1296,6 +1296,7 @@ void NORET ErrorMessage(SEXP, int, ...); void WarningMessage(SEXP, R_WARNING, ...); SEXP R_GetTraceback(int); +SEXP R_GetTracebackParsed(int); R_size_t R_GetMaxVSize(void); void R_SetMaxVSize(R_size_t); Index: src/library/base/R/traceback.R === --- src/library/base/R/traceback.R (revision 76827) +++ src/library/base/R/traceback.R (working copy) @@ -16,9 +16,19 @@ # A copy of the GNU General Public License is available at # https://www.R-project.org/Licenses/ -.traceback <- function(x = NULL) { - if(is.null(x) && !is.null(x <- get0(".Traceback", envir = baseenv( - {} +.traceback <- function(x = NULL, max.lines=getOption("deparse.max.lines")) { + if(!(is.numeric(max.lines) && !is.na(max.lines) && + as.integer(max.lines) > 0L) +
Re: [Rd] strange increase in the reference number
If you would like more details I wrote about this recently: https://www.brodieg.com/2019/02/18/an-unofficial-reference-for-internal-inspect/ Basically as soon as you hit a closure R assumes that theargument might still have a surviving reference to iteven after the closure evaluation ends because theclosure environment still exists. This is why thenamed count is set to the maximum value. Note that since the time I wrote the above andnow the maximum reference count changed to 7 from 3. Atsome point I believe there was a plan to add true(er?)reference counting in 3.6 to but that was not quiteready in time. On Friday, July 12, 2019, 2:47:41 PM EDT, Duncan Murdoch wrote: On 12/07/2019 1:22 p.m., King Jiefei wrote: > Hi, > > I just found a strange increase in the reference number and I'm wondering > if there is any reason for it, here is the code. > >> a=c(1,2,3) >> .Internal(inspect(a)) > @0x1bf0b9b0 14 REALSXP g0c3 [NAM(1)] (len=3, tl=0) 1,2,3 >> is.vector(a) > [1] TRUE >> .Internal(inspect(a)) > @0x1bf0b9b0 14 REALSXP g0c3 [NAM(7)] (len=3, tl=0) 1,2,3 > > The variable *a* initially has one reference number, after calling > *is.vector* function, the reference number goes to 7, which I believe is > the highest number that is allowed in R. I also tried the other R > functions, *is.atomic, is.integer* and *is.numeric* do not increase the > reference number, but *typeof *will do. Is it intentional? is.vector() is a closure that calls .Internal. is.atomic(), is.integer() and is.numeric() are all primitives. Generally speaking closures that call .Internal are easier to implement (e.g. is.vector can use the regular mechanism to set a default for its second argument), but less efficient in CPU time. From it's help page, it appears that the logic for is.vector() is a lot more complex than for the others, so that implementation does make sense. So why does NAMED go to 7? Initially, the vector is bound to a. Within is.vector, it is bound to the local variable x. At this point there are two names bound to the same object, so it has to be considered immutable. There's really no difference between any of the values of 2 or more in the memory manager. (But see http://developer.r-project.org/Refcnt.html for some plans. That document is from about 5 years ago; I don't know the current state.) Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] strange increase in the reference number
Re ENSURE_NAMEDMAX, I am unsure but think this happens in (src/eval.c@492): static SEXP forcePromise(SEXP e) { if (PRVALUE(e) == R_UnboundValue) { /* ... SNIP ...*/ val = eval(PRCODE(e), PRENV(e)); /* ... SNIP ...*/ SET_PRSEEN(e, 0); SET_PRVALUE(e, val); ENSURE_NAMEDMAX(val); <<< HERE SET_PRENV(e, R_NilValue); } return PRVALUE(e); } as part of the evaluations of the closure. `forcePromise` is called ineval (src/eval.c@656). It's been a while since I've looked at the mechanicsof how the native version of `eval` works so I could be completely wrong. B. PS: line references are in r-devel@76287. On Friday, July 12, 2019, 4:38:06 PM EDT, Gabriel Becker wrote: Hi Jiefei and Duncan, I suspect what is likely happening is that one of ENSURE_NAMEDMAX or MARK_NOT_MUTABLE are being hit for x. These used to set named to 3, but now set it to 7 (ie the previous and current NAMEDMAX value, respectively). Because these are macros rather than C functions, its not easy to figure out why one of them is being invoked from do_isvector (a cursory exploration didn't reveal what was going on, at least to me) and I don't have the time to dig super deeply into this right now, but perhaps Luke or Tomas know why this is happening of the top of their head. Sorry I can't be of more help. ~G On Fri, Jul 12, 2019 at 11:47 AM Duncan Murdoch wrote: > On 12/07/2019 1:22 p.m., King Jiefei wrote: > > Hi, > > > > I just found a strange increase in the reference number and I'm wondering > > if there is any reason for it, here is the code. > > > >> a=c(1,2,3) > >> .Internal(inspect(a)) > > @0x1bf0b9b0 14 REALSXP g0c3 [NAM(1)] (len=3, tl=0) 1,2,3 > >> is.vector(a) > > [1] TRUE > >> .Internal(inspect(a)) > > @0x1bf0b9b0 14 REALSXP g0c3 [NAM(7)] (len=3, tl=0) 1,2,3 > > > > The variable *a* initially has one reference number, after calling > > *is.vector* function, the reference number goes to 7, which I believe is > > the highest number that is allowed in R. I also tried the other R > > functions, *is.atomic, is.integer* and *is.numeric* do not increase the > > reference number, but *typeof *will do. Is it intentional? > > is.vector() is a closure that calls .Internal. is.atomic(), > is.integer() and is.numeric() are all primitives. > > Generally speaking closures that call .Internal are easier to implement > (e.g. is.vector can use the regular mechanism to set a default for its > second argument), but less efficient in CPU time. From it's help page, > it appears that the logic for is.vector() is a lot more complex than for > the others, so that implementation does make sense. > > So why does NAMED go to 7? Initially, the vector is bound to a. Within > is.vector, it is bound to the local variable x. At this point there are > two names bound to the same object, so it has to be considered > immutable. There's really no difference between any of the values of 2 > or more in the memory manager. (But see > http://developer.r-project.org/Refcnt.html for some plans. That > document is from about 5 years ago; I don't know the current state.) > > Duncan Murdoch > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel