[Rd] Mitigating Stalls Caused by Call Deparse on Error

2019-07-13 Thread brodie gaslam via R-devel
When large calls cause errors R may stall for extended periods.  This
is particularly likely to happen with `do.call`, as in this example 
with a 24 second stall:

    x <- runif(1e7)
    system.time(do.call(paste0, list(abs, x)))  # intentional error
    ## Error in (function (..., collapse = NULL)  : 
    ##   cannot coerce type 'builtin' to vector of type 'character'
    ## Calls: system.time -> do.call -> 
    ## Timing stopped at: 23.81 0.149 24.04

    str(.Traceback)
    ## Dotted pair list of 3
    ##  $ : chr [1:2500488] "(function (..., collapse = NULL) " 
".Internal(paste0(list(...), collapse)))(.Primitive(\"abs\"), 
c(0.718117154669017, " "0.494785501621664, 0.1453434410505, 0.635028422810137, 
0.0353180423844606, " "0.688418723642826, 0.889682895969599, 0.728154224809259, 
0.292572240810841, " ...
    ##  $ : chr "do.call(paste0, list(abs, x))"
    ##  $ : chr "system.time(do.call(paste0, list(abs, x)))"

The first time I noticed this I thought my session had frozen/crashed
as the standard interrupt ^C does not work during the deparse.  The
stall happens when on error the call stack is deparsed prior to being
saved to `.Traceback`.  The deparsing is done by `deparse1m` in native
code, with the value of `getOption('deparse.max.lines')` which
defaults to all lines.

Since there is little value to seeing millions of lines of deparsed
objects in `traceback()`, a simple work-around is to change the
`deparse.max.lines` value:

    options(deparse.max.lines=1)
    system.time(do.call(paste0, list(abs, x)))
    ## Error in (function (..., collapse = NULL)  : 
    ##   cannot coerce type 'builtin' to vector of type 'character'
    ## Calls: system.time -> do.call -> 
    ## Timing stopped at: 0 0 0

Unfortunately this will affect all `deparse` calls, and it seems
undesirable to pre-emptively enable it just for calls that might cause
large deparses on error.

An alternative is to store the actual calls instead of their deparsed
character equivalents in `.Traceback`.  This defers the deparsing to 
when `traceback()` is used.  As per `?traceback`, it should be
relatively safe to modify `.Traceback` in this way:

> It is undocumented where .Traceback is stored nor that it is
> visible, and this is subject to change.

Deferring the deparsing to `traceback()` will give us the 
opportunity to use a different `max.lines` setting as we do here 
with the patch applied:

    system.time(do.call(paste0, list(abs, x)))
    ## Error in (function (..., collapse = NULL)  : 
    ##   cannot coerce type 'builtin' to vector of type 'character'
    ## Timing stopped at: 0.028 0 0.029

    system.time(traceback(max.lines=3))
    ## 3: (function (..., collapse = NULL) 
    ##    .Internal(paste0(list(...), collapse)))(.Primitive("abs"), 
c(0.535468587651849, 
    ##    0.0540027911774814, 0.732930393889546, 0.565360915614292, 
0.544816034380347, 
    ## ...
    ## 2: do.call(paste0, list(abs, x))
    ## 1: system.time(do.call(paste0, list(abs, x)))
    ##    user  system elapsed 
    ##   0.000   0.000   0.003 


More generally, it might be better to have a different smaller default
value for the lines to deparse when calls  are _displayed_ as parts of 
lists, as is the case with `traceback()`, or in `print(sys.calls())` and
similar.

I attach a patch that does this.  I have run some basic tests 
and `make check-devel` passes. I can file an issue on bugzilla 
if that is a better place to have this conversation (assuming there 
is interest in it).

Best,

Brodie

PS: for some reason my mail client is refusing to attach the patch so I paste it
starting on the next line.
Index: src/gnuwin32/Rdll.hide
===
--- src/gnuwin32/Rdll.hide    (revision 76827)
+++ src/gnuwin32/Rdll.hide    (working copy)
@@ -94,6 +94,7 @@
  R_GetMaxNSize
  R_GetMaxVSize
  R_GetTraceback
+ R_GetTracebackParsed
  R_GetVarLocSymbol
  R_GetVarLocValue
  R_HandlerStack
Index: src/include/Defn.h
===
--- src/include/Defn.h    (revision 76827)
+++ src/include/Defn.h    (working copy)
@@ -1296,6 +1296,7 @@
 void NORET ErrorMessage(SEXP, int, ...);
 void WarningMessage(SEXP, R_WARNING, ...);
 SEXP R_GetTraceback(int);
+SEXP R_GetTracebackParsed(int);
 
 R_size_t R_GetMaxVSize(void);
 void R_SetMaxVSize(R_size_t);
Index: src/library/base/R/traceback.R
===
--- src/library/base/R/traceback.R    (revision 76827)
+++ src/library/base/R/traceback.R    (working copy)
@@ -16,9 +16,19 @@
 #  A copy of the GNU General Public License is available at
 #  https://www.R-project.org/Licenses/
 
-.traceback <- function(x = NULL) {
-    if(is.null(x) && !is.null(x <- get0(".Traceback", envir = baseenv(
-    {}
+.traceback <- function(x = NULL, max.lines=getOption("deparse.max.lines")) {
+    if(!(is.numeric(max.lines) && !is.na(max.lines) &&
+ as.integer(max.lines) > 0L)
+   

Re: [Rd] strange increase in the reference number

2019-07-13 Thread brodie gaslam via R-devel
 If you would like more details I wrote about this recently:
https://www.brodieg.com/2019/02/18/an-unofficial-reference-for-internal-inspect/
Basically as soon as you hit a closure R assumes that theargument might still 
have a surviving reference to iteven after the closure evaluation ends because 
theclosure environment still exists.  This is why thenamed count is set to the 
maximum value.

Note that since the time I wrote the above andnow the maximum reference count 
changed to 7 from 3.  Atsome point I believe there was a plan to add 
true(er?)reference counting in 3.6 to but that was not quiteready in time.

On Friday, July 12, 2019, 2:47:41 PM EDT, Duncan Murdoch 
 wrote:  
 
 On 12/07/2019 1:22 p.m., King Jiefei wrote:
> Hi,
> 
> I just found a strange increase in the reference number and I'm wondering
> if there is any reason for it, here is the code.
> 
>> a=c(1,2,3)
>> .Internal(inspect(a))
> @0x1bf0b9b0 14 REALSXP g0c3 [NAM(1)] (len=3, tl=0) 1,2,3
>> is.vector(a)
> [1] TRUE
>> .Internal(inspect(a))
> @0x1bf0b9b0 14 REALSXP g0c3 [NAM(7)] (len=3, tl=0) 1,2,3
> 
> The variable *a* initially has one reference number, after calling
> *is.vector* function, the reference number goes to 7, which I believe is
> the highest number that is allowed in R.  I also tried the other R
> functions, *is.atomic, is.integer* and *is.numeric* do not increase the
> reference number, but *typeof *will do. Is it intentional?

is.vector() is a closure that calls .Internal.  is.atomic(), 
is.integer() and is.numeric() are all primitives.

Generally speaking closures that call .Internal are easier to implement 
(e.g. is.vector can use the regular mechanism to set a default for its 
second argument), but less efficient in CPU time.  From it's help page, 
it appears that the logic for is.vector() is a lot more complex than for 
the others, so that implementation does make sense.

So why does NAMED go to 7?  Initially, the vector is bound to a.  Within 
is.vector, it is bound to the local variable x.  At this point there are 
two names bound to the same object, so it has to be considered 
immutable.  There's really no difference between any of the values of 2 
or more in the memory manager.  (But see 
http://developer.r-project.org/Refcnt.html for some plans.  That 
document is from about 5 years ago; I don't know the current state.)

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
  
[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] strange increase in the reference number

2019-07-13 Thread brodie gaslam via R-devel
Re ENSURE_NAMEDMAX, I am unsure but think this happens in (src/eval.c@492):
 static SEXP forcePromise(SEXP e)
{
    if (PRVALUE(e) == R_UnboundValue) {
    /* ... SNIP ...*/
    val = eval(PRCODE(e), PRENV(e));
    /* ... SNIP ...*/
    SET_PRSEEN(e, 0);
    SET_PRVALUE(e, val);
    ENSURE_NAMEDMAX(val); <<< HERE
    SET_PRENV(e, R_NilValue);
    }
    return PRVALUE(e);
}

as part of the evaluations of the closure.  `forcePromise` is called ineval 
(src/eval.c@656).  It's been a while since I've looked at the mechanicsof how 
the native version of `eval` works so I could be completely wrong.

B.

PS: line references are in r-devel@76287.


On Friday, July 12, 2019, 4:38:06 PM EDT, Gabriel Becker 
 wrote: 





Hi Jiefei and Duncan,

I suspect what is likely happening is that one of  ENSURE_NAMEDMAX or
MARK_NOT_MUTABLE are being hit for x. These used to set named to 3, but now
set it to 7 (ie the previous and current NAMEDMAX  value, respectively).

Because these are macros rather than C functions, its not easy to figure
out why one of them is being invoked from do_isvector  (a cursory
exploration didn't reveal what was going on, at least to me) and I don't
have the time to dig super deeply into this right now,  but perhaps Luke or
Tomas know why this is happening of the top of their head.

Sorry I can't be of more help.

~G



On Fri, Jul 12, 2019 at 11:47 AM Duncan Murdoch 
wrote:

> On 12/07/2019 1:22 p.m., King Jiefei wrote:
> > Hi,
> >
> > I just found a strange increase in the reference number and I'm wondering
> > if there is any reason for it, here is the code.
> >
> >> a=c(1,2,3)
> >> .Internal(inspect(a))
> > @0x1bf0b9b0 14 REALSXP g0c3 [NAM(1)] (len=3, tl=0) 1,2,3
> >> is.vector(a)
> > [1] TRUE
> >> .Internal(inspect(a))
> > @0x1bf0b9b0 14 REALSXP g0c3 [NAM(7)] (len=3, tl=0) 1,2,3
> >
> > The variable *a* initially has one reference number, after calling
> > *is.vector* function, the reference number goes to 7, which I believe is
> > the highest number that is allowed in R.  I also tried the other R
> > functions, *is.atomic, is.integer* and *is.numeric* do not increase the
> > reference number, but *typeof *will do. Is it intentional?
>
> is.vector() is a closure that calls .Internal.  is.atomic(),
> is.integer() and is.numeric() are all primitives.
>
> Generally speaking closures that call .Internal are easier to implement
> (e.g. is.vector can use the regular mechanism to set a default for its
> second argument), but less efficient in CPU time.  From it's help page,
> it appears that the logic for is.vector() is a lot more complex than for
> the others, so that implementation does make sense.
>
> So why does NAMED go to 7?  Initially, the vector is bound to a.  Within
> is.vector, it is bound to the local variable x.  At this point there are
> two names bound to the same object, so it has to be considered
> immutable.  There's really no difference between any of the values of 2
> or more in the memory manager.  (But see
> http://developer.r-project.org/Refcnt.html for some plans.  That
> document is from about 5 years ago; I don't know the current state.)
>
> Duncan Murdoch
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

    [[alternative HTML version deleted]]


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel