date:20190521

Re: [Rd] anyNA() performance on vectors of POSIXct

2019-05-21 Thread Joshua Ulrich

On Wed, May 1, 2019 at 7:45 AM Harvey Smith  wrote:
>
> Inside of the anyNA() function, it will use the legacy any(is.na()) code if
> x is an OBJECT().  If x is a vector of POSIXct, it will be an OBJECT(), but
> it is also TYPEOF(x) == REALSXP.  Therefore, it will skip the faster
> ITERATE_BY_REGION, which is typically 5x faster in my testing.
>
> Is the OBJECT() condition really necessary, or could it be moved after the
> switch() for the individual TYPEOF(x) ITERATE_BY_REGION calls?
>
> # script to demonstrate performance difference if x is an OBJECT or not,
> # by using unclass()
> x.posixct = Sys.time() + 1:1e6
> microbenchmark::microbenchmark(
>   any(is.na( x.posixct )),
>   anyNA( x.posixct ),
>   anyNA( unclass(x.posixct) ),
>   unit='ms')
>
>
>
> static Rboolean anyNA(SEXP call, SEXP op, SEXP args, SEXP env)
> {
>   SEXP x = CAR(args);
>   SEXPTYPE xT = TYPEOF(x);
>   Rboolean isList =  (xT == VECSXP || xT == LISTSXP), recursive = FALSE;
>
>   if (isList && length(args) > 1) recursive = asLogical(CADR(args));
>   *if (OBJECT(x) || (isList && !recursive)) {*
>     SEXP e0 = PROTECT(lang2(install("is.na"), x));
>     SEXP e = PROTECT(lang2(install("any"), e0));
>     SEXP res = PROTECT(eval(e, env));
>     int ans = asLogical(res);
>     UNPROTECT(3);
>     return ans == 1; // so NA answer is false.
>   }
>
>   R_xlen_t i, n = xlength(x);
>   switch (xT) {
>   case REALSXP:
>   {
>     if(REAL_NO_NA(x))
>       return FALSE;
>     ITERATE_BY_REGION(x, xD, i, nbatch, double, REAL, {
>       for (int k = 0; k < nbatch; k++)
>         if (ISNAN(xD[k]))
>           return TRUE;
>     });
>     break;
>   }
>

I'm interested in this as well, because it causes performance
degradation in xts subsetting:
https://github.com/joshuaulrich/xts/issues/296

Would it be possible to special-case POSIXct, and perhaps other types
defined in base+recommended packages?
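
Until something like that exists in C, a similar effect can be had at the R
level. The following is only a minimal sketch: fast_anyNA() is a hypothetical
helper name (it is not part of base R or xts), and it assumes that for
"POSIXct" and "Date" the NA status of the object matches that of the
underlying numeric vector.

```r
# Hypothetical helper (not in base R or xts): skip the any(is.na(x)) fallback
# for classes whose NA status is assumed to match the underlying numbers.
fast_anyNA <- function(x) {
  if (inherits(x, c("POSIXct", "Date"))) {
    # unclass() drops the class attribute, so anyNA() no longer sees an
    # OBJECT and uses its type-specific branch.
    anyNA(unclass(x))
  } else {
    anyNA(x)
  }
}

x.posixct <- Sys.time() + 1:1e6
fast_anyNA(x.posixct)               # no missing timestamps

x.na <- Sys.time() + c(1, NA, 3)    # one missing timestamp
fast_anyNA(x.na)
```

Any class with its own notion of missingness would have to stay on the
anyNA(x) branch, which is why the class list is kept explicit here.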

Best,
Josh

-- 
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
R/Finance 2019 | www.rinfinance.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Patch to replace "his" in Writing R Extensions

2019-05-21 Thread Maëlle SALMON via R-devel

Dear R-devel team,

Many thanks for the great resource that is "Writing R Extensions"! 

I noticed two occurrences of "his", one to refer to the R package user, another 
to refer to the R package author. Folks in these two groups are not all men, so 
I suggest changing the word to "their" to make it gender-neutral. Attached is a 
patch for your consideration. 

Thanks for your time, best regards, 

Maëlle.

Index: doc/manual/R-exts.texi
===================================================================
--- doc/manual/R-exts.texi	(revision 76557)
+++ doc/manual/R-exts.texi	(working copy)
@@ -3450,7 +3450,7 @@
 outputs in the package sources it is not necessary that these can be
 re-built at install time, i.e., the package author can use private @R{}
 packages, screen snapshots and @LaTeX{} extensions which are only
-available on his machine.@footnote{provided the conditions of the
+available on their machine.@footnote{provided the conditions of the
 package's license are met: many, including @acronym{CRAN}, see the
 omission of source components as incompatible with an Open Source
 license.}
@@ -4538,7 +4538,7 @@
 Under no circumstances should your compiled code ever call @code{abort}
 or @code{exit}@footnote{or where supported the variants @code{_Exit} and
 @code{_exit}.}: these terminate the user's @R{} process, quite possibly
-including all his unsaved work.  One usage that could call @code{abort}
+including all their unsaved work.  One usage that could call @code{abort}
 is the @code{assert} macro in C or C++ functions, which should never be
 active in production code.  The normal way to ensure that is to define
 the macro @code{NDEBUG}, and @command{R CMD INSTALL} does so as part of

Re: [Rd] [External] Patch to replace "his" in Writing R Extensions

2019-05-21 Thread Tierney, Luke

Thanks. Addressed in r76559 (trunk) and r76560 (R-3-6-branch).

Best,

luke

On Tue, 21 May 2019, Maëlle SALMON via R-devel wrote:

> Dear R-devel team,
>
> Many thanks for the great resource that is "Writing R Extensions"!
>
> I noticed two occurrences of "his", one to refer to the R package user, 
> another to refer to the R package author. Folks in these two groups are not 
> all men, so I suggest changing the word to "their" to make it gender-neutral. 
> Attached is a patch for your consideration.
>
> Thanks for your time, best regards,
>
> Maëlle.

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:   319-335-3386
Department of Statistics and        Fax:     319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:     http://www.stat.uiowa.edu

Re: [Rd] anyNA() performance on vectors of POSIXct

2019-05-21 Thread Martin Maechler

> Harvey Smith 
> on Wed, 1 May 2019 03:20:55 -0400 writes:

> Inside of the anyNA() function, it will use the legacy any(is.na()) code if
> x is an OBJECT().  If x is a vector of POSIXct, it will be an OBJECT(), but
> it is also TYPEOF(x) == REALSXP.  Therefore, it will skip the faster
> ITERATE_BY_REGION, which is typically 5x faster in my testing.

> Is the OBJECT() condition really necessary, or could it be moved after the
> switch() for the individual TYPEOF(x) ITERATE_BY_REGION calls?

 "necessary ?" :  yes, in the following sense :

When it was introduced, the idea of anyNA(.) was that it
should be equivalent to (but often faster than)  any(is.na(.)).
anyNA() was only introduced quite recently (*),
and many (S3 and S4) classes have had  is.na() methods defined
for them but -- initially at least -- no anyNA() method.

So to ensure the equivalence   anyNA(x)  ===  any(is.na(x))
for "all" R objects 'x', that OBJECT(.) condition had been
important and necessary.

Still, being the person who had added  anyNA() to R,
I'm naturally sympathetic to have it faster in cases such as
"Date" or "POSIXct" objects.

I'd find it ugly to test for these classes specifically in the C code (via
the equivalent of  inherits(., "POSIXct")
  {{ *NOT* via the really wrong  class(.)[[1]] == "POSIXct"
  that I see in some "experts" R code, because that fails
  for all class extensions ! }}
but that may still be an option;

Yet alternatively, one *could* consider changing the API and
declare that for atomic types with a class {i.e. OBJECT(.)}, and
*if* there is no anyNA() method, anyNA() will use the "atomic"
fast method, instead of using any(is.na(.)).

This may break existing code in packages, but the maintainers of
that code could solve the problems by providing  anyNA(.)
methods for their objects.

Other opinions / ideas ?
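
To make the two points above concrete, here is a small sketch in R. The
class name "my_time" and the method anyNA.my_time() are made up for the
illustration; the example only assumes that anyNA() dispatches S3 methods
for classed objects, as it already does for the methods shipped with base R.

```r
# A hypothetical class extension: "my_time" objects are also "POSIXct".
x <- Sys.time() + c(1, NA, 3)
class(x) <- c("my_time", class(x))   # c("my_time", "POSIXct", "POSIXt")

class(x)[[1]] == "POSIXct"   # FALSE: only looks at the first class
inherits(x, "POSIXct")       # TRUE:  checks the whole class vector

# A package maintainer could supply a method for their own class; here it
# simply forwards to the fast path for the underlying numeric vector.
anyNA.my_time <- function(x, recursive = FALSE) anyNA(unclass(x))

anyNA(x)                     # TRUE
```

With such a method in place the class no longer depends on the
any(is.na(.)) fallback, whichever default is eventually chosen.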

Martin Maechler
ETH Zurich / R Core Team

--
*) in Spring 2013, but too late for R 3.0.0;
   "recently", considering R's history starting with S in the early 1980's

> # script to demonstrate performance difference if x is an OBJECT or not,
> # by using unclass()
> x.posixct = Sys.time() + 1:1e6
> microbenchmark::microbenchmark(
>   any(is.na( x.posixct )),
>   anyNA( x.posixct ),
>   anyNA( unclass(x.posixct) ),
>   unit='ms')
> 
> 
> 
> static Rboolean anyNA(SEXP call, SEXP op, SEXP args, SEXP env)
> {
>   SEXP x = CAR(args);
>   SEXPTYPE xT = TYPEOF(x);
>   Rboolean isList =  (xT == VECSXP || xT == LISTSXP), recursive = FALSE;
> 
>   if (isList && length(args) > 1) recursive = asLogical(CADR(args));
>   *if (OBJECT(x) || (isList && !recursive)) {*
> SEXP e0 = PROTECT(lang2(install("is.na"), x));
> SEXP e = PROTECT(lang2(install("any"), e0));
> SEXP res = PROTECT(eval(e, env));
> int ans = asLogical(res);
> UNPROTECT(3);
> return ans == 1; // so NA answer is false.
>   }
> 
>   R_xlen_t i, n = xlength(x);
>   switch (xT) {
> case REALSXP:
> {
>   if(REAL_NO_NA(x))
> return FALSE;
>   ITERATE_BY_REGION(x, xD, i, nbatch, double, REAL, {
> for (int k = 0; k < nbatch; k++)
>   if (ISNAN(xD[k]))
> return TRUE;
>   });
>   break;
> }
>


Re: [Rd] print.<storage.mode>() not called when autoprinting

2019-05-21 Thread Martin Maechler

> William Dunlap via R-devel 
> on Thu, 16 May 2019 11:56:45 -0700 writes:

> In R-3.6.0 autoprinting was changed so that print methods for the storage
> modes are not called when there is no explicit class attribute.   E.g.,

> % R-3.6.0 --vanilla --quiet
>> print.function <- function(x, ...) { cat("Function with argument list ");
> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
>> f <- function(x, ...) { sum( x * seq_along(x) ) }
>> f
> function(x, ...) { sum( x * seq_along(x) ) }
>> print(f)
> Function with argument list function (x, ...)

> Previous to R-3.6.0 autoprinting did call such methods
> % R-3.5.3 --vanilla --quiet
>> print.function <- function(x, ...) { cat("Function with argument list ");
> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
>> f <- function(x, ...) { sum( x * seq_along(x) ) }
>> f
> Function with argument list function (x, ...)
>> print(f)
> Function with argument list function (x, ...)

> Was this intentional?

No, it was not.  ... and I've been the one committing the wrong change.

... Related to the NEWS entries which start

 "Changes in print.*() "

Thank you Bill, for reporting

It's amazing this has not been detected earlier by anybody.

I think it is *only* for functions, not general
print.<storage.mode>() as you were suggesting - right?

Martin


Re: [Rd] print.<storage.mode>() not called when autoprinting

2019-05-21 Thread William Dunlap via R-devel

It also is a problem with storage.modes "integer" and "complex":

3.6.0> print.integer <- function(x,...) "integer vector"
3.6.0> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
3.6.0> print(1:10)
[1] "integer vector"
3.6.0>
3.6.0> print.complex <- function(x, ...) "complex vector"
3.6.0> 1+2i
[1] 1+2i
3.6.0> print(1+2i)
[1] "complex vector"

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, May 21, 2019 at 9:31 AM Martin Maechler 
wrote:

> > William Dunlap via R-devel
> > on Thu, 16 May 2019 11:56:45 -0700 writes:
>
> > In R-3.6.0 autoprinting was changed so that print methods for the
> storage
> > modes are not called when there is no explicit class attribute.
>  E.g.,
>
> > % R-3.6.0 --vanilla --quiet
> >> print.function <- function(x, ...) { cat("Function with argument
> list ");
> > cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> >> f <- function(x, ...) { sum( x * seq_along(x) ) }
> >> f
> > function(x, ...) { sum( x * seq_along(x) ) }
> >> print(f)
> > Function with argument list function (x, ...)
>
> > Previous to R-3.6.0 autoprinting did call such methods
> > % R-3.5.3 --vanilla --quiet
> >> print.function <- function(x, ...) { cat("Function with argument
> list ");
> > cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> >> f <- function(x, ...) { sum( x * seq_along(x) ) }
> >> f
> > Function with argument list function (x, ...)
> >> print(f)
> > Function with argument list function (x, ...)
>
> > Was this intentional?
>
> No, it was not.  ... and I've been the one committing the wrong change.
>
> ... Related to the NEWS entries which start
>
>  "Changes in print.*() "
>
> Thank you Bill, for reporting
>
> It's amazing this has not been detected earlier by anybody.
>
> I think it is *only* for functions, not general
> print.<storage.mode>() as you were suggesting - right?
>
> Martin
>


Re: [Rd] print.<storage.mode>() not called when autoprinting

2019-05-21 Thread Lionel Henry

FWIW it was the intention of the patch to make printing of unclassed
functions consistent with other base types. This was documented in the
"patch 3" section:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17398

I think we need a general way to customise auto-printing for base types
and even classed objects as that'd be useful for both users and IDEs.

However S3 dispatch may not be optimal for this because it essentially
requires polluting the global environment with print methods. Maybe
it'd make sense to add getOption("autoprint") which should be set to
a user- or environment- supplied function. That function would do the
dispatch. I'd be happy to send a patch for this, if it makes sense.
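
As a rough illustration of the idea only: R 3.6.0 does not consult any
"autoprint" option, so the option name and the function my_autoprint() below
are hypothetical, and the last lines call the function by hand where the
top-level loop would call it under the proposal.

```r
# Hypothetical sketch: a user-supplied auto-print function that does its own
# dispatch on the base type, without defining print.function() etc. globally.
my_autoprint <- function(x) {
  if (is.object(x)) {
    print(x)                       # classed objects: ordinary S3/S4 dispatch
  } else {
    switch(typeof(x),
      closure = cat("Function with argument list ",
                    head(deparse(args(x)), -1), "\n", sep = ""),
      integer = cat("integer vector of length ", length(x), "\n", sep = ""),
      print(x)                     # every other base type: default printing
    )
  }
  invisible(x)
}
options(autoprint = my_autoprint)  # stored, but not used by R itself

# What auto-printing would do under the proposal, done by hand here:
f <- function(x, ...) sum(x * seq_along(x))
getOption("autoprint")(f)
getOption("autoprint")(1:10)
```

Keeping the dispatch inside one function means nothing has to be assigned
into the global environment apart from the option itself.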

Best,
Lionel


> On 21 May 2019, at 13:38, William Dunlap via R-devel  
> wrote:
> 
> It also is a problem with storage.modes "integer" and "complex":
> 
> 3.6.0> print.integer <- function(x,...) "integer vector"
> 3.6.0> 1:10
> [1]  1  2  3  4  5  6  7  8  9 10
> 3.6.0> print(1:10)
> [1] "integer vector"
> 3.6.0>
> 3.6.0> print.complex <- function(x, ...) "complex vector"
> 3.6.0> 1+2i
> [1] 1+2i
> 3.6.0> print(1+2i)
> [1] "complex vector"
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> 
> On Tue, May 21, 2019 at 9:31 AM Martin Maechler 
> wrote:
> 
>>> William Dunlap via R-devel
>>>    on Thu, 16 May 2019 11:56:45 -0700 writes:
>> 
>>> In R-3.6.0 autoprinting was changed so that print methods for the storage
>>> modes are not called when there is no explicit class attribute.   E.g.,
>> 
>>> % R-3.6.0 --vanilla --quiet
>>>> print.function <- function(x, ...) { cat("Function with argument list ");
>>> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
>>>> f <- function(x, ...) { sum( x * seq_along(x) ) }
>>>> f
>>> function(x, ...) { sum( x * seq_along(x) ) }
>>>> print(f)
>>> Function with argument list function (x, ...)
>> 
>>> Previous to R-3.6.0 autoprinting did call such methods
>>> % R-3.5.3 --vanilla --quiet
>>>> print.function <- function(x, ...) { cat("Function with argument list ");
>>> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
>>>> f <- function(x, ...) { sum( x * seq_along(x) ) }
>>>> f
>>> Function with argument list function (x, ...)
>>>> print(f)
>>> Function with argument list function (x, ...)
>> 
>>> Was this intentional?
>> 
>> No, it was not.  ... and I've been the one committing the wrong change.
>> 
>> ... Related to the NEWS entries which start
>> 
>> "Changes in print.*() "
>> 
>> Thank you Bill, for reporting
>> 
>> It's amazing this has not been detected earlier by anybody.
>> 
>> I think it is *only* for functions, not general
>> print.<storage.mode>() as you were suggesting - right?
>> 
>> Martin
>> 
> 

Re: [Rd] anyNA() performance on vectors of POSIXct

2019-05-21 Thread Harvey Smith

 I think there was a similar discussion to this when I raised the issue of
interpreting the sort order for an object versus its underlying type.  In
this anyNA example it is the is.na for the object versus the is.na for the
type, whereas in the discussion below, which Gabriel Becker raised, it was
the sort ordering.  They seem to be related when vectors of POSIXct are
handled as objects instead of the underlying numeric type.

So looking at this, it is because is.object(x.posixct) returns true, which
means sort.default does x[order(x, )], which ALTREP is not
currently (and may not ever be?) smart enough to catch on its own and know
is sorted.

It's true we could add something after that to wrap it in what is called a
wrapper ALTREP which would know it's sorted, but we don't do that
currently and I'm not sure we actually should in the general case. I'm not
convinced it's safe to assume an object class's defined ordering will match
the ordering of an underlying double/int representation. I believe we ran
into something similar with deferred string conversions from integers (I
think, possibly doubles) where the int had sortedness information but that
wasn't correct for the *character vector* the ALTREP ultimately
represented.

http://r.789695.n4.nabble.com/unsorted-suggestion-for-performance-improvement-and-ALTREP-support-for-POSIXct-td4754634.html


On Tue, May 21, 2019 at 12:04 PM Martin Maechler 
wrote:

> > Harvey Smith
> > on Wed, 1 May 2019 03:20:55 -0400 writes:
>
> > Inside of the anyNA() function, it will use the legacy any(is.na())
> code if
> > x is an OBJECT().  If x is a vector of POSIXct, it will be an
> OBJECT(), but
> > it is also TYPEOF(x) == REALSXP.  Therefore, it will skip the faster
> > ITERATE_BY_REGION, which is typically 5x faster in my testing.
>
> > Is the OBJECT() condition really necessary, or could it be moved
> after the
> > switch() for the individual TYPEOF(x) ITERATE_BY_REGION calls?
>
>  "necessary ?" :  yes, in the following sense :
>
> When it was introduced, the idea of anyNA(.) has been that it
> should be equivalent (but often faster) than  any(is.na(.)).
> As anyNA() was only introduced quite recently (*)
> and many (S3 and S4) classes have had  is.na() methods defined
> for them but -- initially at least -- not an anyNA().
>
> So to ensure the equivalence   anyNA(x)  ===  any(is.na(x))
> for "all" R objects 'x', that OBJECT(.) condition had been
> important and necessary.
>
> Still, being the person who had added  anyNA() to R,
> I'm naturally sympathetic to have it faster in cases such as
> "Date" or "POSIXct" objects.
>
> I'd find it ugly to test for these classes specifically in the C code (via
> the equivalent of  inherits(., "POSIXct")
>   {{ *NOT* via the really wrong  class(.)[[1]] == "POSIXct"
>   that I see in some "experts" R code, because that fails
>   for all class extensions ! }}
> but that may still be an option;
>
> Yet alternatively, one *could* consider changing the API and
> declare that for atomic types with a class {i.e. OBJECT(.)}, and
> *if* there is no anyNA() method, anyNA() will use the "atomic"
> fast method, instead of using any(is.na(.)).
>
> This may break existing code in packages, but the maintainers of
> that code could solve the problems by providing  anyNA(.)
> methods for their objects.
>
> Other opinions / ideas ?
>
> Martin Maechler
> ETH Zurich / R Core Team
>
>
> --
> *) in Spring 2013, but too late for R 3.0.0;
>    "recently", considering R's history starting with S in the early 1980's
>
>
> > # script to demonstrate performance difference if x is an OBJECT or not
> by
> > using unclass()
> > x.posixct = Sys.time() + 1:1e6
> > microbenchmark::microbenchmark(
> >   any(is.na( x.posixct )),
> >   anyNA( x.posixct ),
> >   anyNA( unclass(x.posixct) ),
> >   unit='ms')
> >
> >
> >
> > static Rboolean anyNA(SEXP call, SEXP op, SEXP args, SEXP env)
> > {
> >   SEXP x = CAR(args);
> >   SEXPTYPE xT = TYPEOF(x);
> >   Rboolean isList =  (xT == VECSXP || xT == LISTSXP), recursive = FALSE;
> >
> >   if (isList && length(args) > 1) recursive = asLogical(CADR(args));
> >   *if (OBJECT(x) || (isList && !recursive)) {*
> > SEXP e0 = PROTECT(lang2(install("is.na"), x));
> > SEXP e = PROTECT(lang2(install("any"), e0));
> > SEXP res = PROTECT(eval(e, env));
> > int ans = asLogical(res);
> > UNPROTECT(3);
> > return ans == 1; // so NA answer is false.
> >   }
> >
> >   R_xlen_t i, n = xlength(x);
> >   switch (xT) {
> > case REALSXP:
> > {
> >   if(REAL_NO_NA(x))
> > return FALSE;
> >   ITERATE_BY_REGION(x, xD, i, nbatch, double, REAL, {
> > for (int k = 0; k < nbatch; k++)
> >   if (ISNAN(xD[k]))
> > return TRUE;
> >   });
> >   break;
> > }
> >
>


Re: [Rd] print.<storage.mode>() not called when autoprinting

2019-05-21 Thread William Dunlap via R-devel

Letting a user supply the autoprint function would be nice also.  In a way
you can already do that, using addTaskCallback(), but that doesn't let you
suppress the standard autoprinting.
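
For reference, a minimal sketch of the addTaskCallback() route. The callback
name describe_value is made up; the four-argument signature and the TRUE
return value follow the documented task-callback interface.

```r
# Task callbacks run after a top-level expression has been evaluated and,
# if its value is visible, auto-printed in the usual way.
describe_value <- function(expr, value, ok, visible) {
  if (ok && visible && !is.object(value)) {
    cat("<", typeof(value), " of length ", length(value), ">\n", sep = "")
  }
  TRUE   # returning TRUE keeps the callback registered
}

id <- addTaskCallback(describe_value, name = "describe_value")

# ... interactive work: each visible result is printed as usual and is then
# followed by the extra line written by describe_value() ...

removeTaskCallback(id)
```

The callback can only add output after the standard printing; it has no way
to replace or suppress it, which is the limitation mentioned above.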

Having the default autoprinting do its own style of method dispatch doesn't
seem right.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, May 21, 2019 at 10:50 AM Lionel Henry  wrote:

> FWIW it was the intention of the patch to make printing of unclassed
> functions consistent with other base types. This was documented in the
> "patch 3" section:
>
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17398
>
> I think we need a general way to customise auto-printing for base types
> and even classed objects as that'd be useful for both users and IDEs.
>
> However S3 dispatch may not be optimal for this because it essentially
> requires polluting the global environment with print methods. Maybe
> it'd make sense to add getOption("autoprint") which should be set to
> a user- or environment- supplied function. That function would do the
> dispatch. I'd be happy to send a patch for this, if it makes sense.
>
> Best,
> Lionel
>
>
> > On 21 May 2019, at 13:38, William Dunlap via R-devel <
> r-devel@r-project.org> wrote:
> >
> > It also is a problem with storage.modes "integer" and "complex":
> >
> > 3.6.0> print.integer <- function(x,...) "integer vector"
> > 3.6.0> 1:10
> > [1]  1  2  3  4  5  6  7  8  9 10
> > 3.6.0> print(1:10)
> > [1] "integer vector"
> > 3.6.0>
> > 3.6.0> print.complex <- function(x, ...) "complex vector"
> > 3.6.0> 1+2i
> > [1] 1+2i
> > 3.6.0> print(1+2i)
> > [1] "complex vector"
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> >
> > On Tue, May 21, 2019 at 9:31 AM Martin Maechler <
> maech...@stat.math.ethz.ch>
> > wrote:
> >
> >>> William Dunlap via R-devel
> >>>    on Thu, 16 May 2019 11:56:45 -0700 writes:
> >>
> >>> In R-3.6.0 autoprinting was changed so that print methods for the storage
> >>> modes are not called when there is no explicit class attribute.   E.g.,
> >>
> >>> % R-3.6.0 --vanilla --quiet
> >>>> print.function <- function(x, ...) { cat("Function with argument list ");
> >>> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> >>>> f <- function(x, ...) { sum( x * seq_along(x) ) }
> >>>> f
> >>> function(x, ...) { sum( x * seq_along(x) ) }
> >>>> print(f)
> >>> Function with argument list function (x, ...)
> >>
> >>> Previous to R-3.6.0 autoprinting did call such methods
> >>> % R-3.5.3 --vanilla --quiet
> >>>> print.function <- function(x, ...) { cat("Function with argument list ");
> >>> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> >>>> f <- function(x, ...) { sum( x * seq_along(x) ) }
> >>>> f
> >>> Function with argument list function (x, ...)
> >>>> print(f)
> >>> Function with argument list function (x, ...)
> >>
> >>> Was this intentional?
> >>
> >> No, it was not.  ... and I've been the one committing the wrong change.
> >>
> >> ... Related to the NEWS entries which start
> >>
> >> "Changes in print.*() "
> >>
> >> Thank you Bill, for reporting
> >>
> >> It's amazing this has not been detected earlier by anybody.
> >>
> >> I think it is *only* for functions, not general
> >> print.<storage.mode>() as you were suggesting - right?
> >>
> >> Martin
> >>
> >

Re: [Rd] most robust way to call R API functions from a secondary thread

2019-05-21 Thread Andreas Kersting

Hi Simon,

Your response hits the mark of why I am doing it that way rather than going 
with what Stepan proposed. Also good to hear that you consider my analysis to 
be pretty complete. Thanks for the feedback!

Regards,
Andreas

2019-05-20 15:54 GMT+02:00 Simon Urbanek:
> Stepan,
>
> Andreas gave a lot more thought to what you question in your reply. His 
> question was how you can avoid what you were proposing and have proper 
> threading under safe conditions. Having dealt with this before, I think 
> Andreas' write-up is pretty much the most complete analysis I have seen. I'd 
> wait for Luke to chime in as the ultimate authority if he gets to it.
>
> The "classic" approach which you mention is to collect and allocate 
> everything, then execute parallel code and then return. What Andreas is 
> proposing is obviously much more efficient: you only synchronize on R API 
> calls which are likely a small fraction of the entire time while you keep all 
> threads alive. His question was how to do that safely. (BTW: I really like 
> the touch of counting frames that toplevel exec can use ;) - it may make 
> sense to deal with that edge-case in R if we can ...).
>
> Cheers,
> Simon
>
>
>
>
>> On May 20, 2019, at 5:45 AM, Stepan  wrote:
>>
>> Hi Andreas,
>>
>> note that with the introduction of ALTREP, as far as I understand, calls as 
>> "simple" as DATAPTR can execute arbitrary code (R or native). Even without 
>> ALTREP, if you execute user-provided R code via Rf_eval and such on some 
>> custom thread, you may end up executing native code of some package, which 
>> may assume it is executed only from the R main thread.
>>
>> Could you (1) decompose your problem in a way that in some initial phase you 
>> pull all the necessary data from R, then start the parallel computation, and 
>> then again in the R main thread "submit" the results back to the R world?
>>
>> If you wanted something really robust, you can (2) "send" the requests for R 
>> API usage to the R main thread and pause the worker thread until it receives 
>> the results back. This looks similar to what the "later" package does. Maybe 
>> you can even use that package for your purposes?
>>
>> Do you want to parallelize your code to achieve better performance? Even 
>> with your proposed solution, you need synchronization and chances are that 
>> excessive synchronization will severely affect the expected performance 
>> benefits of parallelization. If you do not need to synchronize that much, 
>> then the question is if you can do with (1) or (2).
>>
>> Best regards,
>> Stepan
>>
>> On 19/05/2019 11:31, Andreas Kersting wrote:
>>> Hi,
>>> As the subject suggests, I am looking for the most robust way to call an 
>>> (arbitrary) function from the R API from another but the main POSIX thread 
>>> in a package's code.
>>> I know that, "[c]alling any of the R API from threaded code is ‘for experts 
>>> only’ and strongly discouraged. Many functions in the R API modify internal 
>>> R data structures and might corrupt these data structures if called 
>>> simultaneously from multiple threads. Most R API functions can signal 
>>> errors, which must only happen on the R main thread." 
>>> (https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support)
>>> Let me start with my understanding of the related issues and possible 
>>> solutions:
>>> 1) R API functions are generally not thread-safe and hence one must ensure, 
>>> e.g. by using mutexes, that no two threads use the R API simultaneously
>>> 2) R uses longjmps on error and interrupts as well as for condition 
>>> handling and it is undefined behaviour to do a longjmp from one thread to 
>>> another; interrupts can be suspended before creating the threads by setting 
>>> R_interrupts_suspended = TRUE; by wrapping the calls to functions from the 
>>> R API with R_ToplevelExec(), longjmps across thread boundaries can be 
>>> avoided; the only reason for R_ToplevelExec() itself to fail with an 
>>> R-style error (longjmp) is a pointer protection stack overflow
>>> 3) R_CheckStack() might be executed (indirectly), which will (probably) 
>>> signal a stack overflow because it only works correctly when called from 
>>> the main thread (see 
>>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Threading-issues);
>>>  in particular, any function that does allocations, e.g. via allocVector3() 
>>> might end up calling it via GC -> finalizer -> ... -> eval; the only way 
>>> around this p

Re: [Rd] [External] most robust way to call R API functions from a secondary thread

2019-05-21 Thread Andreas Kersting

Hi Luke,

Thanks also for your feedback! I will then follow the proposed route for the 
problem at hand and I will report back if I encounter any issues. 

I am also going to look into the issues of stack checking and R_ToplevelExec.

Regards,
Andreas

2019-05-20 19:29 GMT+02:00 Tierney, Luke:
> Your analysis looks pretty complete to me and your solutions seem plausible.
> That said, I don't know that I would have the level of
> confidence yet that we haven't missed an important point that I would
> want before going down this route.
> 
> Losing stack checking is risky; it might be eventually possible to
> provide some support for this to be handled via a thread-local
> variable. Ensuring that R_ToplevelExec can't jump before entering the
> body function would be a good idea; if you want to propose a patch we
> can have a look.
> 
> Best,
> 
> luke
> 
> On Sun, 19 May 2019, Andreas Kersting wrote:
> 
>> Hi,
>>
>> As the subject suggests, I am looking for the most robust way to call an 
>> (arbitrary) function from the R API from another but the main POSIX thread 
>> in a package's code.
>>
>> I know that, "[c]alling any of the R API from threaded code is ‘for experts 
>> only’ and strongly discouraged. Many functions in the R API modify internal 
>> R data structures and might corrupt these data structures if called 
>> simultaneously from multiple threads. Most R API functions can signal 
>> errors, which must only happen on the R main thread." 
>> (https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support)
>>
>> Let me start with my understanding of the related issues and possible 
>> solutions:
>>
>> 1) R API functions are generally not thread-safe and hence one must ensure, 
>> e.g. by using mutexes, that no two threads use the R API simultaneously
>>
>> 2) R uses longjmps on error and interrupts as well as for condition handling 
>> and it is undefined behaviour to do a longjmp from one thread to another; 
>> interrupts can be suspended before creating the threads by setting 
>> R_interrupts_suspended = TRUE; by wrapping the calls to functions from the R 
>> API with R_ToplevelExec(), longjmps across thread boundaries can be avoided; 
>> the only reason for R_ToplevelExec() itself to fail with an R-style error 
>> (longjmp) is a pointer protection stack overflow
>>
>> 3) R_CheckStack() might be executed (indirectly), which will (probably) 
>> signal a stack overflow because it only works correctly when called from the 
>> main thread (see 
>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Threading-issues);
>>  in particular, any function that does allocations, e.g. via allocVector3() 
>> might end up calling it via GC -> finalizer -> ... -> eval; the only way 
>> around this problem which I could find is to adjust R_CStackLimit, which is 
>> outside of the official API; it can be set to -1 to disable the check or be 
>> changed to a value appropriate for the current thread
>>
>> 4) R sets signal handlers for several signals and some of them make use of 
>> the R API; hence, issues 1) - 3) apply; signal masks can be used to block 
>> delivery of signals to secondary threads in general and to the main thread 
>> while other threads are using the R API
>>
>>
>> I basically have the following questions:
>>
>> a) Is my understanding of the issues accurate?
>> b) Are there more things to consider when calling the R API from secondary 
>> threads?
>> c) Are the solutions proposed appropriate? Are there scenarios in which they 
>> will fail to solve the issue? Or might they even cause new problems?
>> d) Are there alternative/better solutions?
>>
>> Any feedback on this is highly appreciated.
>>
>> Below you can find a template which, combines the proposed solutions (and 
>> skips all non-illustrative checks of return values). Additionally, 
>> R_CheckUserInterrupt() is used in combination with R_UnwindProtect() to 
>> regularly check for interrupts from the main thread, while still being able 
>> to cleanly cancel the threads before fun_running_in_main_thread() is left 
>> via a longjmp. This is e.g. required if the secondary threads use memory 
>> which was allocated in fun_running_in_main_thread() using e.g. R_alloc().
>>
>> Best regards,
>> Andreas Kersting
>>
>>
>>
>> #include 
>> #include 
>> #include 
>> #include 
>>
>> extern uintptr_t R_CStackLimit;
>> extern int R_PPStackTop;
>> extern int R_PPStackSize;
>>
>> #include 
>> LibExtern Rboolean R_interrupts_suspended;
>> LibExtern int R_interrupts_pending;
>> extern void Rf_onintr(void);
>>
>> // mutex for exclusive access to the R API:
>> static pthread_mutex_t r_api_mutex = PTHREAD_MUTEX_INITIALIZER;
>>
>> // a wrapper around R_CheckUserInterrupt() which can be passed to
>> // R_UnwindProtect():
>> SEXP check_interrupt(void *data) {
>>  R_CheckUserInterrupt();
>>  return R_NilValue;
>> }
>>
>> // a wrapper around Rf_onintr() which can be passed to R_UnwindProtect():
>> SEXP my_onintr(void *data) {
>>  Rf_onintr();
>>
