[Rd] most robust way to call R API functions from a secondary thread

2019-05-19 Thread Andreas Kersting
Hi,

As the subject suggests, I am looking for the most robust way to call an 
(arbitrary) function from the R API from another but the main POSIX thread in a 
package's code.

I know that, "[c]alling any of the R API from threaded code is ‘for experts 
only’ and strongly discouraged. Many functions in the R API modify internal R 
data structures and might corrupt these data structures if called 
simultaneously from multiple threads. Most R API functions can signal errors, 
which must only happen on the R main thread." 
(https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support)

Let me start with my understanding of the related issues and possible solutions:

1) R API functions are generally not thread-safe and hence one must ensure, 
e.g. by using mutexes, that no two threads use the R API simultaneously

2) R uses longjmps on error and interrupts as well as for condition handling 
and it is undefined behaviour to do a longjmp from one thread to another; 
interrupts can be suspended before creating the threads by setting 
R_interrupts_suspended = TRUE; by wrapping the calls to functions from the R 
API with R_ToplevelExec(), longjmps across thread boundaries can be avoided; 
the only reason for R_ToplevelExec() itself to fail with an R-style error 
(longjmp) is a pointer protection stack overflow

3) R_CheckStack() might be executed (indirectly), which will (probably) signal 
a stack overflow because it only works correctly when called form the main 
thread (see 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Threading-issues); 
in particular, any function that does allocations, e.g. via allocVector3() 
might end up calling it via GC -> finalizer -> ... -> eval; the only way around 
this problem which I could find is to adjust R_CStackLimit, which is outside of 
the official API; it can be set to -1 to disable the check or be changed to a 
value appropriate for the current thread

4) R sets signal handlers for several signals and some of them make use of the 
R API; hence, issues 1) - 3) apply; signal masks can be used to block delivery 
of signals to secondary threads in general and to the main thread while other 
threads are using the R API


I basically have the following questions:

a) Is my understanding of the issues accurate?
b) Are there more things to consider when calling the R API from secondary 
threads?
c) Are the solutions proposed appropriate? Are there scenarios in which they 
will fail to solve the issue? Or might they even cause new problems?
d) Are there alternative/better solutions?

Any feedback on this is highly appreciated.

Below you can find a template which, combines the proposed solutions (and skips 
all non-illustrative checks of return values). Additionally, 
R_CheckUserInterrupt() is used in combination with R_UnwindProtect() to 
regularly check for interrupts from the main thread, while still being able to 
cleanly cancel the threads before fun_running_in_main_thread() is left via a 
longjmp. This is e.g. required if the secondary threads use memory which was 
allocated in fun_running_in_main_thread() using e.g. R_alloc().

Best regards,
Andreas Kersting



#include 
#include 
#include 
#include 

extern uintptr_t R_CStackLimit;
extern int R_PPStackTop;
extern int R_PPStackSize;

#include 
LibExtern Rboolean R_interrupts_suspended;
LibExtern int R_interrupts_pending;
extern void Rf_onintr(void);

// mutex for exclusive access to the R API:
static pthread_mutex_t r_api_mutex = PTHREAD_MUTEX_INITIALIZER;

// a wrapper arround R_CheckUserInterrupt() which can be passed to 
R_UnwindProtect():
SEXP check_interrupt(void *data) {
  R_CheckUserInterrupt();
  return R_NilValue;
}

// a wrapper arround Rf_onintr() which can be passed to R_UnwindProtect():
SEXP my_onintr(void *data) {
  Rf_onintr();
  return R_NilValue;
}

// function called by R_UnwindProtect() to cleanup on interrupt
void cleanfun(void *data, Rboolean jump) {
  if (jump) {
// terminate threads cleanly ...
  }
}

void fun_calling_R_API(void *data) {
  // call some R API function, e.g. mkCharCE() ...
}

void *threaded_fun(void *td) {

  // ...

  pthread_mutex_lock(&r_api_mutex);

  // avoid false stack overflow error:
  intptr_t R_CStackLimit_old = R_CStackLimit;
  R_CStackLimit = -1;


  // R_ToplevelExec() below will call PROTECT 4x:
  if (R_PPStackTop > R_PPStackSize - 4) {
// ppstack would overflow in R_ToplevelExec() -> handle this ...
  }

  // avoid longjmp to different thread:
  Rboolean ok = R_ToplevelExec(fun_calling_R_API, (void *) &some_data);

  // re-enable stack size checking:
  R_CStackLimit = R_CStackLimit_old;
  pthread_mutex_unlock(&r_api_mutex);

  if (!ok) {
// handle error ...
  }

  // ...
}

SEXP fun_running_in_main_thread() {

  // ...

  /* create continuation token for R_UnwindProtect():
   *
   * do this explicitly here before the threads are created because this might
   * fail in allocation or with pointer protection stack overflow
   */
  SEXP

Re: [Rd] [R-pkg-devel] Three-argument S3method declaration does not seem to affect dispatching from inside the package.

2019-05-19 Thread Iñaki Ucar
On Sat, 18 May 2019 at 23:34, Pavel Krivitsky  wrote:
>
> > The issue here is that you are registering a non-standard name
> > (.gen.formula) for that generic and then defining what would be the
> > standard name (gen.formula) for... what purpose? IMHO, this is a bad
> > practice and should be avoided.
>
> The situation initially arose when I wanted to soft-deprecate calling a
> particular method by its full name in order to clean up the package's
> namespace.
>
> To use our working example, I wanted calls to gen.formula() to issue a
> deprecation warning, but calls to gen(formula) not to. The simplest way
> to do that that I could find was to create a function, say,
> .gen.formula() that would implement the method and declare it as the S3
> export, and modify gen.formula() to issue the warning before passing on
> to .gen.formula(). Then, direct calls to gen.formula() would produce a
> warning, but gen(formula) would by pass it.

IMO the simplest way to do this is to check who the caller was:

foo <- function(x) UseMethod("foo")
foo.bar <- function(x) {
  sc <- sys.call(-1)
  if (is.null(sc) || sc[[1]] != "foo")
.Deprecated(msg="Calling 'foo.bar' directly is deprecated")
}

x <- 1
class(x) <- "bar"

foo(x)  # silent
foo.bar(x)  # a warning is issued

> > > That is, for a call from inside a package, the order of precedence
> > > would be as follows:
> > >1. S3method() in that package's NAMESPACE.
> > >2. Appropriately named function in that package (exported or
> > > not).
> > >3. Appropriately named function in calling environment (which
> > > may be
> > >   GlobalEnv).
> > >4. S3method() in other loaded packages' NAMESPACEs.
> > >5. Appropriately named functions exported by other loaded
> > > packages'
> > >   NAMESPACEs.
> > >
> > > For a call from outside a package, the precedence is the same, but
> > > 1 and 2 are not relevant.
> > >
> > > As far as I can tell, this is the current behaviour except for the
> > > relative ordering of 1 and 2.
> >
> > Nope. Current behaviour (see details in ?UseMethod) is:
> >
> > "To support this, UseMethod and NextMethod search for methods in two
> > places: in the environment in which the generic function is called,
> > and in the registration data base for the environment in which the
> > generic is defined".
>
> Can you be more specific where the sequence above contradicts the
> current implementation (except for swapping 1 and 2)? As far as I can
> tell, it's just a more concrete description of what's in the
> documentation.

The description in the documentation means that point 3) in your list
goes always first, which automatically implies 2) if the generic is
defined in the same package.

Iñaki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R-pkg-devel] Three-argument S3method declaration does not seem to affect dispatching from inside the package.

2019-05-19 Thread Pavel Krivitsky
Hi, Inaki,

On Sun, 2019-05-19 at 16:59 +0200, Iñaki Ucar wrote:
> IMO the simplest way to do this is to check who the caller was:
> 
> foo <- function(x) UseMethod("foo")
> foo.bar <- function(x) {
>   sc <- sys.call(-1)
>   if (is.null(sc) || sc[[1]] != "foo")
> .Deprecated(msg="Calling 'foo.bar' directly is deprecated")
> }
> 
> x <- 1
> class(x) <- "bar"
> 
> foo(x)  # silent
> foo.bar(x)  # a warning is issued

f <- getS3method("foo","bar")
f(x) # spurious warning

foo.baz <- function(x) NextMethod("foo")
class(x) <- c("baz","bar")
foo(x) # spurious warning

Believe me, I spent a lot of time trying to get this to work, and I
tried even more sophisticated call stack alchemy, but people kept
getting false positives and negatives. (Take a look at my attempt in
the statnet.common package.)

> The description in the documentation means that point 3) in your list
> goes always first, which automatically implies 2) if the generic is
> defined in the same package.

Are you sure which package defines the generic matters? I've just ran
some tests with two packages and moving the generic around doesn't seem
to affect things: the calling function determines whose method is used.

It seems to me like there is no contradiction after all, except that I
propose that the registered method should take precedence within a
namespace.

The only situation in which it would change R's behaviour would be when
a package/namespace contains a function foo.bar() AND a NAMESPACE
containing S3method(foo,bar,not.foo.bar) AND calls foo() on objects of
type bar from inside the package. It is extremely unlikely to break any
existing code.

Best,
Pavel

-- 
Pavel Krivitsky
Lecturer in Statistics
National Institute of Applied Statistics Research Australia (NIASRA)
School of Mathematics and Applied Statistics | Building 39C Room 154
University of Wollongong NSW 2522 Australia
T +61 2 4221 3713
Web (NIASRA): http://niasra.uow.edu.au/index.html
Web (Personal): http://www.krivitsky.net/research
ORCID: -0002-9101-3362

NOTICE: This email is intended for the addressee named and may contain
confidential information. If you are not the intended recipient, please
delete it and notify the sender. Please consider the environment before
printing this email.
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R-pkg-devel] Three-argument S3method declaration does not seem to affect dispatching from inside the package.

2019-05-19 Thread Iñaki Ucar
On Sun, 19 May 2019 at 23:23, Pavel Krivitsky  wrote:
>
> Hi, Inaki,
>
> On Sun, 2019-05-19 at 16:59 +0200, Iñaki Ucar wrote:
> > IMO the simplest way to do this is to check who the caller was:
> >
> > foo <- function(x) UseMethod("foo")
> > foo.bar <- function(x) {
> >   sc <- sys.call(-1)
> >   if (is.null(sc) || sc[[1]] != "foo")
> > .Deprecated(msg="Calling 'foo.bar' directly is deprecated")
> > }
> >
> > x <- 1
> > class(x) <- "bar"
> >
> > foo(x)  # silent
> > foo.bar(x)  # a warning is issued
>
> f <- getS3method("foo","bar")
> f(x) # spurious warning
>
> foo.baz <- function(x) NextMethod("foo")
> class(x) <- c("baz","bar")
> foo(x) # spurious warning

Checking the enclosing environment and whether was called through
NextMethod respectively covers these cases too.

> > The description in the documentation means that point 3) in your list
> > goes always first, which automatically implies 2) if the generic is
> > defined in the same package.
>
> Are you sure which package defines the generic matters? I've just ran
> some tests with two packages and moving the generic around doesn't seem
> to affect things: the calling function determines whose method is used.

If package A defines generic foo and package B defines method foo.bar
without registering nor exporting it, then foo can't find foo.bar.

> It seems to me like there is no contradiction after all, except that I
> propose that the registered method should take precedence within a
> namespace.
>
> The only situation in which it would change R's behaviour would be when
> a package/namespace contains a function foo.bar() AND a NAMESPACE
> containing S3method(foo,bar,not.foo.bar) AND calls foo() on objects of
> type bar from inside the package. It is extremely unlikely to break any
> existing code.

To try to avoid changing current behaviour if foo.bar is found, R
would need to check whether the enclosing environment is identical to
the enclosing environment of the registered method, and in that case,
give precedence to the latter (which, BTW, is exactly what you need to
do to fix the first spurious warning above).

And still, funny things may happen. For example, pkgA defines generic
foo, exports foo.bar and registers other.foo.bar instead of foo.bar.
Following your proposal, if I load pkgA and call foo for an object of
class bar, other.foo.bar is called. Then I load pkgB, which registers
just another method for foo.bar, and call foo again. What happens is
that the registered method belongs now to pkgB, which is a different
namespace, so we got different precedence, and foo.bar is called
instead.

Exceptions leads us to inconsistencies like this. I can't speak for R
core, but I don't think that the use case is compelling enough to take
that path.

Iñaki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Give update.formula() an option not to simplify or reorder the result -- request for comments

2019-05-19 Thread Abby Spurdle
Hi Pavel
(Back On List)

And my two cents...

> At this time, the update.formula() method always performs a number of
> transformations on the results, eliminating redundant variables and
> reordering interactions to be after the main effects.
> This the proposal is to add an option simplify= (defaulting to TRUE,
> for backwards compatibility) that if FALSE will skip the simplification
> step.
> Any thoughts? One particular question that Martin raised is whether the
> UI should be just a single logical argument, or something else.

Firstly, note that the constructor for formula objects behaves differently
to the update method, so I think any changes should be consistent between
the two functions.
> #constructor - doesn't simplify
> y ~ x + x
y ~ x + x
> #update method - does simplify
> update (y ~ x, ~. + x)
y ~ x

Interestingly, this doesn't simplify.
> update (y ~ I (x), ~. + x)
y ~ I(x) + x

I think that simplification could mean different things.
So, there could be something like:
> update (y ~ x, ~. + x, strip=FALSE)
y ~ I (2 * x)

I don't know how easy that would be to implement.
(Symbolic computation on par with computer algebra systems is a discussion
in itself...).
And you could have one argument (say, method="simplify") rather than two or
more logical arguments.

It would also be possible to allow partial forms of simplification, by
specifying which terms should be collapsed, however, I doubt any possible
usefulness of this, would justify the complexity.
However, feel free to disagree.

You made an interesting comment.

> This is not
> always the desired behavior, because formulas are increasingly used
> for purposes other than specifying linear models.

Can I ask what these purposes are?


kind regards
Abs

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel