[Rd] Interrupting C++ code execution

2011-04-25 Thread schattenpflanze

Hello,

I am writing an R interface for one of my C++ programs. The computations 
in C++ are very time consuming (several hours), so the user needs to be 
able to interrupt them. Currently, the only way I found to do so is 
calling R_CheckUserInterrupt() frequently. Unfortunately, there are 
several problems with that:


1. Calling R_CheckUserInterrupt() interrupts immediately, so I have no 
possibility to exit my code gracefully. In particular, I suppose that 
objects created on the heap (e.g., STL containers) are not destructed 
properly.


2. Calling R_CheckUserInterrupt() within a parallel OpenMP loop causes
memory corruption. Even if I do so within a critical section, it
usually results in segfaults, crashes, or invalid variable contents 
afterwards. I suppose this is due to the threads not being destroyed 
properly. Since most of the time critical computations are done in 
parallel, this means I can hardly interrupt anything.


Having a function similar to R_CheckUserInterrupt() but returning a 
boolean variable (has an interrupt occurred or not?) would solve these 
problems. Is there a way to find out about user interrupt requests (the 
user pressing ctrl+c or maybe a different set of keys) without 
interrupting immediately?


I would appreciate your advice on this topic.


Best regards,
Peter

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Interrupting C++ code execution

2011-04-25 Thread schattenpflanze

Thank you for your response, Simon.


1. Calling R_CheckUserInterrupt() interrupts immediately, so I have
no possibility to exit my code gracefully. In particular, I suppose
that objects created on the heap (e.g., STL containers) are not
destructed properly.

In general, you're responsible for the cleanup. See R-devel archives
for discussion on the interactions of C++ and R error handling.
Generally, you should not use local objects and you should use
on.exit to make sure you clean up.

I am using Rcpp (Rcpp-modules, to be precise). This means I do not
actually write any R code. Moreover, the C++ code does not use the R API.
My C++ functions are 'exposed' to R via Rcpp, which creates suitable S4
classes. Rcpp does the exception handling.
In particular, there is no obvious way for me to add an
'on.exit' statement to a particular exposed C++ method.



Generally, you should not use local objects

We are talking about large amounts of code, dozens of nested function
calls, and even external libraries. So "not using local objects" is
definitely not an option.



2. Calling R_CheckUserInterrupt() within a parallel OpenMP loop
causes memory corruptions. Even if I do so within a critical
section, it usually results in segfaults, crashes, or invalid
variable contents afterwards. I suppose this is due to the threads
not being destroyed properly. Since most of the time critical
computations are done in parallel, this means I can hardly
interrupt anything.

As you know R is not thread-safe so you cannot call any R API from a
thread - including OMP threads - so obviously you can't call
R_CheckUserInterrupt().

That is very interesting. Not being thread safe does not necessarily
imply that a function cannot be called from within a thread (as long as 
it is not done concurrently from several threads). In particular, the 
main program itself is also a thread, isn't it?
Since no cleanup is done, however, it is now clear that calling 
R_CheckUserInterrupt() _anywhere_ in my program, parallel section or 
not, is a bad idea.



Since you're using threads the safe way is to
perform your computations on a separate thread and let R handle
events so that you can abort your computation thread as part of
on.exit.

Starting the computations in a separate thread is a nice idea. I could
then call R_CheckUserInterrupt() every x milliseconds in the function 
which dispatches the worker thread. Unfortunately, I see no obvious way 
of adding an "on.exit" statement to an Rcpp module method. So I would 
probably have to call an R function from C++ (e.g., using RInside) which 
contains the on.exit statement, which in turn calls again a C++ function 
setting a global 'abort' flag and waits for the threads to be 
terminated. Hmmm.
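
The arrangement discussed above (computation on a worker thread, with the dispatching function polling for interrupts and setting an abort flag) can be sketched in plain C++ roughly as follows. This is only an illustration under stated assumptions: FakeInterruptSource is a hypothetical stand-in for whatever interrupt test the main thread would really use, and runInterruptible and Result are names invented for this sketch. No R API is called, so the on.exit question is not addressed here.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the real interrupt test on the main thread.
// It reports an interrupt on the N-th poll.
struct FakeInterruptSource {
    int pollsUntilInterrupt;
    bool poll() { return --pollsUntilInterrupt <= 0; }
};

struct Result {
    long iterations;  // work items completed by the worker
    bool aborted;     // true if the abort flag was set
};

Result runInterruptible(long maxIter, FakeInterruptSource src,
                        std::chrono::milliseconds pollEvery) {
    std::atomic<bool> abortFlag{false};
    std::atomic<bool> finished{false};
    long iterations = 0;  // written by the worker, read only after join()

    std::thread worker([&] {
        for (long i = 0; i < maxIter; ++i) {
            if (abortFlag.load(std::memory_order_relaxed)) break;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // simulated work
            ++iterations;
        }
        finished.store(true);
    });

    // Dispatching thread: poll every pollEvery until the worker is done
    // or an interrupt is reported.
    while (!finished.load()) {
        if (src.poll()) {
            abortFlag.store(true);
            break;
        }
        std::this_thread::sleep_for(pollEvery);
    }
    worker.join();  // always wait, so the worker never outlives its data
    return {iterations, abortFlag.load()};
}
```

The worker only ever reads the flag, and the dispatching thread joins it before returning, so all C++ objects owned by the worker are destroyed in the normal way.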


How does on.exit work? Could I mimic that behaviour directly in C++?


Having a function similar to R_CheckUserInterrupt() but returning a
boolean variable (has an interrupt occurred or not?) would solve
these problems. Is there a way to find out about user interrupt
requests (the user pressing ctrl+c or maybe a different set of
keys) without interrupting immediately?

Checking for interrupts may involve running the OS event loop (to
allow the user to interact with R) and thus is not guaranteed to
return.

I see.


There is no general solution - if you're worried only about
your, local code, then on unix, for example, you could use custom
signal handlers to set a flag and co-operatively interrupt your
program. On Windows there is the UserBreak flag which can be set by a
separate thread and thus you may check on it. That said, all this is
very much platform-specific.

Being able to set a flag is all I need and would be the perfect solution
imho. However, I do not yet see how I could achieve that.


How can I write a signal handler within C++ code which does not create a 
GUI and has no dedicated event dispatching thread?
Would it be possible to use, e.g., a Qt keyboard event handler within 
the C++ code? Would a keyboard event be visible to such an event 
handler? Is it not intercepted by R / the terminal window / the OS?


Does any existing R package contain signal handlers?
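
For reference, a flag-setting signal handler needs neither a GUI nor an event dispatching thread. Below is a minimal standalone sketch using only the standard <csignal> facilities. The function names are made up for illustration, and std::raise is used to simulate ctrl+c. Note the assumption that nothing else relies on SIGINT: R installs its own handlers, so replacing them inside a running R session would need considerably more care than shown here.

```cpp
#include <csignal>

// Flag written by the handler. volatile sig_atomic_t is the type the
// standard allows a signal handler to write safely.
static volatile std::sig_atomic_t g_interrupted = 0;

extern "C" void onSigint(int) { g_interrupted = 1; }

// Hypothetical long computation that polls the flag once per iteration.
// Returns the number of iterations completed before it stopped.
long computeUntilInterrupted(long maxIter, long simulateCtrlCAt) {
    long i = 0;
    for (; i < maxIter; ++i) {
        if (g_interrupted) break;                      // cooperative exit point
        if (i == simulateCtrlCAt) std::raise(SIGINT);  // demo only: fake ctrl+c
    }
    return i;
}

// Installs the handler, runs the computation, restores the old handler.
// Returns -1 if the handler could not be installed.
long runWithHandler(long maxIter, long simulateCtrlCAt) {
    g_interrupted = 0;
    void (*previous)(int) = std::signal(SIGINT, onSigint);
    if (previous == SIG_ERR) return -1;
    const long done = computeUntilInterrupted(maxIter, simulateCtrlCAt);
    std::signal(SIGINT, previous);  // put the previous handler back
    return done;
}
```

Because the handler only sets a flag, the loop leaves through an ordinary break, and destructors of any local objects run as usual.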


Best regards,
Peter



Re: [Rd] Interrupting C++ code execution

2011-04-25 Thread schattenpflanze

Dear Simon,

thanks again for your explanations. Your previous e-mail clarified 
several points for me.



Actually, it just came to me that there is a hack you could use. [...]

That actually looks quite nice, at least when compared to my only current
alternative of not interrupting at all. I will test it, in
particular with respect to computational speed. Perhaps I can at least 
call it once per second.
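
Limiting an expensive check to roughly once per second can be done with a small wrapper around it. The following is a sketch only: ThrottledCheck is a name invented here, and the wrapped callable stands in for whatever interrupt test is actually used.

```cpp
#include <chrono>

// Wraps an interrupt test so that the underlying (possibly slow) check
// runs at most once per interval. Once an interrupt has been seen, it
// keeps reporting true without calling the check again.
template <class Check>
class ThrottledCheck {
public:
    using Clock = std::chrono::steady_clock;

    ThrottledCheck(Check check, Clock::duration interval)
        : check_(check), interval_(interval), last_(Clock::now()) {}

    bool operator()() {
        if (interrupted_) return true;
        const Clock::time_point now = Clock::now();
        if (now - last_ < interval_) return false;  // too soon, skip the check
        last_ = now;
        ++calls_;
        interrupted_ = check_();
        return interrupted_;
    }

    // Number of times the underlying check was actually invoked.
    long underlyingCalls() const { return calls_; }

private:
    Check check_;
    Clock::duration interval_;
    Clock::time_point last_;
    bool interrupted_ = false;
    long calls_ = 0;
};
```

Reading the clock has a cost of its own, so in a very tight loop one might prefer to test only every N-th iteration instead. Which is cheaper depends on the loop body and would have to be measured.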


Best regards,
Peter



The problem with it is that it will eat all errors, even if they were not
yours (e.g. those resulting from events triggered by the event loop), so
I would not recommend it for general use. But here we go:

static void chkIntFn(void *dummy) { R_CheckUserInterrupt(); }

// this will call the above in a top-level context so it won't
// longjmp-out of your context
bool checkInterrupt() {
  return (R_ToplevelExec(chkIntFn, NULL) == FALSE);
}

// your code somewhere ...
if (checkInterrupt()) {
  // user interrupted ...
}

You must call it on the main thread and you should be prepared that
it may take some time and may interact with the OS...

Cheers, Simon


On Apr 25, 2011, at 12:23 PM, Simon Urbanek wrote:



On Apr 25, 2011, at 11:09 AM, schattenpfla...@arcor.de wrote:


Thank you for your response, Simon.


1. Calling R_CheckUserInterrupt() interrupts immediately, so
I have no possibility to exit my code gracefully. In
particular, I suppose that objects created on the heap (e.g.,
STL containers) are not destructed properly.

In general, you're responsible for the cleanup. See R-devel
archives for discussion on the interactions of C++ and R error
handling. Generally, you should not use local objects and you
should use on.exit to make sure you clean up.

I am using Rcpp (Rcpp-modules, to be precise). This means, I do
actually not write any R code. Moreover, the C++ code does not
use the R API. My C++ functions are 'exposed' to R via Rcpp,
which creates suitable S4 classes. Rcpp does the exception
handling. In particular, there is no obvious possibility for me
to add an 'on.exit' statement to a particular exposed C++
method.


Generally, you should not use local objects

We are talking about large amounts of code, dozens of nested
function calls, and even external libraries. So "not using local
objects" is definitely no option.



But that would imply that the library calls R! Note that we're
talking about the stack at the point of R API call, so you can do
what you want until you call R API. At the moment you touch R API
you should have no local C++ objects on the stack (all the way
down) - that's what I meant.
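
One way to read this advice is sketched below: keep every object with a non-trivial destructor inside a helper function (or inner scope) that has already returned by the time the R API is touched. The sketch is self-contained, so rApiCallThatMayNotReturn is a hypothetical placeholder for a real call such as R_CheckUserInterrupt(), and the other names are made up for illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical placeholder for an R API call that might jump out of the
// calling code instead of returning.
void rApiCallThatMayNotReturn() { /* e.g. R_CheckUserInterrupt() */ }

// All heavy C++ objects live inside this function only.
double processChunk(std::size_t n) {
    std::vector<double> scratch(n, 1.0);
    return std::accumulate(scratch.begin(), scratch.end(), 0.0);
}  // scratch is destroyed here, before the caller touches the R API

double run(std::size_t chunks, std::size_t chunkSize) {
    double total = 0.0;  // plain value, nothing to destruct
    for (std::size_t c = 0; c < chunks; ++c) {
        total += processChunk(chunkSize);
        // At this point no object with a destructor is alive in this
        // frame, so a non-local jump would not skip any cleanup here.
        rApiCallThatMayNotReturn();
    }
    return total;
}
```

This only protects the frames shown. As noted above, the same must hold for every caller further down the stack.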



2. Calling R_CheckUserInterrupt() within a parallel OpenMP
loop causes memory corruptions. Even if I do so within a
critical section, it usually results in segfaults, crashes,
or invalid variable contents afterwards. I suppose this is
due to the threads not being destroyed properly. Since most
of the time critical computations are done in parallel, this
means I can hardly interrupt anything.

As you know R is not thread-safe so you cannot call any R API
from a thread - including OMP threads - so obviously you can't
call R_CheckUserInterrupt().

That is very interesting. Not being thread safe does not
necessarily imply that a function cannot be called from within a
thread (as long as it is not done concurrently from several
threads). In particular, the main program itself is also a
thread, isn't it?


Yes, but each thread has a separate stack, and you can only enter R
with the same stack you left (because the stack will be restored to
the state of the calling context).



Since no cleanup is done, however, it is now clear that calling
R_CheckUserInterrupt() _anywhere_ in my program, parallel section
or not, is a bad idea.


Since you're using threads the safe way is to perform your
computations on a separate thread and let R handle events so
that you can abort your computation thread as part of on.exit.

Starting the computations in a separate thread is a nice idea. I
could then call R_CheckUserInterrupt() every x milliseconds in
the function which dispatches the worker thread. Unfortunately, I
see no obvious way of adding an "on.exit" statement to an Rcpp
module method. So I would probably have to call an R function
from C++ (e.g., using RInside) which contains the on.exit
statement, which in turn calls again a C++ function setting a
global 'abort' flag and waits for the threads to be terminated.
Hmmm.

How does on.exit work?


It sets the conexit object of the current context structure to the
closure to be evaluated when the context is left. endcontext() then
simply evaluates that closure when the context is left.



Could I mimic that behaviour directly in C++?



Unfortunately there is no C-level onexit hook and the internal
structure of RCNTXT is not revealed to packages. So AFAICS the
closest you can get is to use eval to call on.exit().
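
Within pure C++ code, the closest analogue to on.exit is a scope guard whose destructor runs the cleanup. A minimal sketch follows, with ScopeGuard and demo as names invented for illustration. It covers normal returns and C++ exceptions only. It does not help when R leaves the context by longjmp, which is exactly the case discussed above.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Runs the stored callable when the guard goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> f) : f_(std::move(f)) {}
    ~ScopeGuard() { if (f_) f_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> f_;
};

// Records the order of events so the behaviour can be inspected.
std::vector<std::string> demo(bool fail) {
    std::vector<std::string> log;
    try {
        ScopeGuard guard([&log] { log.push_back("cleanup"); });
        log.push_back("work");
        if (fail) throw std::runtime_error("interrupted");
        log.push_back("done");
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}
```

With fail set to true the recorded order is work, cleanup, caught: the guard runs during stack unwinding, before the handler is entered.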

However, I think it would be useful to have a provision for
creating a context with a C-level hook - the question is whether
the others have

Re: [Rd] Interrupting C++ code execution

2011-04-26 Thread schattenpflanze

I have tested the solutions suggested by Simon and Thomas on a Linux
machine. These are my findings:



On Windows you can look at the variable "UserBreak", available from
Rembedded.h. Outside of Windows, you can look at R_interrupts_pending,
available from R_ext/GraphicsDevice.h. R_ext/GraphicsDevice.h also has
R_interrupts_suspended, which you may or may not want to take into account,
depending on your use-case.

I did not manage to get this to work. Neither R_interrupts_pending nor
R_interrupts_suspended seems to change when I press ctrl+c. Perhaps this
is due to the fact that I run R in a terminal without any graphical
interface?



static void chkIntFn(void *dummy) {
  R_CheckUserInterrupt();
}
// this will call the above in a top-level context so it won't longjmp-out of
// your context
bool checkInterrupt() {
  return (R_ToplevelExec(chkIntFn, NULL) == FALSE);
}
// your code somewhere ...
if (checkInterrupt()) {
  // user interrupted ...
}

This solution works perfectly! It takes slightly longer to call this
function than the plain R_CheckUserInterrupt() call, but in any
reasonable scenario, the additional time is absolutely insignificant.


Inside OpenMP parallel for constructs, one has to make sure that only 
the thread satisfying omp_get_thread_num()==0 makes the call (the 
'master' construct cannot be nested inside a loop). I can then set a 
flag, which is queried by every thread in every loop cycle, causing fast 
termination of the parallel loop. After the loop, I throw an exception. 
Thus, my code is terminated gracefully with minimal effort. I can do 
additional cleanup operations (which usually is not necessary, since I 
use smart pointers), and report details on the interrupt to the user.
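
The loop structure described above might look roughly like the following sketch. It is not the actual code: sumSquares and Interrupted are invented for illustration, and the checkInterrupt parameter is a stand-in for the R_ToplevelExec-based function so that the example compiles without R. The serial fallback for omp_get_thread_num() is likewise an addition that lets the sketch build without OpenMP. Sharing a std::atomic between OpenMP threads is assumed to work, as it does with common compilers.

```cpp
#include <atomic>
#include <stdexcept>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num() { return 0; }  // serial fallback
#endif

struct Interrupted : std::runtime_error {
    Interrupted() : std::runtime_error("computation interrupted by user") {}
};

// checkInterrupt is only ever invoked by thread 0.
template <class Check>
double sumSquares(const std::vector<double>& x, Check checkInterrupt) {
    std::atomic<bool> stop{false};
    double total = 0.0;
    const long n = static_cast<long>(x.size());

    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        // A worksharing loop cannot be left with break, so the remaining
        // iterations are skipped as cheaply as possible instead.
        if (stop.load(std::memory_order_relaxed)) continue;
        if (omp_get_thread_num() == 0 && checkInterrupt()) {
            stop.store(true, std::memory_order_relaxed);
            continue;
        }
        total += x[i] * x[i];
    }

    // Thrown after the parallel region has ended, on the calling thread.
    if (stop.load()) throw Interrupted();
    return total;
}
```

Throwing only after the loop means no exception ever has to cross the boundary of the parallel region, and local objects are released by normal stack unwinding.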


With my limited testing, so far I have not noticed any downsides. Of 
course, there is the obvious drawback of not being supported officially 
(and thus maybe being subject to change), the question of portability, 
and the question of interoperability with other errors.


Moreover, I have found an old thread discussing almost the same topic:
http://tolstoy.newcastle.edu.au/R/e4/devel/08/05/1686.html .
The thread was created in 2008, so the issue is not really a new one. 
The solution proposed there is actually the same as the one suggested by 
Simon, namely using R_ToplevelExec().


An officially supported, portable solution would of course be much 
appreciated!



Best regards,
Peter
