Re: [Rd] save.image Non-responsive to Interrupt

2023-05-02 Thread Ivan Krylov
В Sat, 29 Apr 2023 00:00:02 +
Dario Strbenac via R-devel  пишет:

> Could save.image() be redesigned so that it promptly responds to
> Ctrl+C? It prevents the command line from being used for a number of
> hours if the contents of the workspace are large.

This is ultimately caused by serialize() being non-interruptible. A
relatively simple way to hang an R session for a long-ish time would
therefore be:

f <- xzfile(nullfile(), 'a+b')
x <- rep(0, 1e9) # approx. 8 gigabytes, adjust for your RAM size
serialize(x, f)
close(f)

This means that calling R_CheckUserInterrupt() between saving
individual objects is not enough: R also needs to check for interrupts
while saving sufficiently long vectors.

Since the serialize() infrastructure is carefully written to avoid
resource leaks on allocation failures, it looks relatively safe to
liberally sprinkle R_CheckUserInterrupt() where it makes sense to do
so, i.e. once per WriteItem() (which calls itself recursively and
non-recursively) and once per every downstream for loop iteration.
Valgrind doesn't show any new leaks if I apply the patch, interrupt
serialize() and then exit. R also passes make check after the applied
patch.

Do these changes make sense, or am I overlooking some other problem?

-- 
Best regards,
Ivan
--- src/main/serialize.c	(revision 84337)
+++ src/main/serialize.c	(working copy)
@@ -879,8 +879,10 @@
 R_xlen_t len = XLENGTH(s);
 OutInteger(stream, 0); /* place holder to allow names if we want to */
 WriteLENGTH(stream, s);
-for (R_xlen_t i = 0; i < len; i++)
+for (R_xlen_t i = 0; i < len; i++) {
+	R_CheckUserInterrupt();
 	WriteItem(STRING_ELT(s, i), ref_table, stream);
+}
 }
 
 #include 
@@ -901,6 +903,7 @@
 	R_xlen_t done, this;
 	XDR xdrs;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	xdrmem_create(&xdrs, buf, (int)(this * sizeof(int)), XDR_ENCODE);
 	for(int cnt = 0; cnt < this; cnt++)
@@ -916,6 +919,7 @@
 	/* write in chunks to avoid overflowing ints */
 	R_xlen_t done, this;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	stream->OutBytes(stream, INTEGER(s) + done,
 			 (int)(sizeof(int) * this));
@@ -923,8 +927,10 @@
 	break;
 }
 default:
-	for (R_xlen_t cnt = 0; cnt < length; cnt++)
+	for (R_xlen_t cnt = 0; cnt < length; cnt++) {
 	OutInteger(stream, INTEGER(s)[cnt]);
+	R_CheckUserInterrupt();
+	}
 }
 }
 
@@ -938,6 +944,7 @@
 	R_xlen_t done, this;
 	XDR xdrs;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	xdrmem_create(&xdrs, buf, (int)(this * sizeof(double)), XDR_ENCODE);
 	for(int cnt = 0; cnt < this; cnt++)
@@ -952,6 +959,7 @@
 {
 	R_xlen_t done, this;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	stream->OutBytes(stream, REAL(s) + done,
 			 (int)(sizeof(double) * this));
@@ -959,8 +967,10 @@
 	break;
 }
 default:
-	for (R_xlen_t cnt = 0; cnt < length; cnt++)
+	for (R_xlen_t cnt = 0; cnt < length; cnt++) {
+	R_CheckUserInterrupt();
 	OutReal(stream, REAL(s)[cnt]);
+	}
 }
 }
 
@@ -975,6 +985,7 @@
 	XDR xdrs;
 	Rcomplex *c = COMPLEX(s);
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	xdrmem_create(&xdrs, buf, (int)(this * sizeof(Rcomplex)), XDR_ENCODE);
 	for(int cnt = 0; cnt < this; cnt++) {
@@ -991,6 +1002,7 @@
 {
 	R_xlen_t done, this;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	stream->OutBytes(stream, COMPLEX(s) + done,
 			 (int)(sizeof(Rcomplex) * this));
@@ -998,8 +1010,10 @@
 	break;
 }
 default:
-	for (R_xlen_t cnt = 0; cnt < length; cnt++)
+	for (R_xlen_t cnt = 0; cnt < length; cnt++) {
+	R_CheckUserInterrupt();
 	OutComplex(stream, COMPLEX(s)[cnt]);
+	}
 }
 }
 
@@ -1034,6 +1048,7 @@
 
  tailcall:
 R_CheckStack();
+R_CheckUserInterrupt();
 if (ALTREP(s) && stream->version >= 3) {
 	SEXP info = ALTREP_SERIALIZED_CLASS(s);
 	SEXP state = ALTREP_SERIALIZED_STATE(s);
@@ -1186,15 +1201,19 @@
 	case STRSXP:
 	len = XLENGTH(s);
 	WriteLENGTH(stream, s);
-	for (R_xlen_t ix = 0; ix < len; ix++)
+	for (R_xlen_t ix = 0; ix < len; ix++) {
+		R_CheckUserInterrupt();
 		WriteItem(STRING_ELT(s, ix), ref_table, stream);
+	}
 	break;
 	case VECSXP:
 	case EXPRSXP:
 	len = XLENGTH(s);
 	WriteLENGTH(stream, s);
-	for (R_xlen_t ix = 0; ix < len; ix++)
+	for (R_xlen_t ix = 0; ix < len; ix++) {
+		R_CheckUserInterrupt();
 		WriteItem(VECTOR_ELT(s, ix), ref_table, stream);
+	}
 	break;
 	case BCODESXP:
 	WriteBC(s, ref_table, stream);
@@ -1208,6 +1227,7 @@
 	{
 		R_xlen_t 

Re: [Rd] save.image Non-responsive to Interrupt

2023-05-02 Thread Martin Maechler
> Ivan Krylov 
> on Tue, 2 May 2023 14:59:36 +0300 writes:

> В Sat, 29 Apr 2023 00:00:02 +
> Dario Strbenac via R-devel  пишет:

>> Could save.image() be redesigned so that it promptly responds to
>> Ctrl+C? It prevents the command line from being used for a number of
>> hours if the contents of the workspace are large.

> This is ultimately caused by serialize() being non-interruptible. A
> relatively simple way to hang an R session for a long-ish time would
> therefore be:

> f <- xzfile(nullfile(), 'a+b')
> x <- rep(0, 1e9) # approx. 8 gigabytes, adjust for your RAM size
> serialize(x, f)
> close(f)

> This means that calling R_CheckUserInterrupt() between saving
> individual objects is not enough: R also needs to check for interrupts
> while saving sufficiently long vectors.

> Since the serialize() infrastructure is carefully written to avoid
> resource leaks on allocation failures, it looks relatively safe to
> liberally sprinkle R_CheckUserInterrupt() where it makes sense to do
> so, i.e. once per WriteItem() (which calls itself recursively and
> non-recursively) and once per every downstream for loop iteration.
> Valgrind doesn't show any new leaks if I apply the patch, interrupt
> serialize() and then exit. R also passes make check after the applied
> patch.

> Do these changes make sense, or am I overlooking some other problem?

Thank you, Ivan!

They do make sense... but :

OTOH, in the past we have had to *disable*  R_CheckUserInterrupt()
in parts of R's code because it was too expensive,
{see current src/main/{seq.c,unique.c}  for a series of commented-out
 R_CheckUserInterrupt() for such speed-loss reasons}

so  adding these may need a lot of care when we simultaneously
want to remain  efficient for "morally valid" use of serialization...
where we really don't want to pay too much of a premium.

{{ saving the whole user workspace is not "valid" in that sense
   in my view.  I tell all my (non-beginner) Rstudio-using
   students they should turn *off* the automatic saving and
   loading at session end / beginning; and for reproducibility
   only saveRDS() [or save()] *explicitly* a few precious
   objects ..
}}

Again, we don't want to punish people who know what they are
doing, just because other R users manage to hang their R session
by too little thinking ... 

Your patch adds 15 such interrupt checking calls which may
really be too much -- I'm not claiming they are: with our
recursive objects it's surely not very easy to determine the
"minimally necessary" such calls.

In addition, we may still consider adding an extra optional
argument, say   `check.interrupt = TRUE`
which we may default to TRUE when  save.image() is called
but e.g., not when serialize() is called..

Martin

> -- 
> Best regards,
> Ivan
> x[DELETED ATTACHMENT external: Rd_IvanKrylov_interrupt-serialize.patch, 
text/x-patch]
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] is(x,"ANY") is FALSE

2023-05-02 Thread Russell Almond
I’m somewhat puzzled by the following bit of code on R 4.2.3 (also R 4.2.2)

> df <- data.frame(x=1:3)
> is (df,"ANY")
[1] FALSE

This seem to be false when the first argument is any S3 class, while I would 
think that “ANY” would be true for S3, S4 and reference classes, as well as 
primitive types.

This also seems to be a regression, as code that was previously working no 
longer works.


A little more context on my use.

I’m defining a slot for a reference class using a type union, 
setClass(“MongoDB”,c(“NULL”,”ANY”))

[I should be using setOldClass(“mongo”) here, but I was having trouble 
promoting the S3 “mongo” class I was getting from the library.]

Then when I try to set the corresopnding slot I’m getting an error, because 
“mongo” is not of type MongoDB (even through that is a class union which 
contains “ANY”).  

I can work around the problem by setting the slot to type “ANY”, but then I 
loose the documentation that the intention is that it should a mongo database 
connection.

Did I miss something here?  Or is this an unintended consequence of some other 
change?

Thanks,
  —Russell Almond




[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] save.image Non-responsive to Interrupt

2023-05-02 Thread Jeroen Ooms
On Tue, May 2, 2023 at 3:29 PM Martin Maechler
 wrote:
>
> > Ivan Krylov
> > on Tue, 2 May 2023 14:59:36 +0300 writes:
>
> > В Sat, 29 Apr 2023 00:00:02 +
> > Dario Strbenac via R-devel  пишет:
>
> >> Could save.image() be redesigned so that it promptly responds to
> >> Ctrl+C? It prevents the command line from being used for a number of
> >> hours if the contents of the workspace are large.
>
> > This is ultimately caused by serialize() being non-interruptible. A
> > relatively simple way to hang an R session for a long-ish time would
> > therefore be:
>
> > f <- xzfile(nullfile(), 'a+b')
> > x <- rep(0, 1e9) # approx. 8 gigabytes, adjust for your RAM size
> > serialize(x, f)
> > close(f)
>
> > This means that calling R_CheckUserInterrupt() between saving
> > individual objects is not enough: R also needs to check for interrupts
> > while saving sufficiently long vectors.
>
> > Since the serialize() infrastructure is carefully written to avoid
> > resource leaks on allocation failures, it looks relatively safe to
> > liberally sprinkle R_CheckUserInterrupt() where it makes sense to do
> > so, i.e. once per WriteItem() (which calls itself recursively and
> > non-recursively) and once per every downstream for loop iteration.
> > Valgrind doesn't show any new leaks if I apply the patch, interrupt
> > serialize() and then exit. R also passes make check after the applied
> > patch.
>
> > Do these changes make sense, or am I overlooking some other problem?
>
> Thank you, Ivan!
>
> They do make sense... but :
>
> OTOH, in the past we have had to *disable*  R_CheckUserInterrupt()
> in parts of R's code because it was too expensive,
> {see current src/main/{seq.c,unique.c}  for a series of commented-out
>  R_CheckUserInterrupt() for such speed-loss reasons}
>
> so  adding these may need a lot of care when we simultaneously
> want to remain  efficient for "morally valid" use of serialization...
> where we really don't want to pay too much of a premium.

Alternatively, one could consider making R throttle or debounce calls
to R_CheckUserInterrupt such that a repeated calls within x time are
ignored, cf: https://www.freecodecamp.org/news/javascript-debounce-example/

The reasoning being that it may be difficult for (contributed) code to
determine when/where it is appropriate to check for interrupts, given
varying code paths and cpu speed. Maybe it makes more sense to call
R_CheckUserInterrupt frequently wherever it is safe to do so, and let
R decide if reasonable time has elapsed to actually run the (possibly
expensive) ui check again.

Basic example: https://github.com/r-devel/r-svn/pull/125/files




>
> {{ saving the whole user workspace is not "valid" in that sense
>in my view.  I tell all my (non-beginner) Rstudio-using
>students they should turn *off* the automatic saving and
>loading at session end / beginning; and for reproducibility
>only saveRDS() [or save()] *explicitly* a few precious
>objects ..
> }}
>
> Again, we don't want to punish people who know what they are
> doing, just because other R users manage to hang their R session
> by too little thinking ...
>
> Your patch adds 15 such interrupt checking calls which may
> really be too much -- I'm not claiming they are: with our
> recursive objects it's surely not very easy to determine the
> "minimally necessary" such calls.
>
> In addition, we may still consider adding an extra optional
> argument, say   `check.interrupt = TRUE`
> which we may default to TRUE when  save.image() is called
> but e.g., not when serialize() is called..
>
> Martin
>
> > --
> > Best regards,
> > Ivan
> > x[DELETED ATTACHMENT external: Rd_IvanKrylov_interrupt-serialize.patch, 
> text/x-patch]
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [Sender Not Verified] is(x,"ANY") is FALSE

2023-05-02 Thread Michael Lawrence (MICHAFLA) via R-devel
Hi,

This does seem to be a bug in is(); I'll take a look. I'm curious
about what stopped you from using setOldClass(), since that seems to
be the most appropriate solution.

Michael

On Tue, May 2, 2023 at 2:59 PM Russell Almond  wrote:
>
> I’m somewhat puzzled by the following bit of code on R 4.2.3 (also R 4.2.2)
>
> > df <- data.frame(x=1:3)
> > is (df,"ANY")
> [1] FALSE
>
> This seem to be false when the first argument is any S3 class, while I would 
> think that “ANY” would be true for S3, S4 and reference classes, as well as 
> primitive types.
>
> This also seems to be a regression, as code that was previously working no 
> longer works.
>
>
> A little more context on my use.
>
> I’m defining a slot for a reference class using a type union,
> setClass(“MongoDB”,c(“NULL”,”ANY”))
>
> [I should be using setOldClass(“mongo”) here, but I was having trouble 
> promoting the S3 “mongo” class I was getting from the library.]
>
> Then when I try to set the corresopnding slot I’m getting an error, because 
> “mongo” is not of type MongoDB (even through that is a class union which 
> contains “ANY”).
>
> I can work around the problem by setting the slot to type “ANY”, but then I 
> loose the documentation that the intention is that it should a mongo database 
> connection.
>
> Did I miss something here?  Or is this an unintended consequence of some 
> other change?
>
> Thanks,
>   —Russell Almond
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Michael Lawrence
Senior Principal Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] save.image Non-responsive to Interrupt

2023-05-02 Thread Henrik Bengtsson
Along the lines of calling R_CheckUserInterrupt() only onces in a while:

> OTOH, in the past we have had to *disable*  R_CheckUserInterrupt()
> in parts of R's code because it was too expensive,
> {see current src/main/{seq.c,unique.c}  for a series of commented-out
> R_CheckUserInterrupt() for such speed-loss reasons}

First, here are links to these two files viewable online:

 * https://github.com/wch/r-source/blob/trunk/src/main/seq.c

 * https://github.com/wch/r-source/blob/trunk/src/main/unique.c

When not commented out, R_CheckUserInterrupt() would have been called
every 1,000,000 times per:

  /* interval at which to check interrupts */
  #define NINTERRUPT 100

and

  if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt()

in each iteration. That '(i+1) % NINTERRUPT == 0' expression can be
quite expensive too.  However, if we change the code to use NINTERRUPT
= 2^k where k = {1, 2, ...}, say

  #define NINTERRUPT 1048576

the compiler would optimize the condition to use "the modulo of powers
of 2 can alternatively be expressed as a bitwise AND operation"
(Thomas Lumley, 2015-06-15).  The speedup is quite impressive, cf.
.
Alternatively, one can increment a counter and reset to zero after
calling R_CheckUserInterrupt(); I think that's equally performant.

Regarding making serialize() / unserialize() interruptible: I think
can be a good idea since we work with larger objects these days.
However, if we implement this, we probably have to consider what
happens when an interrupt happens. For example, transfers between a
client and a server are no longer atomic at this level, which means we
might end up in a corrupt state. This may, for instance, happen to
database transactions, and PSOCK parallel worker communication.  A
quick fix would be to use base::suspendInterrupts(), but better would
of course be to handle interrupts gracefully.

My $.02 + $0.02

/Henrik

On Tue, May 2, 2023 at 3:56 PM Jeroen Ooms  wrote:
>
> On Tue, May 2, 2023 at 3:29 PM Martin Maechler
>  wrote:
> >
> > > Ivan Krylov
> > > on Tue, 2 May 2023 14:59:36 +0300 writes:
> >
> > > В Sat, 29 Apr 2023 00:00:02 +
> > > Dario Strbenac via R-devel  пишет:
> >
> > >> Could save.image() be redesigned so that it promptly responds to
> > >> Ctrl+C? It prevents the command line from being used for a number of
> > >> hours if the contents of the workspace are large.
> >
> > > This is ultimately caused by serialize() being non-interruptible. A
> > > relatively simple way to hang an R session for a long-ish time would
> > > therefore be:
> >
> > > f <- xzfile(nullfile(), 'a+b')
> > > x <- rep(0, 1e9) # approx. 8 gigabytes, adjust for your RAM size
> > > serialize(x, f)
> > > close(f)
> >
> > > This means that calling R_CheckUserInterrupt() between saving
> > > individual objects is not enough: R also needs to check for interrupts
> > > while saving sufficiently long vectors.
> >
> > > Since the serialize() infrastructure is carefully written to avoid
> > > resource leaks on allocation failures, it looks relatively safe to
> > > liberally sprinkle R_CheckUserInterrupt() where it makes sense to do
> > > so, i.e. once per WriteItem() (which calls itself recursively and
> > > non-recursively) and once per every downstream for loop iteration.
> > > Valgrind doesn't show any new leaks if I apply the patch, interrupt
> > > serialize() and then exit. R also passes make check after the applied
> > > patch.
> >
> > > Do these changes make sense, or am I overlooking some other problem?
> >
> > Thank you, Ivan!
> >
> > They do make sense... but :
> >
> > OTOH, in the past we have had to *disable*  R_CheckUserInterrupt()
> > in parts of R's code because it was too expensive,
> > {see current src/main/{seq.c,unique.c}  for a series of commented-out
> >  R_CheckUserInterrupt() for such speed-loss reasons}
> >
> > so  adding these may need a lot of care when we simultaneously
> > want to remain  efficient for "morally valid" use of serialization...
> > where we really don't want to pay too much of a premium.
>
> Alternatively, one could consider making R throttle or debounce calls
> to R_CheckUserInterrupt such that a repeated calls within x time are
> ignored, cf: https://www.freecodecamp.org/news/javascript-debounce-example/
>
> The reasoning being that it may be difficult for (contributed) code to
> determine when/where it is appropriate to check for interrupts, given
> varying code paths and cpu speed. Maybe it makes more sense to call
> R_CheckUserInterrupt frequently wherever it is safe to do so, and let
> R decide if reasonable time has elapsed to actually run the (possibly
> expensive) ui check again.
>
> Basic example: https://github.com/r-devel/r-svn/pull/125/files
>
>
>
>
> >
> > {{ saving the whole user workspace is not "valid" in that sense
> >in my view.  I tell all my (non-beginn