Along the lines of calling R_CheckUserInterrupt() only onces in a while: > OTOH, in the past we have had to *disable* R_CheckUserInterrupt() > in parts of R's code because it was too expensive, > {see current src/main/{seq.c,unique.c} for a series of commented-out > R_CheckUserInterrupt() for such speed-loss reasons}
First, here are links to these two files viewable online: * https://github.com/wch/r-source/blob/trunk/src/main/seq.c * https://github.com/wch/r-source/blob/trunk/src/main/unique.c When not commented out, R_CheckUserInterrupt() would have been called every 1,000,000 times per: /* interval at which to check interrupts */ #define NINTERRUPT 1000000 and if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt() in each iteration. That '(i+1) % NINTERRUPT == 0' expression can be quite expensive too. However, if we change the code to use NINTERRUPT = 2^k where k = {1, 2, ...}, say #define NINTERRUPT 1048576 the compiler would optimize the condition to use "the modulo of powers of 2 can alternatively be expressed as a bitwise AND operation" (Thomas Lumley, 2015-06-15). The speedup is quite impressive, cf. <https://www.jottr.org/2015/06/05/checkuserinterrupt/>. Alternatively, one can increment a counter and reset to zero after calling R_CheckUserInterrupt(); I think that's equally performant. Regarding making serialize() / unserialize() interruptible: I think can be a good idea since we work with larger objects these days. However, if we implement this, we probably have to consider what happens when an interrupt happens. For example, transfers between a client and a server are no longer atomic at this level, which means we might end up in a corrupt state. This may, for instance, happen to database transactions, and PSOCK parallel worker communication. A quick fix would be to use base::suspendInterrupts(), but better would of course be to handle interrupts gracefully. My $.02 + $0.02 /Henrik On Tue, May 2, 2023 at 3:56 PM Jeroen Ooms <jeroeno...@gmail.com> wrote: > > On Tue, May 2, 2023 at 3:29 PM Martin Maechler > <maech...@stat.math.ethz.ch> wrote: > > > > >>>>> Ivan Krylov > > >>>>> on Tue, 2 May 2023 14:59:36 +0300 writes: > > > > > В Sat, 29 Apr 2023 00:00:02 +0000 > > > Dario Strbenac via R-devel <r-devel@r-project.org> пишет: > > > > >> Could save.image() be redesigned so that it promptly responds to > > >> Ctrl+C? It prevents the command line from being used for a number of > > >> hours if the contents of the workspace are large. > > > > > This is ultimately caused by serialize() being non-interruptible. A > > > relatively simple way to hang an R session for a long-ish time would > > > therefore be: > > > > > f <- xzfile(nullfile(), 'a+b') > > > x <- rep(0, 1e9) # approx. 8 gigabytes, adjust for your RAM size > > > serialize(x, f) > > > close(f) > > > > > This means that calling R_CheckUserInterrupt() between saving > > > individual objects is not enough: R also needs to check for interrupts > > > while saving sufficiently long vectors. > > > > > Since the serialize() infrastructure is carefully written to avoid > > > resource leaks on allocation failures, it looks relatively safe to > > > liberally sprinkle R_CheckUserInterrupt() where it makes sense to do > > > so, i.e. once per WriteItem() (which calls itself recursively and > > > non-recursively) and once per every downstream for loop iteration. > > > Valgrind doesn't show any new leaks if I apply the patch, interrupt > > > serialize() and then exit. R also passes make check after the applied > > > patch. > > > > > Do these changes make sense, or am I overlooking some other problem? > > > > Thank you, Ivan! > > > > They do make sense... but : > > > > OTOH, in the past we have had to *disable* R_CheckUserInterrupt() > > in parts of R's code because it was too expensive, > > {see current src/main/{seq.c,unique.c} for a series of commented-out > > R_CheckUserInterrupt() for such speed-loss reasons} > > > > so adding these may need a lot of care when we simultaneously > > want to remain efficient for "morally valid" use of serialization... > > where we really don't want to pay too much of a premium. > > Alternatively, one could consider making R throttle or debounce calls > to R_CheckUserInterrupt such that a repeated calls within x time are > ignored, cf: https://www.freecodecamp.org/news/javascript-debounce-example/ > > The reasoning being that it may be difficult for (contributed) code to > determine when/where it is appropriate to check for interrupts, given > varying code paths and cpu speed. Maybe it makes more sense to call > R_CheckUserInterrupt frequently wherever it is safe to do so, and let > R decide if reasonable time has elapsed to actually run the (possibly > expensive) ui check again. > > Basic example: https://github.com/r-devel/r-svn/pull/125/files > > > > > > > > {{ saving the whole user workspace is not "valid" in that sense > > in my view. I tell all my (non-beginner) Rstudio-using > > students they should turn *off* the automatic saving and > > loading at session end / beginning; and for reproducibility > > only saveRDS() [or save()] *explicitly* a few precious > > objects .. > > }} > > > > Again, we don't want to punish people who know what they are > > doing, just because other R users manage to hang their R session > > by too little thinking ... > > > > Your patch adds 15 such interrupt checking calls which may > > really be too much -- I'm not claiming they are: with our > > recursive objects it's surely not very easy to determine the > > "minimally necessary" such calls. > > > > In addition, we may still consider adding an extra optional > > argument, say `check.interrupt = TRUE` > > which we may default to TRUE when save.image() is called > > but e.g., not when serialize() is called.. > > > > Martin > > > > > -- > > > Best regards, > > > Ivan > > > x[DELETED ATTACHMENT external: > > Rd_IvanKrylov_interrupt-serialize.patch, text/x-patch] > > > ______________________________________________ > > > R-devel@r-project.org mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > ______________________________________________ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel