[Rd] local variable assignment: first copies from higher frame?
hi all -- this might not be the correct list for this question/discussion, though R-help didn't seem like the correct venue, either, so...

i'm looking for some extra clarification of how local variables are defined/bound, beyond the simple cases given in the Language document.
the particular instance is when there is variable assignment inside a function.
normally, this creates a local variable, but there appears to be an additional preceding step that does a bit more: the local variable is initialized to the value of any same-named variable bound in a containing frame.
in a sense, the lexical scoping rule is first applied to acquire a value, and this value is then applied to the new local variable, and is then immediately changed by the assignment operation.

i only noticed this when assigning variables to entries within a 'list' structure, like so:

tempf <- function(x, local = TRUE) {
  executing_environment <- environment()
  closure_environment <- parent.env(executing_environment)

  print(executing_environment)
  cat(str(mget("my_list", envir = executing_environment, inherits = FALSE, ifnotfound = NA)[[1]]))
  print(closure_environment)
  cat(str(mget("my_list", envir = closure_environment, inherits = FALSE, ifnotfound = NA)[[1]]))

  if(local) {
    my_list$x <- x
  } else {
    my_list$x <<- x
  }

  print(executing_environment)
  cat(str(mget("my_list", envir = executing_environment, inherits = FALSE, ifnotfound = NA)[[1]]))
  print(closure_environment)
  cat(str(mget("my_list", envir = closure_environment, inherits = FALSE, ifnotfound = NA)[[1]]))
}

> my_list <- list(x = 1, y = 2)
> tempf(0, local = TRUE)
 logi NA
List of 2
 $ x: num 1
 $ y: num 2
List of 2
 $ x: num 0
 $ y: num 2
List of 2
 $ x: num 1
 $ y: num 2
> tempf(0, local = FALSE)
 logi NA
List of 2
 $ x: num 1
 $ y: num 2
 logi NA
List of 2
 $ x: num 0
 $ y: num 2

what surprised me in the first "local = TRUE" case is that 'y' is still 2 in the executing environment.

so, i think my question comes down to this: when a new local variable is created in an assignment operation, is the full value of any matching variable in a containing frame first copied to the new local variable?
and if so, was this chosen as a strategy specifically to allow for these sorts of "indexed" assignment operations (where i'm assigning to only a single location within the vector object)?
and finally, are the other entries in the vector fully copied over, or are they treated as "promises" similar to formal parameters, albeit now as single entries within a containing vector?

thanks for any help on digging down a bit on the implementation here!

-murat

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] local variable assignment: first copies from higher frame?
ah, that makes perfect sense in the functional programming sense of things.
thanks!

On Wed, Aug 14, 2013 at 10:19 PM, Peter Meilstrup wrote:
> Not anything that complicated -- your answer is in the R language definition
> under 'Subset assignment' and the part in "Function calls" that describes
> assignment functions.
>
> Whenever a call is found on the left side of a `<-`, it is munged by
> sticking a "<-" on the function name and pulling out the first argument. So
>
> my_list$x <- x
>
> which is syntactically equivalent to
>
> `$`(my_list, x) <- x
>
> is effectively transformed into something like:
>
> my_list <- `$<-`(my_list, x, x)
>
> The function `$<-` gets its argument from wherever it is found, and returns
> a modified version.
>
> Peter
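
To make the rewrite described above concrete, here is a minimal sketch. The function names f_sugar and f_explicit are made up for illustration and do not come from the thread. Both functions read my_list through lexical scoping, build a modified copy, and bind that copy to a new local variable, so the global list should be left untouched and 'y' carried along unchanged.

```r
my_list <- list(x = 1, y = 2)

# The form used in the original question.
f_sugar <- function(x) {
  my_list$x <- x          # looks like an in-place update...
  my_list
}

# Roughly what the evaluator does with it: call the replacement
# function on the value found by lexical scoping, then bind the
# result to a *local* variable named my_list.
f_explicit <- function(x) {
  my_list <- `$<-`(my_list, "x", x)
  my_list
}

res_sugar    <- f_sugar(0)
res_explicit <- f_explicit(0)
```

This is why the local copy in the "local = TRUE" case still has y = 2: the whole list value is fetched first, and only then is the modified version assigned locally.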
[Rd] proper use of reg.finalizer to close connections
Hi all, I have a question about finalizers...
I have a package that manages state for a few connections, and I'd like to ensure that these connections are 'cleanly' closed upon either (i) R quitting or (ii) an unloading of the package.
So, in a pared-down example package with a single R file, it looks something like:

# BEGIN PACKAGE CODE #
.CONNS <- new.env(parent = emptyenv())
.CONNS$resource1 <- NULL
.CONNS$resource2 <- NULL
## some more .CONNS resources...

reg.finalizer(.CONNS, function(x) sapply(names(x), disconnect), onexit = TRUE)

connect <- function(x) {
  ## here lies code to connect and update .CONNS[[x]]
}
disconnect <- function(x) {
  print(sprintf("disconnect(%s)", x))
  ## here lies code to disconnect and update .CONNS[[x]]
}
# END PACKAGE CODE #

The print(...) statement in disconnect(...) is there as a trace, as I hoped that I'd see disconnect(...) being called when I quit (or detach(..., unload = TRUE)).
But, it doesn't appear that disconnect(...) is ever called when the package (and .CONNS) falls out of memory/scope (and I ran gc() after detach(...), just to be sure).

In a second 'shot-in-the-dark' attempt, I placed the reg.finalizer call inside an .onLoad function, but that didn't seem to work, either.

I'm guessing my use of reg.finalizer is way off-base here... but I cannot infer from the reg.finalizer man page what I might be doing wrong.
Is there a way to see, at the R-system level, what functions have been registered as finalizers?

Thanks for any pointers!

-Murat
Re: [Rd] proper use of reg.finalizer to close connections
Ah, thanks for the ls() vs names() tip!
(But sadly, it didn't solve the issue... )

So, after some more tinkering, I believe the finalizer is being called _sometimes_.
I changed the reg.finalizer(...) call to just this:

reg.finalizer(.CONNS, function(x) print("foo"), onexit = TRUE)

Now, when I load the package and detach(..., unload = TRUE), nothing prints.
And when I quit, nothing prints.

If I, however, create an environment on the workspace, like so:
> e <- new.env(parent = emptyenv())
> reg.finalizer(e, function(x) print("bar"), onexit = TRUE)
When I quit (or rm(e)), "bar" is printed.
But no "foo" (corresponding to same sequence of code, just in the package instead).

BUT(!), when I _install_ the package, "foo" is printed at the end of the "**testing if installed package can be loaded" installation segment.
So, somehow the R script that tests for package loading/unloading is triggering the finalizer (which is good).
Yet, I cannot seem to trigger it myself when either quitting or forcing a package unload (which is bad).

Any ideas why the installation script would successfully trigger a finalizer while standard unloading or quitting wouldn't?

Cheers and thanks!

-m

On Sun, Oct 26, 2014 at 8:03 PM, Gábor Csárdi wrote:
> Hmmm, I guess you will want to put the actual objects that represent
> the connections into the environment, at least this seems to be the
> easiest to me. Btw. you need ls() to list the contents of an
> environment, instead of names(). E.g.
>
> e <- new.env()
> e$foo <- 10
> e$bar <- "aaa"
> names(e)
> #> NULL
> ls(e)
> #> [1] "bar" "foo"
> reg.finalizer(e, function(x) { print(ls(x)) })
> #> NULL
> rm(e)
> gc()
> #> [1] "bar" "foo"
> #>           used (Mb) gc trigger  (Mb) max used  (Mb)
> #> Ncells 1528877 81.7    2564037 137.0  2564037 137.0
> #> Vcells 3752538 28.7    7930384  60.6  7930356  60.6
>
> More precisely, you probably want to represent each connection as a
> separate environment, with its own finalizer.
>
> Hope this helps,
> Gabor
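
The last suggestion in the quoted message (one environment per connection, each with its own finalizer) could be sketched roughly as follows. Everything named here (new_connection, note_closed, closed) is hypothetical scaffolding for illustration, and the "connection" is just an environment holding a name rather than a real resource.

```r
closed <- character(0)   # hypothetical trace of which connections were cleaned up

# Finalizer defined at top level, so it does not capture the object it finalizes.
note_closed <- function(e) {
  closed <<- c(closed, e$name)
  ## real code would close the underlying resource here
}

# Hypothetical constructor: one environment per connection,
# each registered with its own finalizer.
new_connection <- function(name) {
  conn <- new.env(parent = emptyenv())
  conn$name <- name
  reg.finalizer(conn, note_closed, onexit = TRUE)
  conn
}

a <- new_connection("resource1")
b <- new_connection("resource2")

rm(a)            # drop the only reference to the first connection
invisible(gc())  # the finalizer for 'a' becomes eligible to run here
```

With this layout each connection is cleaned up when it becomes unreachable, and any that are still alive at the end of the session are handled through onexit = TRUE.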
Re: [Rd] proper use of reg.finalizer to close connections
Ah (again)!
Even with my fumbling presentation of the issue, you gave me the hint that solved it, thanks!

Yes, the reg.finalizer call needs to be wrapped in an .onLoad hook so it's not called once during package installation and then never again.
And once I switched to using ls() (instead of names()), everything works as expected.

So, the package code effectively looks like so:

.CONNS <- new.env(parent = emptyenv())
.onLoad <- function(libname, pkgname) {
  reg.finalizer(.CONNS, function(x) sapply(ls(x), .disconnect))
}
.disconnect <- function(x) {
  ## handle disconnection of .CONNS[[x]] here
}

Cheers and thanks!

-m

On Sun, Oct 26, 2014 at 8:53 PM, Gábor Csárdi wrote:
> Well, to be honest I don't understand fully what you are trying to do.
> If you want to run code when the package is detached or when it is
> unloaded, then use a hook:
> http://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Load-hooks
>
> If you want to run code when an object is freed, then use a finalizer.
>
> Note that when you install a package, R runs all the code in the
> package and only stores the results of the code in the installed
> package. So if you create an object outside of a function in your
> package, then only the object will be stored in the package, but not
> the code that creates it. The object will be simply loaded when you
> load the package, but it will not be re-created.
>
> Now, I am not sure what happens if you set the finalizer on such an
> object in the package. I can imagine that the finalizer will not be
> saved into the package, and is only used once, when
> building/installing the package. In this case you'll need to set the
> finalizer in .onLoad().
>
> Gabor
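
For the "use a hook" advice quoted above, a rough sketch of pairing the finalizer with an unload hook might look like this. It is not the package's actual code: .close_all, .closed, and the fake string "connections" are invented for illustration, and the hooks are called by hand at the end only to simulate what R would do when loading and unloading a real package.

```r
.CONNS  <- new.env(parent = emptyenv())
.closed <- character(0)          # hypothetical trace, for illustration only

.disconnect <- function(name) {
  .closed <<- c(.closed, name)   # real code would close .CONNS[[name]] here
  rm(list = name, envir = .CONNS)
}

# Close whatever is still registered; safe to call more than once.
.close_all <- function() invisible(lapply(ls(.CONNS), .disconnect))

.onLoad <- function(libname, pkgname) {
  # registered at load time, not at install time
  reg.finalizer(.CONNS, function(env) .close_all(), onexit = TRUE)
}

.onUnload <- function(libpath) {
  # explicit cleanup when the namespace is unloaded
  .close_all()
}

## Simulated lifecycle (in a real package R calls the hooks itself):
.onLoad("some/lib", "somepkg")
.CONNS$resource1 <- "fake connection 1"
.CONNS$resource2 <- "fake connection 2"
.onUnload("some/lib")
```

Because .close_all() removes each entry as it goes, a later finalizer run at exit finds nothing left to do.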
Re: [Rd] proper use of reg.finalizer to close connections
Ah, good point, I hadn't thought of that detail.
Would moving reg.finalizer back outside of .onLoad and hooking it to the package's environment itself work (more safely)?
Something like:

finalizerFunction <- ## cleanup code
reg.finalizer(parent.env(), finalizerFunction)

-m

On Oct 26, 2014 11:03 PM, "Henrik Bengtsson" wrote:
> On Sun, Oct 26, 2014 at 8:14 PM, Murat Tasan wrote:
> > So, the package code effectively looks like so:
> >
> > .CONNS <- new.env(parent = emptyenv())
> > .onLoad <- function(libname, pkgname) {
> >   reg.finalizer(.CONNS, function(x) sapply(ls(x), .disconnect))
> > }
> > .disconnect <- function(x) {
> >   ## handle disconnection of .CONNS[[x]] here
> > }
>
> In your example above, I would be concerned about what happens if you
> detach/unload your package, because then your finalizer is still
> registered and will be called whenever '.CONNS' is garbage
> collected (or thereafter). However, the finalizer function calls
> .disconnect(), which is no longer available.
>
> Finalizers should be used with great care, because you're not in
> control of the order in which things occur, or of what "resources" are
> still around when the finalizer function is eventually called. I've
> been bitten by this a few times and it can be very hard
> to reproduce and troubleshoot such bugs. See also the 'Note' of
> ?reg.finalizer.
>
> My $.02
>
> /Henrik
Re: [Rd] proper use of reg.finalizer to close connections
Eh, after some flailing, I think I solved it.
I _think_ this pattern should guarantee that the finalizer function is still present when needed:

.STATE_CONTAINER <- new.env(parent = emptyenv())
.STATE_CONTAINER$some_state_variable <- ## some code
.STATE_CONTAINER$some_other_state_variable <- ## some code

.myFinalizer <- function(name_of_state_variable_to_clean_up) {
  ## clean up .STATE_CONTAINER[[name_of_state_variable_to_clean_up]] here
}

.onLoad <- function(libname, pkgname) {
  reg.finalizer(
    e = parent.env(environment()),
    f = function(env) sapply(ls(env$.STATE_CONTAINER), .myFinalizer),
    onexit = TRUE)
}

This way, the finalizer is registered on the enclosing environment of the .onLoad function, which should be the package environment itself.
And that means .myFinalizer should still be around when it's called during q() or unload/gc().
Effectively, the finalizer is tied to the entire package, rather than the state variable container(s), which might not be the most elegant solution, but it should work well enough for most purposes.

Cheers and thanks for the advice,

-m
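
The parent.env(environment()) idiom in the pattern above can be checked in isolation. In this sketch, pkg_env is only a hypothetical stand-in for the environment a package's functions are defined in (for an installed package that would be its namespace), and hook plays the role of .onLoad.

```r
pkg_env <- new.env()    # hypothetical stand-in for a package's environment

# Inside a function, environment() is the frame of the current call,
# and its parent is the environment the function was defined in.
hook <- function() parent.env(environment())
environment(hook) <- pkg_env    # pretend 'hook' was defined inside the package

enclosing <- hook()
same_env  <- identical(enclosing, pkg_env)
```

So a finalizer registered on that environment is tied to the lifetime of the package's own environment rather than to any single state container, which matches the intent described in the message.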
Re: [Rd] proper use of reg.finalizer to close connections
yup... for context, the finalizer code calls functions from packages that are imported by my package.
so, i think (unless something else has gone seriously wrong), those imported namespaces should still be available prior to my package's unloading.
(and if imported namespaces are detached prior to the dependent package's unloading, well, then, perhaps i'll just re-write all of this in .)

thanks again!

-m

On Mon, Oct 27, 2014 at 11:27 AM, Henrik Bengtsson wrote:
> ...and don't forget to make sure all the functions that .myFinalizer()
> calls are also around. /Henrik
[Rd] common base functions stripping S3 class
Hi all --- this is less a specific question and more a general one regarding S3 classes.
I've noticed that quite a few very common default implementations of generic functions (e.g. `unique`, `[`, `as.data.frame`) strip away class information.
In some cases, it appears conditionals have been created to re-assign the class, but only for a few special types.
For example, in `unique.default`, if the argument inherits (_only_) from "POSIXct" or "Date", the initial class is re-assigned to the returned object.
But for any other custom S3 classes, it means we have to catch these frequent cases and write a lot of relatively plain wrappers, e.g.:

unique.MyClass <- function(x, incomparables = FALSE, ...) {
  structure(unique(unclass(x)), class = class(x))
}

It's certainly nice to be able to create a very simple wrapper class on a base type, so that we can override common functions like plot(x).
(An example is a simple class attribute that dictates a particular plot style for a vector of integers.)
But it would be even nicer to not have to detect and override all the un-class events that occur when manipulating these objects with everyday functions, e.g. when adding that 'classed' integer vector to a data frame.

Apart from moving to S4 classes, how have most dealt with this?
Might there be a list of common functions for which the default implementation strips class information?
(Such a list could be a handy "consider overriding _this_" guide for implementors of any new classes.)

Cheers and thanks for any tips!

-murat
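
As an illustration of the kind of wrappers being described, here is a small sketch for a class on top of an integer vector. The constructor new_myclass is hypothetical, and the `[` method is one possible way to restore the class after the default method has run, not a recommendation from the thread.

```r
# Hypothetical constructor for a thin S3 wrapper around a base vector.
new_myclass <- function(x) structure(x, class = "MyClass")

# Subsetting: let the default method do the work, then put the class back.
`[.MyClass` <- function(x, ...) {
  out <- NextMethod()
  class(out) <- oldClass(x)
  out
}

# unique(): same idea as the wrapper shown in the message above.
unique.MyClass <- function(x, incomparables = FALSE, ...) {
  structure(unique(unclass(x), incomparables = incomparables, ...),
            class = class(x))
}

v <- new_myclass(c(3L, 1L, 3L, 2L))
u <- unique(v)   # class should survive
s <- v[2:3]      # class should survive
```

Each further generic that drops the class would need a similar method, which is the maintenance burden the message is asking about.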
[Rd] order(..., na.last = NA) performance hit
I've just recently noticed that using the na.last = NA setting with order incurs a HUGE performance hit.
It appears that much of order(...) (the R wrapper, not the internal calls) is written in as general a manner as possible to handle the large number of input types.
But the canonical case of ordering a single vector of numerics suffers greatly with the current implementation.
Below is a single trivial example, but overall I've been noticing somewhere on the order of a 10X performance hit when using na.last = NA.

Would it be worth (i) attempting a re-write of the wrapping order(...) function, or (ii) at least mentioning the performance implications in the help page for order(...)?

Here's an example of the performance hit:

x <- runif(1e6)
x[runif(1e6) > 0.9] <- NA ## add some (~10%) NA values
order2 <- function(x) {
  iix <- order(x, na.last = TRUE)
  iix[!is.na(x[iix])]
}

system.time(y1 <- order(x, na.last = TRUE))
##    user  system elapsed
##    0.48    0.00    0.48
system.time(y2 <- order(x, na.last = NA))
##    user  system elapsed
##   3.060   0.056   3.118
system.time(y3 <- order2(x))
##    user  system elapsed
##   0.520   0.004   0.520
all(y2 == y3)
## [1] TRUE
identical(y2, y3)
## [1] TRUE

Cheers,

-murat
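
Another way to get the same indices, sketched here as an alternative to order2 above, is to drop the NA positions first and order only what remains. The helper name order_drop_na is made up for this example, and no timing claim is made: whether it helps will depend on the R version, so it is worth checking with system.time() on real data.

```r
# Hypothetical helper: order the non-NA entries only and map the
# result back to positions in the original vector.
order_drop_na <- function(x) {
  keep <- which(!is.na(x))
  keep[order(x[keep])]
}

set.seed(1)
x <- runif(1e4)
x[runif(1e4) > 0.9] <- NA   # add some (~10%) NA values

y_ref <- order(x, na.last = NA)
y_alt <- order_drop_na(x)
```

The final subsetting step restores indices into x, since order(x[keep]) on its own only gives positions within the shortened vector.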