[Rd] local variable assignment: first copies from higher frame?

2013-08-14 Thread Murat Tasan
hi all -- this might not be the correct list for this
question/discussion, though R-help didn't seem like the correct venue,
either, so...

i'm looking for just some extra clarification of how local variables
are defined/bound, beyond the simple cases given in the Language
document.

the particular instance is when there is variable assignment inside a function.
normally, this creates a local variable, but there appears to be an
additional preceding step that does a bit more: the local variable is
initialized to the value of any same-named variable bound in a
containing frame.
in a sense, the lexical scoping rule is first applied to acquire a
value, this value is then bound to the new local variable, and that
variable is then immediately changed by the assignment operation.

i only noticed this when assigning variables to entries within a
'list' structure, like so:

tempf <- function(x, local = TRUE) {
  executing_environment <- environment()
  closure_environment <- parent.env(executing_environment)

  print(executing_environment)
  str(mget("my_list", envir = executing_environment, inherits = FALSE,
           ifnotfound = NA)[[1]])
  print(closure_environment)
  str(mget("my_list", envir = closure_environment, inherits = FALSE,
           ifnotfound = NA)[[1]])

  if (local) {
    my_list$x <- x
  } else {
    my_list$x <<- x
  }

  print(executing_environment)
  str(mget("my_list", envir = executing_environment, inherits = FALSE,
           ifnotfound = NA)[[1]])
  print(closure_environment)
  str(mget("my_list", envir = closure_environment, inherits = FALSE,
           ifnotfound = NA)[[1]])
}

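> my_list <- list(x = 1, y = 2)
> tempf(0, local = TRUE)
<environment: 0x...>
 logi NA
<environment: R_GlobalEnv>
List of 2
 $ x: num 1
 $ y: num 2
<environment: 0x...>
List of 2
 $ x: num 0
 $ y: num 2
<environment: R_GlobalEnv>
List of 2
 $ x: num 1
 $ y: num 2
> tempf(0, local = FALSE)
<environment: 0x...>
 logi NA
<environment: R_GlobalEnv>
List of 2
 $ x: num 1
 $ y: num 2
<environment: 0x...>
 logi NA
<environment: R_GlobalEnv>
List of 2
 $ x: num 0
 $ y: num 2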

what surprised me in the first "local = TRUE" case is that 'y' is
still 2 in the executing environment.
so, i think my question comes down to this: when a new local variable
is created in an assignment operation, is the full value of any
matching variable in a containing frame first copied to the new local
variable?
and if so, was this chosen as a strategy specifically to allow for
these sorts of "indexed" assignment operations? (where i'm assigning
to only a single location within the vector object)?
and finally, are the other entries in the vector fully copied over, or
are they treated as "promises" similar to formal parameters, albeit
now as single entries within a containing vector?

thanks for any help on digging down a bit on the implementation here!

-murat

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] local variable assignment: first copies from higher frame?

2013-08-16 Thread Murat Tasan
ah, that makes perfect sense in the functional programming sense of things.
thanks!

On Wed, Aug 14, 2013 at 10:19 PM, Peter Meilstrup
 wrote:
> Not anything that complicated -- your answer is in the R language definition
> under 'Subset assignment' and the part in "Function calls" that describes
> assignment functions.
>
> Whenever a call is found on the left side of a `<-`, it is munged by
> sticking a "<-" on the function name and pulling out the first argument. So
>
> my_list$x <- x
>
> which is syntactically equivalent to
>
> `$`(my_list, x) <- x
>
> is effectively transformed into something like:
>
> my_list <- `$<-`(my_list, x, x)
>
> The function `$<-` gets its argument from wherever it is found, and returns
> a modified version.
>
> Peter
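Peter's explanation can be checked directly. In the sketch below (`f_sugar` and `f_explicit` are made-up names for illustration), the replacement syntax and an explicit call to the replacement function `` `$<-` `` give the same result. In both, the `my_list` on the right-hand side is found by ordinary lexical lookup, and only the new local binding holds the modified list:

```r
## f_sugar and f_explicit are illustrative names, not part of any API.
my_list <- list(x = 1, y = 2)

f_sugar <- function(v) {
  my_list$x <- v   # replacement syntax; `my_list` is first looked up lexically
  my_list          # return the new *local* binding
}

f_explicit <- function(v) {
  my_list <- `$<-`(my_list, "x", v)   # roughly what the syntax above expands to
  my_list
}

a <- f_sugar(0)
b <- f_explicit(0)
identical(a, b)   # TRUE: both are list(x = 0, y = 2)
my_list$x         # 1: the global list is untouched
```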



[Rd] proper use of reg.finalizer to close connections

2014-10-26 Thread Murat Tasan
Hi all, I have a question about finalizers...
I have a package that manages state for a few connections, and I'd
like to ensure that these connections are 'cleanly' closed upon either
(i) R quitting or (ii) an unloading of the package.
So, in a pared-down example package with a single R file, it looks
something like:

# BEGIN PACKAGE CODE #
.CONNS <- new.env(parent = emptyenv())
.CONNS$resource1 <- NULL
.CONNS$resource2 <- NULL
## some more .CONNS resources...

reg.finalizer(.CONNS, function(x) sapply(names(x), disconnect), onexit = TRUE)

connect <- function(x) {
  ## here lies code to connect and update .CONNS[[x]]
}
disconnect <- function(x) {
  print(sprintf("disconnect(%s)", x))
  ## here lies code to disconnect and update .CONNS[[x]]
}
# END PACKAGE CODE #

The print(...) statement in disconnect(...) is there as a trace, as I
hoped that I'd see disconnect(...) being called when I quit (or
detach(..., unload = TRUE)).
But, it doesn't appear that disconnect(...) is ever called when the
package (and .CONNS) falls out of memory/scope (and I ran gc() after
detach(...), just to be sure).

In a second 'shot-in-the-dark' attempt, I placed the reg.finalizer
call inside an .onLoad function, but that didn't seem to work, either.

I'm guessing my use of reg.finalizer is way off-base here... but I
cannot infer from the reg.finalizer man page what I might be doing
wrong.
Is there a way to see, at the R-system level, what functions have been
registered as finalizers?

Thanks for any pointers!

-Murat



Re: [Rd] proper use of reg.finalizer to close connections

2014-10-26 Thread Murat Tasan
Ah, thanks for the ls() vs names() tip!
(But sadly, it didn't solve the issue... )

So, after some more tinkering, I believe the finalizer is being called
_sometimes_.
I changed the reg.finalizer(...) call to just this:

reg.finalizer(.CONNS, function(x) print("foo"), onexit  = TRUE)

Now, when I load the package and detach(..., unload = TRUE), nothing prints.
And when I quit, nothing prints.

If I, however, create an environment on the workspace, like so:
> e <- new.env(parent = emptyenv())
> reg.finalizer(e, function(x) print("bar"), onexit = TRUE)
When I quit (or rm(e)), "bar" is printed.
But no "foo" (corresponding to same sequence of code, just in the
package instead).

BUT(!), when I _install_ the package, "foo" is printed at the end of
the "**testing if installed package can be loaded" installation
segment.
So, somehow the R script that tests for package loading/unloading is
triggering the finalizer (which is good).
Yet, I cannot seem to trigger it myself when either quitting or
forcing a package unload (which is bad).

Any ideas why the installation script would successfully trigger a
finalizer while standard unloading or quitting wouldn't?

Cheers and thanks!

-m

On Sun, Oct 26, 2014 at 8:03 PM, Gábor Csárdi  wrote:
> Hmmm, I guess you will want to put the actual objects that represent
> the connections into the environment, at least this seems to be the
> easiest to me. Btw. you need ls() to list the contents of an
> environment, instead of names(). E.g.
>
> e <- new.env()
> e$foo <- 10
> e$bar <- "aaa"
> names(e)
> #> NULL
> ls(e)
> #> [1] "bar" "foo"
> reg.finalizer(e, function(x) { print(ls(x)) })
> #> NULL
> rm(e)
> gc()
> #> [1] "bar" "foo"
> #>           used (Mb) gc trigger  (Mb) max used  (Mb)
> #> Ncells 1528877 81.7    2564037 137.0  2564037 137.0
> #> Vcells 3752538 28.7    7930384  60.6  7930356  60.6
>
> More precisely, you probably want to represent each connection as a
> separate environment, with its own finalizer. Hope this helps,
> Gabor
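A sketch of Gábor's "one environment per connection, each with its own finalizer" suggestion. `new_resource()` and `close_resource()` are hypothetical helpers, and a plain file connection stands in for whatever the package really manages. The finalizer is a top-level function, so it does not capture the environment it is meant to clean up:

```r
## Hypothetical helpers: one environment per connection, each with its own finalizer.
close_resource <- function(e) {
  if (!is.null(e$con)) {
    try(close(e$con), silent = TRUE)  # tolerate a connection that is already gone
    e$con <- NULL
  }
  invisible(NULL)
}

new_resource <- function(path) {
  res <- new.env(parent = emptyenv())
  res$con <- file(path, open = "w")
  ## Register the finalizer on this one environment; onexit = TRUE asks R
  ## to also run it when the session ends.
  reg.finalizer(res, close_resource, onexit = TRUE)
  res
}

r <- new_resource(tempfile())
writeLines("hello", r$con)
rm(r)
invisible(gc())   # once `r` is unreachable, a full gc() runs close_resource()
```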


Re: [Rd] proper use of reg.finalizer to close connections

2014-10-26 Thread Murat Tasan
Ah (again)!
Even with my fumbling presentation of the issue, you gave me the hint
that solved it, thanks!

Yes, the reg.finalizer call needs to be wrapped in an .onLoad hook so
it's not called once during package installation and then never again.
And once I switched to using ls() (instead of names()), everything
works as expected.

So, the package code effectively looks like so:

.CONNS <- new.env(parent = emptyenv())
.onLoad <- function(libname, pkgname) {
  reg.finalizer(.CONNS, function(x) sapply(ls(x), .disconnect))
}
.disconnect <- function(x) {
  ## handle disconnection of .CONNS[[x]] here
}
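Why a finalizer registered at the top level of the package fires only during installation can be illustrated outside a package. Assuming installation behaves like saving the package's objects and loading them back (as Gábor describes below; the sketch uses `saveRDS()`/`readRDS()` as a stand-in, not anything from R CMD INSTALL itself), the restored copy arrives without a finalizer, because finalizers are not part of the serialized object. `fin_count` is an illustrative counter:

```r
## Count how many times the finalizer runs.
fin_count <- new.env()
fin_count$n <- 0

e <- new.env(parent = emptyenv())
reg.finalizer(e, function(x) fin_count$n <- fin_count$n + 1)

path <- tempfile(fileext = ".rds")
saveRDS(e, path)
e_copy <- readRDS(path)   # a new environment; no finalizer comes with it

rm(e_copy); invisible(gc())
fin_count$n   # 0: the restored copy had no finalizer

rm(e); invisible(gc())
fin_count$n   # 1: the original's finalizer ran
```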

Cheers and thanks!

-m




On Sun, Oct 26, 2014 at 8:53 PM, Gábor Csárdi  wrote:
> Well, to be honest I don't understand fully what you are trying to do.
> If you want to run code when the package is detached or when it is
> unloaded, then use a hook:
> http://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Load-hooks
>
> If you want to run code when an object is freed, then use a finalizer.
>
> Note that when you install a package, R runs all the code in the
> package and only stores the results of the code in the installed
> package. So if you create an object outside of a function in your
> package, then only the object will be stored in the package, but not
> the code that creates it. The object will be simply loaded when you
> load the package, but it will not be re-created.
>
> Now, I am not sure what happens if you set the finalizer on such an
> object in the package. I can imagine that the finalizer will not be
> saved into the package, and is only used once, when
> building/installing the package. In this case you'll need to set the
> finalizer in .onLoad().
>
> Gabor
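Gábor's hook suggestion, sketched with the `.CONNS` and `.disconnect` names used in this thread. The entries are assumed to be R connection objects, and a `textConnection()` is used only so the sketch runs outside a package. Note that R does not, by default, unload namespaces when a session ends, so `.onUnload()` alone would not cover `q()`; that case is what `reg.finalizer(..., onexit = TRUE)` is for:

```r
## Sketch of package code; assumes every entry in .CONNS is an R connection.
.CONNS <- new.env(parent = emptyenv())

.disconnect <- function(name) {
  con <- get(name, envir = .CONNS)
  if (inherits(con, "connection")) close(con)
  rm(list = name, envir = .CONNS)
}

## Runs when the namespace is unloaded, e.g. by unloadNamespace() or
## detach(..., unload = TRUE).
.onUnload <- function(libpath) {
  for (name in ls(.CONNS)) .disconnect(name)
}

## Outside a package, the hook can be exercised by calling it directly:
.CONNS$demo <- textConnection("hello")
.onUnload(libpath = NULL)
ls(.CONNS)   # character(0)
```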

Re: [Rd] proper use of reg.finalizer to close connections

2014-10-26 Thread Murat Tasan
Ah, good point, I hadn't thought of that detail.
Would moving reg.finalizer back outside of .onLoad and hooking it to the
package's environment itself work (more safely)?
Something like:
finalizerFunction <- ## cleanup code
reg.finalizer(parent.env(), finalizerFunction)

-m
 On Oct 26, 2014 11:03 PM, "Henrik Bengtsson"  wrote:

> On Sun, Oct 26, 2014 at 8:14 PM, Murat Tasan  wrote:
> > Ah (again)!
> > Even with my fumbling presentation of the issue, you gave me the hint
> > that solved it, thanks!
> >
> > Yes, the reg.finalizer call needs to be wrapped in an .onLoad hook so
> > it's not called once during package installation and then never again.
> > And once I switched to using ls() (instead of names()), everything
> > works as expected.
> >
> > So, the package code effectively looks like so:
> >
> > .CONNS <- new.env(parent = emptyenv())
> > .onLoad <- function(libname, pkgname) {
> > reg.finalizer(.CONNS, function(x) sapply(ls(x), .disconnect))
> > }
> > .disconnect <- function(x) {
> > ## handle disconnection of .CONNS[[x]] here
> > }
>
> In your example above, I would be concerned about what happens if you
> detach/unload your package, because then your finalizer is still
> registered and will be called whenever '.CONNS' is garbage
> collected (or thereafter).  However, the finalizer function calls
> .disconnect(), which is no longer available.
>
> Finalizers should be used with great care, because you're not in
> control of the order in which things occur, what "resources" are
> around when the finalizer function is eventually called, or when it is
> called.  I've been bitten by this a few times and it can be very hard
> to reproduce and troubleshoot such bugs.  See also the 'Note' of
> ?reg.finalizer.
>
> My $.02
>
> /Henrik

Re: [Rd] proper use of reg.finalizer to close connections

2014-10-27 Thread Murat Tasan
Eh, after some flailing, I think I solved it.
I _think_ this pattern should guarantee that the finalizer function is
still present when needed:

.STATE_CONTAINER <- new.env(parent = emptyenv())
.STATE_CONTAINER$some_state_variable <- NULL        ## replace NULL with real setup code
.STATE_CONTAINER$some_other_state_variable <- NULL  ## replace NULL with real setup code

.myFinalizer <- function(name_of_state_variable_to_clean_up) {
  ## clean up .STATE_CONTAINER[[name_of_state_variable_to_clean_up]] here
}

.onLoad <- function(libname, pkgname) {
  reg.finalizer(
    e = parent.env(environment()),  # enclosing env of .onLoad, i.e. the namespace
    f = function(env) sapply(ls(env$.STATE_CONTAINER), .myFinalizer),
    onexit = TRUE)
}

This way, the finalizer is registered on the enclosing environment of
the .onLoad function, which should be the package's namespace environment itself.
And that means .myFinalizer should still be around when it's called
during q() or unload/gc().
Effectively, the finalizer is tied to the entire package, rather than
the state variable container(s), which might not be the most elegant
solution, but it should work well enough for most purposes.

Cheers and thanks for the advice,

-m


Re: [Rd] proper use of reg.finalizer to close connections

2014-10-27 Thread Murat Tasan
yup... for context, the finalizer code calls functions from packages
that are imported by my package.
so, i think (unless something else has gone seriously wrong), those
imported namespaces should still be available prior to my package's
unloading.
(and if imported namespaces are detached prior to the dependent
package's unloading, well, then, perhaps i'll just re-write all of
this in .)

thanks again!

-m

On Mon, Oct 27, 2014 at 11:27 AM, Henrik Bengtsson  
wrote:
> ...and don't forget to make sure all the functions that .myFinalizer()
> calls are also around. /Henrik

[Rd] common base functions stripping S3 class

2014-11-16 Thread Murat Tasan
Hi all --- this is less a specific question and more general regarding
S3 classes.
I've noticed that quite a few very common default implementations of
generic functions (e.g. `unique`, `[`, `as.data.frame`) strip away
class information.
In some cases, it appears conditionals have been created to re-assign
the class, but only for a few special types.
For example, in `unique.default`, if the argument inherits (_only_)
from "POSIXct" or "Date", the initial class is re-assigned to the
returned object.
But for any other custom S3 classes, it means we have to catch these
frequent cases and write a lot of relatively plain wrappers, e.g.:

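unique.MyClass <- function(x, incomparables = FALSE, ...) {
  ## strip the class, call the default method, then restore the class
  structure(unique(unclass(x), incomparables = incomparables, ...),
            class = class(x))
}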

It's certainly nice to be able to create a very simple wrapper class
on a base type, so that we can override common functions like plot(x).
(An example is a simple class attribute that dictates a particular
plot style for a vector of integers.)
But it would be even nicer to not have to detect and override all the
un-class events that occur when manipulating these objects with
everyday functions, e.g. when adding that 'classed' integer vector to
a data frame.
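One way to cut down on such wrappers, sketched for the hypothetical class `MyClass` above: let each method delegate to the default with `NextMethod()` and then put the class attribute back. Only `unique()` and `[` are shown; `as_myclass()` is an illustrative constructor, and which other generics need the same treatment depends on the class:

```r
## Hypothetical S3 class "MyClass" wrapping an integer vector.
as_myclass <- function(x) structure(as.integer(x), class = "MyClass")

## Delegate to the default method, then restore the class attribute.
unique.MyClass <- function(x, incomparables = FALSE, ...) {
  structure(NextMethod(), class = class(x))
}

`[.MyClass` <- function(x, ...) {
  structure(NextMethod(), class = class(x))
}

x <- as_myclass(c(3, 1, 3, 2))
class(unique(x))   # "MyClass"
class(x[2:3])      # "MyClass"
```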

Apart from moving to S4 classes, how have most dealt with this?
Might there be a list of common functions for which the default
implementation strips class information?
(Such a list could be a handy "consider overriding _this_" guide for
implementors of any new classes.)

Cheers and thanks for any tips!

-murat



[Rd] order(..., na.last = NA) performance hit

2015-01-19 Thread Murat Tasan
I've just recently noticed that using the na.last = NA setting with
order incurs a HUGE performance hit.
It appears that much of order(...) (the R wrapper, not the internal
calls) is written in as general a manner as possible to handle the
large number of input types.
But the canonical case of ordering a single vector of numerics suffers
greatly with the current implementation.
Below is a single trivial example, but overall I've been noticing
somewhere on the order of a 10X performance hit when using na.last =
NA.
Would it be worth (i) attempting a re-write of the wrapping order(...)
function, or (ii) at least mentioning the performance implications in
the help page for order(...)?

Here's an example of the performance hit:

x <- runif(1e6)
x[runif(1e6) > 0.9] <- NA ## add some (~10%) NA values
order2 <- function(x) {
iix <- order(x, na.last = TRUE)
iix[!is.na(x[iix])]
}

system.time(y1 <- order(x, na.last = TRUE))
##    user  system elapsed
##    0.48    0.00    0.48

system.time(y2 <- order(x, na.last = NA))
##    user  system elapsed
##   3.060   0.056   3.118

system.time(y3 <- order2(x))
##    user  system elapsed
##   0.520   0.004   0.520

all(y2 == y3)
## [1] TRUE
identical(y2, y3)
## [1] TRUE
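The `order2()` workaround above handles a single key. A sketch of the same idea for several keys follows. It assumes that a position should be dropped whenever any of the keys is NA there; that multi-key behaviour is an assumption worth checking against `order(..., na.last = NA)` on real data. `order_drop_na()` is a hypothetical name, and no timings are claimed for it:

```r
## Hypothetical helper: sort with na.last = TRUE, then drop every position
## where at least one key is NA.
order_drop_na <- function(...) {
  keys <- list(...)
  has_na <- Reduce(`|`, lapply(keys, is.na))   # TRUE where any key is NA
  iix <- do.call(order, c(keys, list(na.last = TRUE)))
  iix[!has_na[iix]]
}

x <- c(3, NA, 1, 2)
y <- c(1, 2, NA, 1)
order_drop_na(x, y)   # 4 1
```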


Cheers,

-murat
