[Rd] modifying a persistent (reference) class

2014-08-01 Thread Ross Boylan
I saved objects that were defined using several reference classes.
Later I modified the definition of reference classes a bit, creating new
functions and deleting old ones.  The total number of functions did not
change.  When I read them back I could only access some of the original
data.

I asked on the user list and someone suggested sticking with the old
class definitions, creating new classes, reading in the old data, and
converting it to the new classes.  This would be awkward (I want the
"new" classes to have the same name as the "old" ones), and I can
probably just leave the old definitions and define the new functions I
need outside of the reference classes.

Are there any better alternatives?

On reflection, it's a little surprising that changing the code for a
reference class makes any difference to an existing instance, since all
the function definitions seem to be attached to the instance.  One
problem I've had in the past was precisely that redefining a method in a
reference class did not change the behavior of existing instances.  So
I've tried to follow the advice to keep the methods light-weight.

In this case I was trying to move from a show method (that just printed)
to a summary method that returned a summary object.  So I wanted to add
a summary method and redefine the show to call summary in the base
class, removing all the subclass definitions of show.

Regular S4 classes are obviously not as sensitive since they usually
don't include the functions that operate on them, but I suppose if you
changed the slots you'd be in similar trouble.

Some systems keep track of versions of class definitions and allow one
to write code to migrate old to new forms automatically when the data
are read in.  Does R have anything like that?

The system on which I encountered the problems was running R 2.15.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying a persistent (reference) class

2014-08-01 Thread Brian Lee Yung Rowe
Ross,

This is generally a hard problem in software systems. The only language I know 
that explicitly addresses it is Erlang. Ultimately you need a system upgrade 
process, which defines how to update the data in your system to match a new 
version of the system. You could do this by writing a script that 
1) loads the old version of your library
2) loads your data/serialized reference classes
3) exports data to some intermediate format (eg a list)
4) loads new version of library
5) imports data from intermediate format

Once you've gone through the upgrade process, arguably it's better to persist 
the data in a format that is decoupled from your objects since then future 
upgrades would simply be
1) load new library
2) import data from intermediate format

which is no different from day-to-day operation of your app/system (ie you're 
always writing to and reading from the intermediate format). 
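
The five steps above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the class name Run, its fields, and the helper export_fields() are hypothetical, and redefining the class within one script stands in for loading the old and new versions of a library.

```r
library(methods)

# Steps 1-2: the "old" class definition and an instance built from it.
# (Run, its fields, and export_fields() are hypothetical names.)
Run <- setRefClass("Run",
  fields = list(id = "character", estimate = "numeric"))
old_obj <- Run$new(id = "run1", estimate = 3.14)

# Step 3: copy the field values into a plain named list and save that,
# rather than the reference object itself.
export_fields <- function(obj, generator) {
  field_names <- names(generator$fields())
  values <- lapply(field_names, function(n) obj$field(n))
  names(values) <- field_names
  values
}
path <- tempfile(fileext = ".rds")
saveRDS(export_fields(old_obj, Run), path)

# Step 4: the "new" class definition: same name, one extra field, one method.
Run <- setRefClass("Run",
  fields = list(id = "character", estimate = "numeric", notes = "character"),
  methods = list(
    summarize = function() list(id = id, estimate = estimate)
  ))

# Step 5: rebuild an instance of the new class from the plain list.
# Fields missing from the list (here, notes) keep their empty default.
new_obj <- do.call(Run$new, readRDS(path))
```

Because only a named list ever reaches the disk, later upgrades reduce to the last two lines: define the current class and import the list.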

Warm regards,
Brian

•••••
Brian Lee Yung Rowe
Founder, Zato Novo
Professor, M.S. Data Analytics, CUNY



Re: [Rd] modifying a persistent (reference) class

2014-08-01 Thread Ross Boylan
On Fri, 2014-08-01 at 14:42 -0400, Brian Lee Yung Rowe wrote:
> Ross,
> 
> 
> This is generally a hard problem in software systems. The only
> language I know that explicitly addresses it is Erlang. Ultimately you
> need a system upgrade process, which defines how to update the data in
> your system to match a new version of the system. You could do this by
> writing a script that 
> 1) loads the old version of your library
> 2) loads your data/serialized reference classes
> 3) exports data to some intermediate format (eg a list)
> 4) loads new version of library
> 5) imports data from intermediate format
My recollection is that in GemStone's Smalltalk database you can define
methods associated with a class that describe how to change an instance
from one version to another.  You also have the choice of upgrading all
persistent objects at once or doing so lazily, i.e., as they are
retrieved.

The brittleness of the representation depends partly on the details.  If
a class has 2 slots, a and b, and the only thing on disk is the contents
of a and the contents of b, almost any change will screw things up.
However, if the slot name is persisted with the instance it's much
easier to reconstruct the instance if the class changes (if slot c is
added and not on disk, set it to nil; if b is removed, throw it out when
reading from disk).  One could also persist the class definition, or
key elements of it, with individual instances referring to the
definition.
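
As a rough illustration of the name-based scheme just described, assuming the saved form is a named list of field values, a reader could reconcile it against the current class like this. The helper rebuild_from_saved() and the class Pair are hypothetical names, not anything R provides.

```r
library(methods)

# rebuild_from_saved() is a hypothetical helper: 'saved' is a named list of
# field values read from disk, 'generator' is the current class generator.
rebuild_from_saved <- function(saved, generator) {
  current <- names(generator$fields())
  dropped <- setdiff(names(saved), current)   # fields removed from the class
  if (length(dropped) > 0)
    message("Discarding saved fields: ", paste(dropped, collapse = ", "))
  # Fields added since the save are simply not supplied, so they keep the
  # class default (an empty value of the declared type).
  do.call(generator$new, saved[intersect(names(saved), current)])
}

# The saved data has a and b; the current class has a and c.
Pair <- setRefClass("Pair", fields = list(a = "numeric", c = "character"))
obj <- rebuild_from_saved(list(a = 1, b = 2), Pair)
```

A change in the type of a field would still need hand-written conversion; matching by name only covers additions and removals.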

I don't know which, if any, of these strategies R uses for reference or
other classes.
> 
> 
> Once you've gone through the upgrade process, arguably it's better to
> persist the data in a format that is decoupled from your objects since
> then future upgrades would simply be
> 1) load new library
> 2) import data from intermediate format
Arguably :)  As I said, some representations could do this
automatically.  And there are still issues such as a change in the type
of a slot, or rules for filling new slots, that would require
intervention.

In my experience with other object systems, usually methods are
attributes of the class.  For R reference classes they appear to be
attributes of the instance, potentially modifiable on a per-instance
basis.

Ross


Re: [Rd] modifying a persistent (reference) class

2014-08-01 Thread Brian Lee Yung Rowe
Ross,

Ah I didn't think about Smalltalk. Doesn't surprise me that they supported 
upgrades of this sort. That aside I think the question is whether it's 
realistic for a language like R to support such a mechanism automatically. 
Smalltalk and Erlang both have tight semantics that would be hard to establish 
in R (given the multiple object systems and dispatching systems). 

I'm a functional guy so to me it's natural to separate the data from the 
functions/methods. Having spent years writing OOP code I walked away concluding 
that OOP makes things more complicated for the sake of being OOP (eg no first 
class functions). Obviously that's changing, and in a language like R it's less 
of an issue. However, something like object serialization smells suspiciously 
similar. If you know that serializing objects is brittle, why not look for an 
alternative approach as opposed to chasing that rainbow?

Warm regards,
Brian

•••••
Brian Lee Yung Rowe
Founder, Zato Novo
Professor, M.S. Data Analytics, CUNY


Re: [Rd] modifying a persistent (reference) class

2014-08-01 Thread Ross Boylan
On Fri, 2014-08-01 at 16:06 -0400, Brian Lee Yung Rowe wrote:
> Ross,
> 
> 
> Ah I didn't think about Smalltalk. Doesn't surprise me that they
> supported upgrades of this sort. That aside I think the question is
> whether it's realistic for a language like R to support such a
> mechanism automatically. Smalltalk and Erlang both have tight
> semantics that would be hard to establish in R (given the multiple
> object systems and dispatching systems). 
> 
> 
> I'm a functional guy so to me it's natural to separate the data from
> the functions/methods. Having spent years writing OOP code I walked
> away concluding that OOP makes things more complicated for the sake of
> being OOP (eg no first class functions). 
In Smalltalk everything is an object, and that includes functions,
including class methods.
> Obviously that's changing, and in a language like R it's less of an
> issue. However, something like object serialization smells
> suspiciously similar. If you know that serializing objects is brittle,
> why not look for an alternative approach as opposed to chasing that
> rainbow?
My immediate problem is/was that I have serialized objects representing
weeks of CPU time.  I have to work with them, not some other
representation they might have.  And it's much more natural to work with
R's native persistence than some other scheme I cook up.

I think persistence requires serialization.  The serialization can be
more or less brittle, but I don't think there is an alternative to
serialization.

Since I just worked around my immediate problem a few minutes ago (by
retaining the original class definitions and using setMethod to create
summary methods), my interests are a bit more theoretical.
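
A minimal sketch of that kind of workaround, assuming a hypothetical reference class Fit whose definition is left exactly as it was when the objects were saved; the summary logic lives in an S4 method defined outside the class.

```r
library(methods)

# Hypothetical reference class whose definition is left untouched.
Fit <- setRefClass("Fit",
  fields = list(label = "character", draws = "numeric"))

# An S4 method on the class, defined outside the class definition.
# It returns a summary object (here a plain list with an S3 class)
# instead of printing.
setMethod("summary", "Fit", function(object, ...) {
  structure(
    list(label = object$label,
         n = length(object$draws),
         mean = mean(object$draws)),
    class = "FitSummary")
})

fit <- Fit$new(label = "chain1", draws = c(1, 2, 3, 6))
s <- summary(fit)
```

Since the method is looked up by class name at call time, it also applies to instances that were created or saved before the method existed.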

First, I'd like to understand more about exactly what is saved to disk
for reference and other classes, in particular how much meta-information
they contain.  And my mental model for reference class persistence is
clearly wrong, because in that model instances based on old definitions
come back intact (albeit not with the new method definitions or other
new slots), whereas mine seemed to come back damaged.
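
One way to start exploring the first question is to list what a reference object's underlying environment holds before and after a round trip through saveRDS(). This is only an exploratory sketch with a hypothetical class Sim; the exact listing may differ between R versions, so no output is shown.

```r
library(methods)

# Hypothetical class with one field and one method.
Sim <- setRefClass("Sim",
  fields = list(label = "character"),
  methods = list(describe = function() paste("Sim", label)))
sim <- Sim$new(label = "chain1")
sim$describe()

# A reference object is built on an environment; list what it holds,
# including names that start with a dot.
contents <- ls(as.environment(sim), all.names = TRUE)
print(contents)

# Round-trip through R's native serialization and look again.
path <- tempfile(fileext = ".rds")
saveRDS(sim, path)
restored <- readRDS(path)
print(ls(as.environment(restored), all.names = TRUE))
```

Comparing the two listings, and repeating the readRDS() step after changing the class definition, is one way to see what actually travels with a saved instance.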

Second, I'm still hoping for some elegant way around this problem (how
to redefine classes and still use saved versions from older definitions)
for the future, both with reference and regular classes.  Or at least
some rules about what changes, if any, are safe to make in class
definitions after an instance has been persisted.

Third, if changes to R could make things better, I'm hoping some
developers might take them up.  I realize that is unlikely to happen,
for many good reasons, but I can still hope :)

Ross

Re: [Rd] modifying a persistent (reference) class

2014-08-01 Thread Gábor Csárdi
On Fri, Aug 1, 2014 at 4:47 PM, Ross Boylan wrote:
[...]
> First, I'd like to understand more about exactly what is saved to disk
> for reference and other classes, in particular how much meta-information
> they contain.  And my mental model for reference class persistence is
> clearly wrong, because in that model instances based on old definitions
> come back intact (albeit not with the new method definitions or other
> new slots), whereas mine seemed to come back damaged.
>
> Second, I'm still hoping for some elegant way around this problem (how
> to redefine classes and still use saved versions from older definitions)
> for the future, both with reference and regular classes.  Or at least
> some rules about what changes, if any, are safe to make in class
> definitions after an instance has been persisted.
>
> Third, if changes to R could make things better, I'm hoping some
> developers might take them up.  I realize that is unlikely to happen,
> for many good reasons, but I can still hope :)

I believe that the brand new R6 class system can do this. I mean your
saved instances from old classes will be read back properly, with the
old methods. The R6 package is on CRAN and also here if you want to
experiment:
https://github.com/wch/R6

Best,
Gabor



Re: [Rd] modifying a persistent (reference) class

2014-08-01 Thread Winston Chang
R6 objects are basically just environments, so they're probably pretty
simple to save and restore (I haven't tested it out, though).
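
To illustrate the general idea with base R only, here is a sketch that uses a plain environment as a stand-in for an environment-based object. It does not use or test the R6 package itself, and make_account() is a hypothetical constructor.

```r
# make_account() is a hypothetical constructor for an environment-based
# object: data and functions live together in one environment.
make_account <- function(balance) {
  self <- new.env()
  self$balance <- balance
  self$report <- function() paste("balance:", self$balance)
  self
}

acct <- make_account(10)
path <- tempfile(fileext = ".rds")
saveRDS(acct, path)

# Change the "class" after saving: new objects get a different report().
make_account <- function(balance) {
  self <- new.env()
  self$balance <- balance
  self$report <- function() paste("Balance is", self$balance)
  self
}

# The restored object still carries the data and the function it was
# saved with, regardless of the newer constructor.
restored <- readRDS(path)
restored$report()
```

The saved object comes back with its old behaviour intact, which matches the "read back properly, with the old methods" expectation; picking up new methods would still require an explicit conversion step.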

-Winston
