[Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-25 Thread Peter Ingebretson
I have a patch that adds a new function to the gc module.  The gc.remap() 
function uses the tp_traverse mechanism to find all references to any keys 
in a provided mapping, and remaps these references in-place to instead point 
to the value corresponding to each key.

The motivation for adding this method is to enable writing a module that 
provides an enhanced version of imp.reload.  The builtin reload function 
is very useful for iterating on a single module within the Python interpreter 
shell, but in more complex situations it is very limited.

In particular, instances of classes declared in the reloaded module will 
continue to reference the old versions of the classes, and other modules 
that imported elements of the old module using the 'from ... import ...' 
syntax will continue to refer to the stale version of the functions or classes 
that they imported.
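
The stale-reference problem described above can be reproduced without 
touching the filesystem.  The following is only a sketch: it assumes that 
re-running source text in a module's existing namespace is a fair stand-in 
for imp.reload, and the module name "shapes" and the helper execute_into 
are made up for illustration.

```python
import types

def execute_into(module, source):
    # Re-run the source in the module's existing namespace.  Like a reload,
    # this rebinds names but leaves already-created objects untouched.
    exec(source, module.__dict__)

V1 = "class Point:\n    def describe(self):\n        return 'v1'\n"
V2 = "class Point:\n    def describe(self):\n        return 'v2'\n"

shapes = types.ModuleType("shapes")   # hypothetical module
execute_into(shapes, V1)

Point = shapes.Point        # stands in for 'from shapes import Point'
instance = shapes.Point()   # instance created before the reload

execute_into(shapes, V2)    # stands in for imp.reload(shapes)

print(instance.describe())        # the old instance still uses the old class
print(Point is shapes.Point)      # the imported name is now stale
print(shapes.Point().describe())  # only fresh lookups see the new class
```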

The gc.remap() function enables writing a new version of reload which uses 
imp.reload to reload a module and then replaces all references to stale objects 
from the old module to instead point to equivalent newly defined objects.  
This still has many limitations; for instance, if an __init__ function has 
been changed, the new __init__ will not be run on old instances.  On the other 
hand, in many cases this is sufficient to continue iterating on code without 
needing to restart the Python environment, which can be a significant time 
savings.

I initially tried to implement this reloading strategy entirely in Python using 
gc.get_referrers() to find references to objects defined in the old module, 
but I found it was too difficult to reliably replace references in objects once 
they had been found.  Since the GC already has a way to find all fields that 
refer to objects, it seemed fairly straightforward to extend that mechanism to 
additionally modify references.
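
To give a sense of why the pure-Python route is hard, here is a minimal 
sketch built on gc.get_referrers().  It is not the code from my original 
attempt; replace_references and demo are hypothetical names, and the sketch 
only knows how to update dict values and list items.  Every other kind of 
referrer (tuples, frames, instances, dict keys) is left pointing at the 
old object.

```python
import gc

def replace_references(old, new):
    """Rebind references to old held as dict values or list items.

    Returns the number of references replaced.  Referrers of any other
    kind are skipped, which is exactly the limitation of this approach.
    """
    replaced = 0
    for referrer in gc.get_referrers(old):
        if isinstance(referrer, dict):
            for key, value in list(referrer.items()):
                if value is old:
                    referrer[key] = new
                    replaced += 1
        elif isinstance(referrer, list):
            for index, item in enumerate(referrer):
                if item is old:
                    referrer[index] = new
                    replaced += 1
    return replaced

def demo():
    class Handler:
        def __init__(self, version):
            self.version = version

    old, new = Handler(1), Handler(2)
    registry = {"on_save": old}
    queue = [old, old]
    frozen = (old,)          # immutable: this sketch cannot update it
    replaced = replace_references(old, new)
    return (replaced,
            registry["on_save"] is new,
            all(item is new for item in queue),
            frozen[0] is old)

result = demo()
print(result)
```

The demo runs inside a function on purpose: at module level the module's 
globals dict would itself be a referrer and the name "old" would be rebound.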

This reloading strategy is documented in more detail here:

http://doublestar.org/in-place-python-reloading/

A potentially controversial aspect of this change is that the signature of the 
visitproc has been modified to take (PyObject **) as an argument instead of 
(PyObject *) so that a visitor can modify fields visited with Py_VISIT.  A few 
traverse functions in the standard library also had to be changed to use 
Py_VISIT on the actual members rather than on aliased pointers.

I also have a prototype of an enhanced reload function using gc.remap.  This 
is only a partial implementation of the proposal; in particular, it does not 
rehash dictionaries that have been invalidated as a result of reloading, and 
it does not support custom __reload__ hooks.  A link to the code as well as 
some examples are here:

http://doublestar.org/python-hot-loading-prototype/

Please let me know if you have any feedback on the reloading proposal, the 
hot loading prototype, or on the patch.

Thanks,
Peter



  
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-26 Thread Peter Ingebretson
--- On Tue, 10/26/10, Hrvoje Niksic  wrote:

> What about objects that don't implement tp_traverse because
> they cannot take part in cycles?

A significant majority of objects that can hold references to other 
objects can take part in cycles and do implement tp_traverse.  My 
original thought was that modifying any references not visible to 
the cyclic GC would be out of the scope of gc.remap.

Even adding a 'tp_extended_traverse' method might not help solve 
this problem because untracked objects are not in any generation list, 
so there is no general way to find all of them.
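
Which objects the cyclic GC can see is observable from Python through 
gc.is_tracked().  A small illustration follows; the Holder class is made 
up for the example, and the results shown by running it are CPython 
behavior rather than a language guarantee.

```python
import gc

class Holder:
    """Ordinary Python class; its instances can hold references."""
    def __init__(self, item):
        self.item = item

tracked = {
    "int": gc.is_tracked(42),                 # atomic, never tracked
    "str": gc.is_tracked("text"),             # atomic, never tracked
    "list": gc.is_tracked([]),                # container, tracked
    "instance": gc.is_tracked(Holder(None)),  # user-defined instance, tracked
}
print(tracked)
```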

> Changing immutable objects such as tuples and frozensets
> doesn't exactly sound appealing.

My original Python-only approach cloned immutable objects that 
referenced objects that were to be remapped, and then added the 
old and new immutable object to the mapping.  This worked well, 
although it was somewhat complicated because it had to happen in 
dependency order (e.g., to handle tuples of tuples in frozensets).
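
A rough sketch of that cloning step, not the original code: the helper 
name clone_immutable is hypothetical, the mapping is keyed by id() so that 
unhashable members are allowed, and only exact tuple and frozenset types 
are handled.  The old objects must be kept alive while the mapping is in 
use, since ids can be reused.

```python
def clone_immutable(obj, mapping):
    """Return the replacement for obj, cloning tuples/frozensets as needed.

    Each clone is recorded in mapping (old id -> new object), so a shared
    container is cloned once and later lookups reuse the clone.
    """
    if id(obj) in mapping:
        return mapping[id(obj)]
    if type(obj) not in (tuple, frozenset):
        return obj
    # Inner containers are resolved first: this is the dependency order.
    members = [clone_immutable(member, mapping) for member in obj]
    if all(new is old for new, old in zip(members, obj)):
        return obj                      # nothing inside needed remapping
    clone = type(obj)(members)
    mapping[id(obj)] = clone
    return clone

class OldPoint:
    pass

class NewPoint:
    pass

mapping = {id(OldPoint): NewPoint}
inner = (OldPoint, 1)
outer = frozenset([inner, "label"])     # a tuple inside a frozenset
untouched = (1, 2)

new_outer = clone_immutable(outer, mapping)
print(new_outer)
```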

I thought about keeping this, but I am now convinced that as long 
as you are doing something as drastic as changing references in the 
heap you may as well change immutable objects.

The main argument is that preserving immutable objects increases the 
complexity of remapping and does not actually solve many problems.  
The primary reason for objects to be immutable is so that their 
comparison operators and hash value can remain consistent.  Changing, 
for example, the contents of a tuple that a dictionary key references 
has the same effect as changing the identity of the tuple -- both 
modify the hash value of the key and thus invalidate the dictionary.  
The full reload process needs to rehash collections invalidated by 
hash values changing, so we might as well modify the contents of tuples.
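
The invalidation and the rehash can be shown from pure Python.  Since a 
tuple cannot be mutated from Python, this sketch substitutes a key class 
with a mutable field for the remapped tuple; Key and rehash are names 
invented for the example, and the failed lookup relies on CPython hashing 
small integers to themselves.

```python
class Key:
    """Key whose hash and equality depend on a mutable field."""
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value

def rehash(d):
    """Reinsert every item so each key is stored under its current hash."""
    items = list(d.items())
    d.clear()
    d.update(items)

key = Key(1)
table = {key: "payload"}

key.value = 2                  # stands in for remapping a tuple's contents
found_before = key in table    # lookup probes the slot for the new hash

rehash(table)
found_after = key in table     # the entry is now stored under the new hash

print(found_before, found_after)
```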

> > the signature of visitproc has been modified to take (PyObject **) 
> > instead of (PyObject *) so that a visitor can modify fields
> > visited with Py_VISIT.
> 
> This sounds like a bad idea -- visitproc is not limited to
> visiting struct members.  Visited objects can be stored
> in data structures where their address cannot be directly
> obtained.
>
> If you want to go this route, rather create an extended
> visit procedure (visitchangeproc?) that accepts a function
> that can change the reference.  A convenience function
> or macro could implement this for the common case of struct
> member or PyObject**.

This is a compelling argument.  I considered adding an extended 
traverse / visit path, but decided against it after not finding 
any cases in the base distribution that required it.  The 
disadvantage of creating an additional method is that C types will 
have yet another method to implement for the gc (tp_traverse, 
tp_clear, and now tp_traverse_modify(?)).  On the other hand, you've 
convinced me that this is necessary in some cases, so it might as 
well be used in all of them.  Jon Parise also pointed out in a 
private communication that this eliminates the minor performance 
impact on tp_traverse, which is another advantage over my change.

If a 'tp_traverse_modify' function were added, many types could 
replace their custom tp_clear function with a generic method 
that makes use of (visitchangeproc), which somewhat mitigates adding 
another method.
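
The idea can be modelled in Python, with the caveat that the real thing 
would be a C-level slot and that 'tp_traverse_modify' and 'visitchangeproc' 
are only names proposed in this thread.  In this sketch the visitor 
returns the object to store, so one traversal method serves both remapping 
and a generic clear.  Node, remap and generic_clear are hypothetical.

```python
class Node:
    """Toy container standing in for a C type with two object fields."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def traverse_modify(self, visit):
        # Model of the proposed slot: visit receives each referenced
        # object and returns the object to store in its place.
        self.left = visit(self.left)
        self.right = visit(self.right)

def remap(containers, mapping):
    """Replace references according to mapping (old id -> new object)."""
    def visit(obj):
        return mapping.get(id(obj), obj)
    for container in containers:
        container.traverse_modify(visit)

def generic_clear(container):
    """Generic tp_clear analogue: drop every reference the container holds."""
    container.traverse_modify(lambda obj: None)

a, b = object(), object()
node = Node(a, "unrelated")

remap([node], {id(a): b})
after_remap = (node.left is b, node.right)

generic_clear(node)
after_clear = (node.left, node.right)

print(after_remap, after_clear)
```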



  


Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-26 Thread Peter Ingebretson
--- On Tue, 10/26/10, Benjamin Peterson  wrote:
> Is there any reason that you'd want to do this?
> > http://doublestar.org/python-hot-loading-prototype/

I have a relatively large application written in Python, and a 
specific use case where it will significantly increase our speed 
of iteration to be able to change and test modules without needing 
to restart the application.  We have experimented with different 
approaches to reloading and this one seems the most promising by 
a wide margin.

> Overall, I think this adds lots of backwards incompatible
> code for an obscure use-case that will cause subtle and 
> complicated bugs. So, -1.

Would you still object to the change if (visitproc), Py_VISIT and 
tp_traverse were reverted to their previous state, and a separate 
path was added for modifying references using (visitchangeproc), 
Py_VISIT_CHANGE, and tp_traverse_change?

Thanks,
Peter


  


Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-26 Thread Peter Ingebretson
--- On Tue, 10/26/10, "Martin v. Löwis"  wrote:
> I think this then mandates a PEP; I'm -1 on the feature also. 

I am happy to write up a PEP for this feature.  I'll start that 
process now, though if anyone feels that this idea has no chance of 
acceptance please let me know.

Thanks,
Peter



  


Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-26 Thread Peter Ingebretson
--- On Tue, 10/26/10, "Martin v. Löwis"  wrote:
>>> I think this then mandates a PEP; I'm -1 on the feature also. 
>> 
>> I am happy to write up a PEP for this feature.  I'll start that 
>> process now, though if anyone feels that this idea has no chance of 
>> acceptance please let me know.
> 
> If it could actually work in a reasonable way, I would be +0. If,
> as I think, it can't possibly work correctly, I'll be -1.
> 
> In this evaluation, I compare this to Smalltalk's
> Object>>#become:
> What you propose should have a similar effect, IMO, although
> it's probably not necessary to provide the two-way nature
> of become.

Thanks, I didn't know about Object>>#become until now but it 
is a perfect comparison.  The two-way nature of become appears to 
be due to the implementation detail of swapping two entries in a 
table, but the current spec for gc.remap can achieve the same effect
with:
    >>> gc.remap({a: b, b: a})
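
The swap only works if each reference is looked up in the mapping exactly 
once, in a single pass.  A small model of the difference, using a plain 
list as a stand-in for the heap (Thing, remap_slots and replace_one are 
hypothetical names, and this is not how the patch is implemented):

```python
class Thing:
    def __init__(self, name):
        self.name = name

def remap_slots(slots, mapping):
    """Single pass: every slot is looked up once in the mapping."""
    for index, obj in enumerate(slots):
        slots[index] = mapping.get(obj, obj)

def replace_one(slots, old, new):
    """Replace one object at a time, as two separate passes would."""
    for index, obj in enumerate(slots):
        if obj is old:
            slots[index] = new

a, b = Thing("a"), Thing("b")

heap = [a, b, a]
remap_slots(heap, {a: b, b: a})
swapped = [thing.name for thing in heap]

heap = [a, b, a]
replace_one(heap, a, b)     # every slot now refers to b
replace_one(heap, b, a)     # so this turns every slot into a
collapsed = [thing.name for thing in heap]

print(swapped, collapsed)
```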

Of course #become and gc.remap also share the same power and danger.

I'm retracting the patch in 10194 and will submit a new one later 
as part of the PEP that uses a parallel traverse mechanism.  Still, 
if you are concerned that this approach cannot work I encourage you 
to try out the patch associated with 10194 by playing around with 
gc.remap in the interpreter or looking at the unit tests.  I was 
surprised when I made the change initially by how little code was 
required and by how well it seemed to work in practice.

Thanks,
Peter



  


Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-26 Thread Peter Ingebretson
--- On Tue, 10/26/10, Neil Schemenauer  wrote:
> > I am happy to write up a PEP for this feature. 
> > I'll start that process now, though if anyone 
> > feels that this idea has no chance of 
> > acceptance please let me know.
> 
> I think a feature that allows modules to be more
> reliably reloaded could be accepted.  Martin's 
> suggestion sounds like it could be useful.  I would 
> recommend trying to limit the scope of the feature 
> and clearly define what it intends to achieve (e.g.
> use cases).

> The idea of replacing references does not seem to 
> have much hope, IMHO.

I agree that the important feature is module reloading; 
whether it is implemented via remapping references 
or by replacing the state of existing objects is an 
implementation detail.  I will try to keep the scope 
of the PEP focused, and if necessary I will split it 
up into two.

Thanks,
Peter


  


Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-26 Thread Peter Ingebretson
--- On Tue, 10/26/10, P.J. Eby  wrote:
> If all you really want this for is reloading, it would
> probably make more sense to simply modify the existing class
> and function objects using the reloaded values as a
> template, then save the modified classes and functions back
> to the module.
> 
> Have you tried http://pypi.python.org/pypi/plone.reload
> or http://svn.python.org/projects/sandbox/trunk/xreload/xreload.py,
> or any other existing code reloaders, or tried extending
> them for your specific use case?
 
I've investigated several reloading frameworks, including the 
ones you mention as well as http://code.google.com/p/reimport/ 
and http://code.google.com/p/livecoding/.

The approach of using the gc to remap references seemed to 
have the fewest overall limitations, but requiring C API changes 
is a big downside.  I'm going to have to do a more detailed 
comparison of the features offered by each approach.
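
For comparison, the in-place alternative (updating the existing class 
object from the reloaded one) can be sketched in a few lines.  This is 
not taken from plone.reload or xreload; update_class_in_place is a 
hypothetical helper that only handles plain class attributes and ignores 
functions, slots, metaclasses and changed base classes.

```python
def update_class_in_place(old_cls, new_cls):
    """Make old_cls behave like new_cls while keeping its identity."""
    # These descriptors belong to each class individually; never copy them.
    skip = {"__dict__", "__weakref__"}

    # Drop attributes that no longer exist in the reloaded class.
    for name in list(vars(old_cls)):
        if name not in skip and name not in vars(new_cls):
            delattr(old_cls, name)

    # Copy everything the reloaded class defines onto the old class.
    for name, value in vars(new_cls).items():
        if name not in skip:
            setattr(old_cls, name, value)

class Greeter:
    def greet(self):
        return "hello"

    def legacy(self):
        return "removed in the new version"

existing = Greeter()          # instance created before the "reload"

class ReloadedGreeter:        # stands in for the class from the new module
    def greet(self):
        return "hello, world"

update_class_in_place(Greeter, ReloadedGreeter)

print(existing.greet())              # existing instance sees the new method
print(hasattr(existing, "legacy"))   # the removed method is gone
print(type(existing) is Greeter)     # identity is unchanged, nothing is stale
```

No references need to be rewritten here because the old class object 
survives, which is why this route needs no C API changes.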

--- On Tue, 10/26/10, exar...@twistedmatrix.com  
wrote:
> This can be implemented with ctypes right now (I half did
> it several years ago).
> 
> Jean-Paul

Is there a trick to doing it this way, or are you suggesting 
building a ctypes wrapper for each C type in the Python 
library, and then effectively reimplementing tp_traverse 
in Python?



  