[Python-Dev] On a new version of pickle [PEP 3154]: self-referential frozensets

2012-06-23 Thread M Stefan

Hello,

I'm one of this year's Google Summer of Code students working
on improving pickle by creating a new version. My name is Stefan and
my mentor is Alexandre Vassalotti.

If you're interested, you can monitor the progress in the dedicated
blog at [2] and the bitbucket repository at [3].

One of the goals for pickle v4 is to add native opcodes for pickling
sets and frozensets. So far these 4 opcodes have been added:
* EMPTY_SET, EMPTY_FROZENSET: push an empty set/frozenset onto the stack
* UPDATE_SET: update the set on the stack with the top stack slice
    stack before: ... pyset mark stackslice
    stack after : ... pyset
    effect: pyset.update(stackslice)   # in-place union
* UNION_FROZENSET: like UPDATE_SET, but create a new frozenset
    stack before: ... pyfrozenset mark stackslice
    stack after : ... pyfrozenset.union(stackslice)
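
The stack effects of the two filling opcodes can be sketched with a toy stack machine. This is only an illustration under stated assumptions, not the pickle4 implementation: `MARK`, `pop_to_mark`, `update_set` and `union_frozenset` are hypothetical names, and the unpickler stack is modelled as a plain Python list.

```python
MARK = object()  # hypothetical sentinel standing in for the MARK opcode


def pop_to_mark(stack):
    """Pop and return everything above the topmost MARK, in push order."""
    items = []
    while True:
        top = stack.pop()
        if top is MARK:
            break
        items.append(top)
    items.reverse()
    return items


def update_set(stack):
    """UPDATE_SET: ... pyset mark stackslice  ->  ... pyset (updated in place)."""
    items = pop_to_mark(stack)
    stack[-1].update(items)


def union_frozenset(stack):
    """UNION_FROZENSET: replaces the frozenset on the stack with a new one."""
    items = pop_to_mark(stack)
    stack[-1] = stack[-1].union(items)


set_stack = [set(), MARK, 1, 2, 3]
update_set(set_stack)

empty = frozenset()
fs_stack = [empty, MARK, 'a', 'b']
union_frozenset(fs_stack)
```

In this sketch `empty` is left untouched by `union_frozenset`, so any reference taken to it before the union keeps pointing at an empty frozenset.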

While this design allows pickling of self-referential sets, self-referential
frozensets are still problematic. For instance, consider pickling `fs':
    a=A(); fs=frozenset([a]); a.fs = fs
(with the current opcodes, the object a is initialized during unpickling
before it is added to the frozenset, so its state can only refer to the
still-empty frozenset)

The only way I can think of to make this work is to postpone
the initialization of all the objects inside the frozenset until after
UNION_FROZENSET.
I believe this is doable, but there might be memory penalties if the
approach is to simply store all the initialization opcodes in memory until
pickling the frozenset is finished.


Currently, pickle.dumps(fs,4) generates:
EMPTY_FROZENSET
BINPUT 0
MARK
BINGLOBAL_COMMON '0 A' # same as GLOBAL '__main__ A' in v3
EMPTY_TUPLE
NEWOBJ
EMPTY_DICT
SHORT_BINUNICODE 'fs'
BINGET 0 # retrieves the frozenset, which is empty at this point and
         # will never be filled because it's immutable
SETITEM
BUILD   # a.__setstate__({'fs' : frozenset()})
UNION_FROZENSET
By postponing the initialization of a, it should instead generate:
EMPTY_FROZENSET
BINPUT 0
MARK
BINGLOBAL_COMMON '0 A' # same as GLOBAL '__main__ A' in v3
EMPTY_TUPLE
NEWOBJ # create the object but don't initialize its state yet
BINPUT 1
UNION_FROZENSET
BINGET 1
EMPTY_DICT
SHORT_BINUNICODE 'fs'
BINGET 0
SETITEM
BUILD
POP
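
The difference between the two opcode streams can be mirrored in plain Python. This is a rough sketch, not what the unpickler does internally: `build_eager` and `build_postponed` are hypothetical helpers, and the postponed variant assumes that memo slot 0 ends up referring to the filled frozenset, which the listing above does not spell out.

```python
class A:
    pass


def build_eager():
    """Mirrors the first listing: a's state is set before the union."""
    empty = frozenset()            # EMPTY_FROZENSET, BINPUT 0
    a = A.__new__(A)               # NEWOBJ
    a.__dict__.update(fs=empty)    # BUILD, using BINGET 0 (still empty)
    return empty.union([a])        # UNION_FROZENSET creates a new object


def build_postponed():
    """Mirrors the second listing: a's state is set after the union."""
    a = A.__new__(A)               # NEWOBJ, BINPUT 1 (no state yet)
    fs = frozenset().union([a])    # UNION_FROZENSET
    a.__dict__.update(fs=fs)       # BINGET 1 ... BUILD (assumes slot 0 is fs)
    return fs


fs_eager = build_eager()
fs_postponed = build_postponed()
(a_eager,) = fs_eager
(a_postponed,) = fs_postponed
```

Only in the postponed variant does the element point back at the frozenset that contains it; in the eager variant `a_eager.fs` is a different, empty frozenset.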

While self-referential frozensets are uncommon, a far more problematic
situation is with the self-referential objects created with REDUCE. While
pickle uses the idea of creating empty collections and then filling them,
reduce typically creates already-filled objects. For instance:
>>> cnt = collections.Counter(); cnt[a]=3; a.cnt=cnt; cnt.__reduce__()
(<class 'collections.Counter'>, ({<__main__.A object at 0x0286E8F8>: 3},))
where the A object contains a reference to the counter. Unpickling an
object pickled with this reduce function is not possible, because the
reduce function, which "explains" how to create the object, is asking for
the object to exist before being created.
The fix here would be to pass Counter's dictionary in the state argument,
as opposed to the "constructor parameters" one, as follows:
(<class 'collections.Counter'>, (), {<__main__.A object at 0x0286E8F8>: 3})
When unpickling this, an empty Counter will be created first, and then
__setstate__ will be called to fill it, at which point self-references 
are allowed.
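
A minimal sketch of the state-based variant follows. `StateCounter` is a hypothetical subclass written only for illustration (it is not how collections.Counter is implemented), and `A` is a stand-in for the class used above. The point is that the items travel in the state slot of the reduce tuple, so the counter exists before its contents are unpickled.

```python
import collections
import pickle


class A:
    """Plain object that will point back at the counter holding it."""


class StateCounter(collections.Counter):
    """Hypothetical Counter variant that ships its items as state."""

    def __reduce__(self):
        # (callable, constructor args, state): created empty, filled later
        return (self.__class__, (), dict(self))

    def __setstate__(self, state):
        # Runs after the empty counter exists, so references to it resolve
        self.update(state)


a = A()
cnt = StateCounter()
cnt[a] = 3
a.cnt = cnt

restored = pickle.loads(pickle.dumps(cnt, 2))
(key,) = restored  # the single restored A instance
```

After the round trip, `key.cnt` is the restored counter itself rather than a separate copy.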

I assume this modification has to be done in the implementations of the
data structures rather than in pickle itself. Pickle could try to fix this
by detecting when reduce returns a class type as the first tuple arg and
moving the dict ctor parameter to the state, but this may not always be
intended. It's also a bit strange that __getstate__ is never used anywhere
in pickle directly.


I'm looking forward to hearing your suggestions and opinions on this matter.

Regards,
  Stefan

[1] http://www.python.org/dev/peps/pep-3154/
[2] http://pypickle4.wordpress.com/
[3] http://bitbucket.org/mstefanro/pickle4
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 0424: A method for exposing a length hint

2012-07-16 Thread M Stefan

On 7/16/2012 9:54 AM, Stefan Behnel wrote:

Mark Shannon, 15.07.2012 16:14:

Alex Gaynor wrote:

CPython currently defines an ``__length_hint__`` method on several types,
such as various iterators. This method is then used by various other
functions (such as ``map``) to presize lists based on the estimated returned by

Don't use "map" as an example.
map returns an iterator so it doesn't need __length_hint__

Right. It's a good example for something else, though. As I mentioned
before, iterators should be able to propagate the length hint of an
underlying iterator, e.g. in generator expressions or map(). I consider
that an important feature that the protocol must support.

Stefan



map() is quite problematic in this respect, and may actually
benefit from the existence of __length_hint__.
Currently it is very easy to create an infinite loop by doing something like
x=[1]; x+=map(str,x):

[61081 refs]
>>> x=[1]; x+=map(str,x)
Traceback (most recent call last):
  ...
MemoryError
[120959834 refs]
>>> len(x)
120898752

Obviously, this won't cause an infinite loop in Python 2, where map is
non-lazy. It also won't happen with every mutable container, because not
all of them permit adding elements while iterating:
>>> s=set([1]); s.update(map(str,s))
Traceback (most recent call last):
  ...
RuntimeError: Set changed size during iteration
[61101 refs]
>>> s
{1, '1'}
[61101 refs]
>>> del s
[61099 refs]
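
Both sessions above can be reproduced without exhausting memory. The sketch below is an illustration, not part of the original session: it caps the self-feeding iteration with itertools.islice (the limit of 5 is arbitrary) and catches the set error instead of letting it propagate.

```python
import itertools

# List case: the list iterator behind map() sees the elements that += appends,
# so the loop feeds itself. islice() stops it after 5 items.
x = [1]
x += itertools.islice(map(str, x), 5)

# Set case: the first added element changes the set's size, and the set
# iterator refuses to continue.
s = {1}
error = None
try:
    s.update(map(str, s))
except RuntimeError as exc:
    error = exc
```

Here `x` ends up as the original element followed by five copies of '1', and `s` keeps the one element that was added before the error was raised.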

If map objects were to disallow changing the size of the container
while iterating (I can't really think of a use case in which such a
limitation would be harmful), it might as well be done through
__length_hint__.

Also, what would iter([1,2,3]).__length_hint__() return? 3 or unknown?
If 3, then the semantics of l=[1,2,3]; l += iter(l) will change (infinite
loop without __length_hint__ vs. list of 6 elements with __length_hint__).
If unknown, then it doesn't seem like there are very many places where
__length_hint__ can return anything but unknown.
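
For reference, the hints can be observed directly on CPython versions that ship operator.length_hint (3.4 and later). This is a small sketch that only reads the hints; it deliberately avoids running l += iter(l), which may not terminate. The fallback value -1 is an arbitrary choice to stand for "unknown".

```python
import operator

it = iter([1, 2, 3])
hint_before = operator.length_hint(it)   # remaining items reported by the list iterator
next(it)
hint_after = operator.length_hint(it)    # one item fewer after consuming one

# A generator has neither __len__ nor __length_hint__, so the supplied
# default comes back unchanged.
gen = (str(v) for v in [1, 2, 3])
hint_gen = operator.length_hint(gen, -1)
```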

Regards,
  Stefan M


[Python-Dev] Unbinding of methods

2012-07-19 Thread M Stefan

Hey,

As part of pickle4, I found it interesting to add the possibility
of pickling bound functions (instance methods). This is done by
pickling f.__self__ and f.__func__ separately, and then adding
a BIND opcode to tie them together.
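
For pure-Python methods, the split and the rebinding step can be sketched as follows. types.MethodType is the standard way to bind a function to an instance; the class `A` and its attribute `tag` are made up for the example, and nothing here is the actual BIND implementation.

```python
import types


class A:
    def __init__(self, tag):
        self.tag = tag

    def f(self):
        return self.tag


bound = A("x").f
func, obj = bound.__func__, bound.__self__   # the two parts pickled separately
rebound = types.MethodType(func, obj)        # roughly what a BIND step would do
```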

While this appears to work fine for Python methods (non-builtin), some
issues arise with builtins. These arise partly because not all builtin
function types support __func__, partly because not all of them fill in
__module__ when they should, and partly because there are many (7) types
a function can actually have:

ClassMethodDescriptorType = type(??)
BuiltinFunctionType = type(len)
FunctionType = type(f)
MethodType = type(A().f)
MethodDescriptorType = type(list.append)
WrapperDescriptorType = type(list.__add__)
MethodWrapperType = type([].__add__)

AllFunctionTypes = (ClassMethodDescriptorType, BuiltinFunctionType,
FunctionType, MethodType, MethodDescriptorType,
WrapperDescriptorType, MethodWrapperType)
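repr(AllFunctionTypes) = (
<class 'classmethod_descriptor'>,
<class 'builtin_function_or_method'>, <class 'function'>,
<class 'method'>, <class 'method_descriptor'>,
<class 'wrapper_descriptor'>, <class 'method-wrapper'>)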
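
The seven types can be inspected on an unpatched interpreter with the sketch below. The helper names `A`, `g`, `candidates`, `type_names` and `has_func` are made up for the example. The expression used for ClassMethodDescriptorType is an assumption borrowed from how current Lib/types.py obtains that type; the post itself leaves it as ??.

```python
class A:
    def f(self):
        return None


def g():
    return None


candidates = {
    # assumption: taken from current Lib/types.py, not from the post
    "ClassMethodDescriptorType": type(dict.__dict__["fromkeys"]),
    "BuiltinFunctionType": type(len),
    "FunctionType": type(g),
    "MethodType": type(A().f),
    "MethodDescriptorType": type(list.append),
    "WrapperDescriptorType": type(list.__add__),
    "MethodWrapperType": type([].__add__),
}
type_names = {label: t.__name__ for label, t in candidates.items()}

# Without the patch, only pure-Python bound methods expose __func__.
has_func = {
    "A().f": hasattr(A().f, "__func__"),
    "[].append": hasattr([].append, "__func__"),
    "[].__add__": hasattr([].__add__, "__func__"),
    "len": hasattr(len, "__func__"),
}
```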

I have created a patch at [1], which adds __func__ to some other
function types, as well as:
 1) adds AllFunctionTypes etc. to Lib/types.py
 2) inspect.isanyfunction(), inspect.isanyboundfunction(),
inspect.isanyunboundfunction()
 3) functools.unbind
Note that I am not very familiar with CPython internals, so
the patch needs to be carefully reviewed.

Possible issues: Should classmethods be considered bound or unbound?
If cm is a classmethod, then should
cm.__func__.__self__ = cm.__self__ or cm.__func__.__self__ = None?
Currently it does the latter:
>>> cm.__self__, hasattr(cm,'__self__'), hasattr(cm.__func__, '__self__')
(, True, False)
This requires treating classmethods separately when pickling,
so I'm not sure if this is ideal.
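
The same three checks can be run on an ordinary Python-level classmethod. This sketch uses a made-up class `A` and does not involve the patched builtin types, so it only shows the behaviour the patch is modelled on.

```python
class A:
    @classmethod
    def cm(cls):
        return cls


cm = A.cm  # accessing a classmethod through the class yields a bound method
result = (cm.__self__, hasattr(cm, "__self__"), hasattr(cm.__func__, "__self__"))
```

The bound object carries the class as `__self__`, while the underlying function has no `__self__` attribute at all.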

Let me know if I should have opened an issue instead. I look
forward to hearing your opinions/suggestions on this matter.

Regards,
  Stefan M

[1] https://gist.github.com/3145210


Re: [Python-Dev] Unbinding of methods

2012-07-19 Thread M Stefan

On 7/19/2012 9:54 PM, Antoine Pitrou wrote:

On Thu, 19 Jul 2012 19:53:27 +0300
M Stefan  wrote:

Hey,

As part of pickle4, I found it interesting to add the possibility
of pickling bound functions (instance methods). This is done by
pickling f.__self__ and f.__func__ separately, and then adding
a BIND opcode to tie them together.

Instead of a specific opcode, can't you use a suitable __reduce__ magic
(or __getnewargs__, perhaps)? We want to limit the number of opcodes
except for performance-critical types (and I don't think bound methods
are performance-critical for the purpose of serialization).

Yes, I agree that doing it with __reduce__ would be a better approach than
adding a new opcode; I'll consider switching.
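
One way such a reduce-based approach could look is sketched below, using a per-pickler dispatch table so that no opcode is needed. This is an assumption about a possible design, not the patch under discussion: `reduce_bound_method` and `dumps_with_methods` are hypothetical helpers, and fractions.Fraction is used only because it is a picklable stdlib class with pure-Python methods.

```python
import copyreg
import io
import pickle
import types
from fractions import Fraction


def reduce_bound_method(method):
    # Rebuild at load time as getattr(instance, name)
    return getattr, (method.__self__, method.__func__.__name__)


def dumps_with_methods(obj, protocol=2):
    """Pickle obj with a private dispatch table that handles bound methods."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol)
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[types.MethodType] = reduce_bound_method
    pickler.dump(obj)
    return buf.getvalue()


method = Fraction(3, 4).limit_denominator   # a pure-Python bound method
restored = pickle.loads(dumps_with_methods(method))
```

Looking the method up by name keeps the pickle small, at the cost of relying on the attribute still existing under that name when the data is loaded.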

I have created a patch at [1], which adds __func__ to some other
function types, as well as:
   1) adds AllFunctionTypes etc. to Lib/types.py
   2) inspect.isanyfunction(), inspect.isanyboundfunction(),
  inspect.isanyunboundfunction()
   3) functools.unbind

That sounds like a lot of changes if the goal is simply to make those
types picklable.

Regards

Antoine.


Indeed they are, I just thought there may be a chance this code would be
used elsewhere too. It's a bit weird that you can use inspect to check for
certain types of functions but not others, and that you can "unbind"
certain types of methods but not others.

Admittedly, these changes have few use cases and are not a priority.


Yours,
 Stefan M
