Re: [Python-Dev] Idea: more compact, interned string key only dict for namespace.

2016-06-23 Thread INADA Naoki
I've checked time and maxrss of sphinx-build.

In case of sphinx,


## master

$ rm -rf build/
$ /usr/bin/time ~/local/python-master/bin/sphinx-build -b html -d
build/doctrees -D latex_paper_size=  . build/html -QN

71.76user 0.27system 1:12.06elapsed 99%CPU (0avgtext+0avgdata
176248maxresident)k
80inputs+202888outputs (2major+58234minor)pagefaults 0swaps

71.86user 0.28system 1:12.16elapsed 99%CPU (0avgtext+0avgdata
176312maxresident)k
0inputs+201480outputs (0major+59897minor)pagefaults 0swaps


## compact-dict w/ shared

$ rm -rf build/
$ /usr/bin/time ~/local/python-compact/bin/sphinx-build -b html -d
build/doctrees -D latex_paper_size=  . build/html -QN

72.18user 0.27system 1:12.47elapsed 99%CPU (0avgtext+0avgdata
158104maxresident)k
728inputs+200792outputs (0major+53409minor)pagefaults 0swaps

72.79user 0.30system 1:13.11elapsed 99%CPU (0avgtext+0avgdata
157916maxresident)k
0inputs+200792outputs (0major+54072minor)pagefaults 0swaps


## compact w/o shared key

(Only shared key removed.  No interned key only dict)

$ rm -rf build/
$ /usr/bin/time ~/local/python-intern/bin/sphinx-build -b html -d
build/doctrees -D latex_paper_size=  . build/html -QN

71.79user 0.34system 1:12.16elapsed 99%CPU (0avgtext+0avgdata
165884maxresident)k
480inputs+200792outputs (0major+56947minor)pagefaults 0swaps

71.84user 0.27system 1:12.13elapsed 99%CPU (0avgtext+0avgdata
166888maxresident)k
640inputs+200792outputs (5major+56834minor)pagefaults 0swaps


-- 
INADA Naoki  
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Why are class dictionaries not accessible?

2016-06-23 Thread Random832
On Wed, Jun 22, 2016, at 11:11, Guido van Rossum wrote:
> This is done in order to force all mutations of the class dict to go
> through attribute assignments on the class. The latter takes care of
> updating the class struct, e.g. if you were to add an `__add__` method
> dynamically it would update tp_as_number->nb_add. If you could modify the
> dict object directly it would be more difficult to arrange for this side
> effect.

Why is this different from the fact that updating a normal object's dict
bypasses descriptors and any special logic in __setattr__? Dunder
methods are already "special" in the sense that you can't use them as
object attributes; I wouldn't be surprised by "assigning a dunder method
via the class's dict breaks things".
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP XXX: Compact ordered dict

2016-06-23 Thread Eric Snow
On Mon, Jun 20, 2016 at 11:02 PM, INADA Naoki  wrote:
> On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman  wrote:
>> (if a PEP is needed at all)
>
> I don't think so. My PEP is not for changing Python Language,
> just describe implementation detail.
>
> Python 3.5 has new OrderedDict implemented in C without PEP.
> My patch is relatively small than it.  And the idea has been well known.

How about, for 3.6, target re-implementing OrderedDict using the
compact dict approach (and leave dict alone for now).  That way we
have an extra release cycle to iron out the kinks before switching
dict over for 3.7. :)

-eric
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Why are class dictionaries not accessible?

2016-06-23 Thread Guido van Rossum
On Thu, Jun 23, 2016 at 8:01 AM, Random832  wrote:

> On Wed, Jun 22, 2016, at 11:11, Guido van Rossum wrote:
> > This is done in order to force all mutations of the class dict to go
> > through attribute assignments on the class. The latter takes care of
> > updating the class struct, e.g. if you were to add an `__add__` method
> > dynamically it would update tp_as_number->nb_add. If you could modify the
> > dict object directly it would be more difficult to arrange for this side
> > effect.
>
> Why is this different from the fact that updating a normal object's dict
> bypasses descriptors and any special logic in __setattr__? Dunder
> methods are already "special" in the sense that you can't use them as
> object attributes; I wouldn't be surprised by "assigning a dunder method
> via the class's dict breaks things".
>

It was a long time when I wrote this, but IIRC the breakage could express
itself as a segfault or other C-level crash due to some internal state
invariant of the type object being violated, not just an exception. The
existence of ctypes notwithstanding, we take C-level crashes very seriously.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP XXX: Compact ordered dict

2016-06-23 Thread INADA Naoki
On Fri, Jun 24, 2016 at 12:03 AM, Eric Snow  wrote:
> On Mon, Jun 20, 2016 at 11:02 PM, INADA Naoki  wrote:
>> On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman  wrote:
>>> (if a PEP is needed at all)
>>
>> I don't think so. My PEP is not for changing Python Language,
>> just describe implementation detail.
>>
>> Python 3.5 has new OrderedDict implemented in C without PEP.
>> My patch is relatively small than it.  And the idea has been well known.
>
> How about, for 3.6, target re-implementing OrderedDict using the
> compact dict approach (and leave dict alone for now).  That way we
> have an extra release cycle to iron out the kinks before switching
> dict over for 3.7. :)
>
> -eric

I can't.  Since OrderedDict inherits dict.  OrderedDict implementation
based on dict
implementation.
Since I'm not expert of Python object system,  I don't know how to
separate OrderedDict
implementation from dict.


-- 
INADA Naoki  
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Why are class dictionaries not accessible?

2016-06-23 Thread Devin Jeanpierre
On Thu, Jun 23, 2016 at 8:19 AM, Guido van Rossum  wrote:
>
> It was a long time when I wrote this, but IIRC the breakage could express
> itself as a segfault or other C-level crash due to some internal state
> invariant of the type object being violated, not just an exception. The
> existence of ctypes notwithstanding, we take C-level crashes very seriously.
>

Big digression: one can still obtain the dict if they really want to, even
without using ctypes. I suppose don't actually mutate it unless you want to
segfault.

>>> import gc
>>> class A(object): pass
>>> type(A.__dict__)

>>> type(gc.get_referents(A.__dict__)[0])

>>> gc.get_referents(A.__dict__)[0]['abc'] = 1
>>> A.abc
1
>>>

(One can also get it right from A, but A can have other references, so
maybe that's less reliable.)

I think I wanted this at the time so I could better measure the sizes of
objects. sys.getsizeof(A.__dict__) is very different
from sys.getsizeof(gc.get_referents(A.__dict__)[0]), and also different
from sys.getsizeof(A). For example:

>>> import gc
>>> class A(object): pass
>>> sys.getsizeof(A); sys.getsizeof(A.__dict__);
sys.getsizeof(gc.get_referents(A.__dict__)[0])
976
48
288
>>> for i in range(1): setattr(A, 'attr_%s' % i, i)
>>> sys.getsizeof(A); sys.getsizeof(A.__dict__);
sys.getsizeof(gc.get_referents(A.__dict__)[0])
976
48
393312

(Fortunately, if you want to walk the object graph to measure memory usage
per object type, you're probably going to be using gc.get_referents already
anyway, so this is just confirmation that you're getting what you want in
one corner case.)

-- Devin
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Why are class dictionaries not accessible?

2016-06-23 Thread Guido van Rossum
"Er, among our chief weapons are fear, surprise, ctypes, gc, and fanatical
devotion to the Pope!"

On Thu, Jun 23, 2016 at 1:03 PM, Devin Jeanpierre 
wrote:

> On Thu, Jun 23, 2016 at 8:19 AM, Guido van Rossum 
>  wrote:
>>
>> It was a long time when I wrote this, but IIRC the breakage could express
>> itself as a segfault or other C-level crash due to some internal state
>> invariant of the type object being violated, not just an exception. The
>> existence of ctypes notwithstanding, we take C-level crashes very seriously.
>>
>
> Big digression: one can still obtain the dict if they really want to, even
> without using ctypes. I suppose don't actually mutate it unless you want to
> segfault.
>
> >>> import gc
> >>> class A(object): pass
> >>> type(A.__dict__)
> 
> >>> type(gc.get_referents(A.__dict__)[0])
> 
> >>> gc.get_referents(A.__dict__)[0]['abc'] = 1
> >>> A.abc
> 1
> >>>
>
> (One can also get it right from A, but A can have other references, so
> maybe that's less reliable.)
>
> I think I wanted this at the time so I could better measure the sizes of
> objects. sys.getsizeof(A.__dict__) is very different
> from sys.getsizeof(gc.get_referents(A.__dict__)[0]), and also different
> from sys.getsizeof(A). For example:
>
> >>> import gc
> >>> class A(object): pass
> >>> sys.getsizeof(A); sys.getsizeof(A.__dict__);
> sys.getsizeof(gc.get_referents(A.__dict__)[0])
> 976
> 48
> 288
> >>> for i in range(1): setattr(A, 'attr_%s' % i, i)
> >>> sys.getsizeof(A); sys.getsizeof(A.__dict__);
> sys.getsizeof(gc.get_referents(A.__dict__)[0])
> 976
> 48
> 393312
>
> (Fortunately, if you want to walk the object graph to measure memory usage
> per object type, you're probably going to be using gc.get_referents already
> anyway, so this is just confirmation that you're getting what you want in
> one corner case.)
>
> -- Devin
>



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict

2016-06-23 Thread Nathaniel Smith
On Tue, Jun 21, 2016 at 8:40 PM, INADA Naoki  wrote:
> There are three options I can think.
>
>
> 1) Revert key-shared dict (PEP412).
>
> pros: Removing key-shared dict makes dict implementation simple.
>
> cons: In some applications, PEP 412 is far more compact than compact
> ordered dict.  (Note: Using __slots__ may help such situation).
>
>
> 2) Don't make "keeping insertion order" is Python Language Spec.
>
> pros: Best efficiency
>
> cons: Different behavior between normal dict and instance.__dict__ may
> confuse people.
>
>
> 3) More strict rule for key sharing dict.
>
> My idea is:
> * Increasing number of entries (inserting new key) can be possible
> only if refcnt of keys == 1.
>
> * Inserting new item (with existing key) into dict is allowed only when
>   insertion position == number of items in the dict (PyDictObject.ma_used).
>
> pros: We can have "dict keeping insertion order".
>
> cons: Can't use key-sharing dict for many cases.  Small and harmless
> change may cause
> sudden memory usage increase.  (__slots__ is more predicable).

IIUC, key-sharing dicts are a best-effort optimization where if I have
a class like:

class Foo:
def __init__(self, a, b):
self.a = a
self.b = b

f1 = Foo(1, 2)
f2 = Foo(3, 4)

then f1.__dict__ and f2.__dict__ can share their key arrays... but if
I do f1.c = "c", then f1.__dict__ gets automatically switched to a
regular dict. The idea being that in, say, 99% of cases, different
objects of the same type all share the same set of keys, and in the
other 1%, oh well, we fall back on the regular behavior.

It seems to me that all this works fine for ordered dicts too, if we
add the restriction that key arrays can be shared if and only if the
two dicts have the same set of keys *and* initially assign those keys
in the same order. In, say, 98.9% of cases, different objects of the
same type all share the same set of keys and initially assign those
keys in the same order, and in the other 1.1%, oh well, we can
silently fall back on unshared keys, same as before. (And crucially,
the OrderedDict semantics are that only adding *new* keys changes the
order; assignments to existing keys preserve the existing order. So if
a given type always creates the same instance attributes in the same
order at startup and never adds or deletes any, then its key values
*and* key order will stay the same even if it later mutates some of
those existing attributes in-place.)

It's possible that there will be some weird types that mess this up, like:

class WeirdFoo:
def __init__(self, a, b):
if a % 2 == 0:
self.a = a
self.b = b
else:
self.b = b
self.a = a

assert list(WeirdFoo(1, 2).__dict__.keys()) != list(WeirdFoo(2,
3).__dict__.keys())

but, who cares? It'd be good due-diligence to collect data on this to
confirm that it isn't a big issue, but intuitively, code like
WeirdFoo.__init__ is vanishingly rare, and this is just a best-effort
optimization anyway. Catching 98.9% of cases is good enough.

Is there something I'm missing here? Is this your option #3?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict

2016-06-23 Thread INADA Naoki
> IIUC, key-sharing dicts are a best-effort optimization where if I have
> a class like:
>
> class Foo:
> def __init__(self, a, b):
> self.a = a
> self.b = b
>
> f1 = Foo(1, 2)
> f2 = Foo(3, 4)
>
> then f1.__dict__ and f2.__dict__ can share their key arrays... but if
> I do f1.c = "c", then f1.__dict__ gets automatically switched to a
> regular dict. The idea being that in, say, 99% of cases, different
> objects of the same type all share the same set of keys, and in the
> other 1%, oh well, we fall back on the regular behavior.

Small correction:  Giving up sharing dict can happen when resizing keys.

f1 = Foo(1, 2)  # f1 has [a, b] keys. Let's say it k1.  Foo caches k1.
f2 = Foo(3, 4)  # new instance uses cached k1 keys.
f1.c = "c"   # Since k1 can contain three keys, nothing happen.
f1.d = "d"   # gave up.  Foo doesn't use shared key anymore.
f3 = Foo(5, 6)   # f3 has normal dict.

You can see it by `sys.getsizeof(f1.__dict__), sys.getsizeof(f2.__dict__)`.


> It seems to me that all this works fine for ordered dicts too, if we
> add the restriction that key arrays can be shared if and only if the
> two dicts have the same set of keys *and* initially assign those keys
> in the same order. In, say, 98.9% of cases, different objects of the
> same type all share the same set of keys and initially assign those
> keys in the same order, and in the other 1.1%, oh well, we can
> silently fall back on unshared keys, same as before. (And crucially,
> the OrderedDict semantics are that only adding *new* keys changes the
> order; assignments to existing keys preserve the existing order. So if
> a given type always creates the same instance attributes in the same
> order at startup and never adds or deletes any, then its key values
> *and* key order will stay the same even if it later mutates some of
> those existing attributes in-place.)
>
> It's possible that there will be some weird types that mess this up, like:
>
> class WeirdFoo:
> def __init__(self, a, b):
> if a % 2 == 0:
> self.a = a
> self.b = b
> else:
> self.b = b
> self.a = a
>
> assert list(WeirdFoo(1, 2).__dict__.keys()) != list(WeirdFoo(2,
> 3).__dict__.keys())
>
> but, who cares? It'd be good due-diligence to collect data on this to
> confirm that it isn't a big issue, but intuitively, code like
> WeirdFoo.__init__ is vanishingly rare, and this is just a best-effort
> optimization anyway. Catching 98.9% of cases is good enough.
>

While I think it's less than 98.9% (see below examples), I agree with you.


1) not shared even current implementation

class A:
n = 0

def __init__(self, a, b, c):
self.a, self.b, self.c = a, b, c

def add(self):
self.n += 1

a = A()
b = A()
a.add(1)

2) not shared if strict ordering rule

class A:
file = None
def __init__(self, a, **, filename=None):
if filename is not None:
self.file = open(filename, 'w')
self.a = a

a = A(42, filename="logfile.txt")
b = B(43)


3) Web application's model objects

class User(Model):
id = IntColumn()
name = StringColumn()
age = IntColumn()

# When creating new instance, (name, age) is initialized, and id is
filled after insert.
user = User(name="methane", age=32)
db.add(user)

# When instances fetched from DB, ORM populate attributes in (id,
name, age) order.
# 100 instances doesn't share keys under "strict ordering rule".

users = User.query.limit(100).all()


> Is there something I'm missing here? Is this your option #3?

Yes.

It may works well, but "one special instance disables key-sharing for
all instances
created after" may cause long time increasing memory usage.
People seeing monitoring graph will think their application have memory leak.

My new idea may have more stable memory usage, without decreasing memory
efficiency so much.

See https://mail.python.org/pipermail/python-dev/2016-June/145391.html

Compact ordered dict is more efficient than key-sharing dict in case of Sphinx.
It means, instance __dict__ is not dominance.

I'll implement POC of my new idea and compare it with Sphinx.
If you know another good *real application*, which is easy to benchmark,
please tell me it.

-- 
INADA Naoki  
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com