Re: [Python-Dev] Idea: more compact, interned string key only dict for namespace.
I've checked time and maxrss of sphinx-build. In case of sphinx, ## master $ rm -rf build/ $ /usr/bin/time ~/local/python-master/bin/sphinx-build -b html -d build/doctrees -D latex_paper_size= . build/html -QN 71.76user 0.27system 1:12.06elapsed 99%CPU (0avgtext+0avgdata 176248maxresident)k 80inputs+202888outputs (2major+58234minor)pagefaults 0swaps 71.86user 0.28system 1:12.16elapsed 99%CPU (0avgtext+0avgdata 176312maxresident)k 0inputs+201480outputs (0major+59897minor)pagefaults 0swaps ## compact-dict w/ shared $ rm -rf build/ $ /usr/bin/time ~/local/python-compact/bin/sphinx-build -b html -d build/doctrees -D latex_paper_size= . build/html -QN 72.18user 0.27system 1:12.47elapsed 99%CPU (0avgtext+0avgdata 158104maxresident)k 728inputs+200792outputs (0major+53409minor)pagefaults 0swaps 72.79user 0.30system 1:13.11elapsed 99%CPU (0avgtext+0avgdata 157916maxresident)k 0inputs+200792outputs (0major+54072minor)pagefaults 0swaps ## compact w/o shared key (Only shared key removed. No interned key only dict) $ rm -rf build/ $ /usr/bin/time ~/local/python-intern/bin/sphinx-build -b html -d build/doctrees -D latex_paper_size= . build/html -QN 71.79user 0.34system 1:12.16elapsed 99%CPU (0avgtext+0avgdata 165884maxresident)k 480inputs+200792outputs (0major+56947minor)pagefaults 0swaps 71.84user 0.27system 1:12.13elapsed 99%CPU (0avgtext+0avgdata 166888maxresident)k 640inputs+200792outputs (5major+56834minor)pagefaults 0swaps -- INADA Naoki ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Why are class dictionaries not accessible?
On Wed, Jun 22, 2016, at 11:11, Guido van Rossum wrote: > This is done in order to force all mutations of the class dict to go > through attribute assignments on the class. The latter takes care of > updating the class struct, e.g. if you were to add an `__add__` method > dynamically it would update tp_as_number->nb_add. If you could modify the > dict object directly it would be more difficult to arrange for this side > effect. Why is this different from the fact that updating a normal object's dict bypasses descriptors and any special logic in __setattr__? Dunder methods are already "special" in the sense that you can't use them as object attributes; I wouldn't be surprised by "assigning a dunder method via the class's dict breaks things". ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP XXX: Compact ordered dict
On Mon, Jun 20, 2016 at 11:02 PM, INADA Naoki wrote: > On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote: >> (if a PEP is needed at all) > > I don't think so. My PEP is not for changing Python Language, > just describe implementation detail. > > Python 3.5 has new OrderedDict implemented in C without PEP. > My patch is relatively small than it. And the idea has been well known. How about, for 3.6, target re-implementing OrderedDict using the compact dict approach (and leave dict alone for now). That way we have an extra release cycle to iron out the kinks before switching dict over for 3.7. :) -eric ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Why are class dictionaries not accessible?
On Thu, Jun 23, 2016 at 8:01 AM, Random832 wrote: > On Wed, Jun 22, 2016, at 11:11, Guido van Rossum wrote: > > This is done in order to force all mutations of the class dict to go > > through attribute assignments on the class. The latter takes care of > > updating the class struct, e.g. if you were to add an `__add__` method > > dynamically it would update tp_as_number->nb_add. If you could modify the > > dict object directly it would be more difficult to arrange for this side > > effect. > > Why is this different from the fact that updating a normal object's dict > bypasses descriptors and any special logic in __setattr__? Dunder > methods are already "special" in the sense that you can't use them as > object attributes; I wouldn't be surprised by "assigning a dunder method > via the class's dict breaks things". > It was a long time when I wrote this, but IIRC the breakage could express itself as a segfault or other C-level crash due to some internal state invariant of the type object being violated, not just an exception. The existence of ctypes notwithstanding, we take C-level crashes very seriously. -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP XXX: Compact ordered dict
On Fri, Jun 24, 2016 at 12:03 AM, Eric Snow wrote: > On Mon, Jun 20, 2016 at 11:02 PM, INADA Naoki wrote: >> On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote: >>> (if a PEP is needed at all) >> >> I don't think so. My PEP is not for changing Python Language, >> just describe implementation detail. >> >> Python 3.5 has new OrderedDict implemented in C without PEP. >> My patch is relatively small than it. And the idea has been well known. > > How about, for 3.6, target re-implementing OrderedDict using the > compact dict approach (and leave dict alone for now). That way we > have an extra release cycle to iron out the kinks before switching > dict over for 3.7. :) > > -eric I can't. Since OrderedDict inherits dict. OrderedDict implementation based on dict implementation. Since I'm not expert of Python object system, I don't know how to separate OrderedDict implementation from dict. -- INADA Naoki ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Why are class dictionaries not accessible?
On Thu, Jun 23, 2016 at 8:19 AM, Guido van Rossum wrote: > > It was a long time when I wrote this, but IIRC the breakage could express > itself as a segfault or other C-level crash due to some internal state > invariant of the type object being violated, not just an exception. The > existence of ctypes notwithstanding, we take C-level crashes very seriously. > Big digression: one can still obtain the dict if they really want to, even without using ctypes. I suppose don't actually mutate it unless you want to segfault. >>> import gc >>> class A(object): pass >>> type(A.__dict__) >>> type(gc.get_referents(A.__dict__)[0]) >>> gc.get_referents(A.__dict__)[0]['abc'] = 1 >>> A.abc 1 >>> (One can also get it right from A, but A can have other references, so maybe that's less reliable.) I think I wanted this at the time so I could better measure the sizes of objects. sys.getsizeof(A.__dict__) is very different from sys.getsizeof(gc.get_referents(A.__dict__)[0]), and also different from sys.getsizeof(A). For example: >>> import gc >>> class A(object): pass >>> sys.getsizeof(A); sys.getsizeof(A.__dict__); sys.getsizeof(gc.get_referents(A.__dict__)[0]) 976 48 288 >>> for i in range(1): setattr(A, 'attr_%s' % i, i) >>> sys.getsizeof(A); sys.getsizeof(A.__dict__); sys.getsizeof(gc.get_referents(A.__dict__)[0]) 976 48 393312 (Fortunately, if you want to walk the object graph to measure memory usage per object type, you're probably going to be using gc.get_referents already anyway, so this is just confirmation that you're getting what you want in one corner case.) -- Devin ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Why are class dictionaries not accessible?
"Er, among our chief weapons are fear, surprise, ctypes, gc, and fanatical devotion to the Pope!" On Thu, Jun 23, 2016 at 1:03 PM, Devin Jeanpierre wrote: > On Thu, Jun 23, 2016 at 8:19 AM, Guido van Rossum > wrote: >> >> It was a long time when I wrote this, but IIRC the breakage could express >> itself as a segfault or other C-level crash due to some internal state >> invariant of the type object being violated, not just an exception. The >> existence of ctypes notwithstanding, we take C-level crashes very seriously. >> > > Big digression: one can still obtain the dict if they really want to, even > without using ctypes. I suppose don't actually mutate it unless you want to > segfault. > > >>> import gc > >>> class A(object): pass > >>> type(A.__dict__) > > >>> type(gc.get_referents(A.__dict__)[0]) > > >>> gc.get_referents(A.__dict__)[0]['abc'] = 1 > >>> A.abc > 1 > >>> > > (One can also get it right from A, but A can have other references, so > maybe that's less reliable.) > > I think I wanted this at the time so I could better measure the sizes of > objects. sys.getsizeof(A.__dict__) is very different > from sys.getsizeof(gc.get_referents(A.__dict__)[0]), and also different > from sys.getsizeof(A). For example: > > >>> import gc > >>> class A(object): pass > >>> sys.getsizeof(A); sys.getsizeof(A.__dict__); > sys.getsizeof(gc.get_referents(A.__dict__)[0]) > 976 > 48 > 288 > >>> for i in range(1): setattr(A, 'attr_%s' % i, i) > >>> sys.getsizeof(A); sys.getsizeof(A.__dict__); > sys.getsizeof(gc.get_referents(A.__dict__)[0]) > 976 > 48 > 393312 > > (Fortunately, if you want to walk the object graph to measure memory usage > per object type, you're probably going to be using gc.get_referents already > anyway, so this is just confirmation that you're getting what you want in > one corner case.) > > -- Devin > -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict
On Tue, Jun 21, 2016 at 8:40 PM, INADA Naoki wrote: > There are three options I can think. > > > 1) Revert key-shared dict (PEP412). > > pros: Removing key-shared dict makes dict implementation simple. > > cons: In some applications, PEP 412 is far more compact than compact > ordered dict. (Note: Using __slots__ may help such situation). > > > 2) Don't make "keeping insertion order" is Python Language Spec. > > pros: Best efficiency > > cons: Different behavior between normal dict and instance.__dict__ may > confuse people. > > > 3) More strict rule for key sharing dict. > > My idea is: > * Increasing number of entries (inserting new key) can be possible > only if refcnt of keys == 1. > > * Inserting new item (with existing key) into dict is allowed only when > insertion position == number of items in the dict (PyDictObject.ma_used). > > pros: We can have "dict keeping insertion order". > > cons: Can't use key-sharing dict for many cases. Small and harmless > change may cause > sudden memory usage increase. (__slots__ is more predicable). IIUC, key-sharing dicts are a best-effort optimization where if I have a class like: class Foo: def __init__(self, a, b): self.a = a self.b = b f1 = Foo(1, 2) f2 = Foo(3, 4) then f1.__dict__ and f2.__dict__ can share their key arrays... but if I do f1.c = "c", then f1.__dict__ gets automatically switched to a regular dict. The idea being that in, say, 99% of cases, different objects of the same type all share the same set of keys, and in the other 1%, oh well, we fall back on the regular behavior. It seems to me that all this works fine for ordered dicts too, if we add the restriction that key arrays can be shared if and only if the two dicts have the same set of keys *and* initially assign those keys in the same order. In, say, 98.9% of cases, different objects of the same type all share the same set of keys and initially assign those keys in the same order, and in the other 1.1%, oh well, we can silently fall back on unshared keys, same as before. (And crucially, the OrderedDict semantics are that only adding *new* keys changes the order; assignments to existing keys preserve the existing order. So if a given type always creates the same instance attributes in the same order at startup and never adds or deletes any, then its key values *and* key order will stay the same even if it later mutates some of those existing attributes in-place.) It's possible that there will be some weird types that mess this up, like: class WeirdFoo: def __init__(self, a, b): if a % 2 == 0: self.a = a self.b = b else: self.b = b self.a = a assert list(WeirdFoo(1, 2).__dict__.keys()) != list(WeirdFoo(2, 3).__dict__.keys()) but, who cares? It'd be good due-diligence to collect data on this to confirm that it isn't a big issue, but intuitively, code like WeirdFoo.__init__ is vanishingly rare, and this is just a best-effort optimization anyway. Catching 98.9% of cases is good enough. Is there something I'm missing here? Is this your option #3? -n -- Nathaniel J. Smith -- https://vorpus.org ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict
> IIUC, key-sharing dicts are a best-effort optimization where if I have > a class like: > > class Foo: > def __init__(self, a, b): > self.a = a > self.b = b > > f1 = Foo(1, 2) > f2 = Foo(3, 4) > > then f1.__dict__ and f2.__dict__ can share their key arrays... but if > I do f1.c = "c", then f1.__dict__ gets automatically switched to a > regular dict. The idea being that in, say, 99% of cases, different > objects of the same type all share the same set of keys, and in the > other 1%, oh well, we fall back on the regular behavior. Small correction: Giving up sharing dict can happen when resizing keys. f1 = Foo(1, 2) # f1 has [a, b] keys. Let's say it k1. Foo caches k1. f2 = Foo(3, 4) # new instance uses cached k1 keys. f1.c = "c" # Since k1 can contain three keys, nothing happen. f1.d = "d" # gave up. Foo doesn't use shared key anymore. f3 = Foo(5, 6) # f3 has normal dict. You can see it by `sys.getsizeof(f1.__dict__), sys.getsizeof(f2.__dict__)`. > It seems to me that all this works fine for ordered dicts too, if we > add the restriction that key arrays can be shared if and only if the > two dicts have the same set of keys *and* initially assign those keys > in the same order. In, say, 98.9% of cases, different objects of the > same type all share the same set of keys and initially assign those > keys in the same order, and in the other 1.1%, oh well, we can > silently fall back on unshared keys, same as before. (And crucially, > the OrderedDict semantics are that only adding *new* keys changes the > order; assignments to existing keys preserve the existing order. So if > a given type always creates the same instance attributes in the same > order at startup and never adds or deletes any, then its key values > *and* key order will stay the same even if it later mutates some of > those existing attributes in-place.) > > It's possible that there will be some weird types that mess this up, like: > > class WeirdFoo: > def __init__(self, a, b): > if a % 2 == 0: > self.a = a > self.b = b > else: > self.b = b > self.a = a > > assert list(WeirdFoo(1, 2).__dict__.keys()) != list(WeirdFoo(2, > 3).__dict__.keys()) > > but, who cares? It'd be good due-diligence to collect data on this to > confirm that it isn't a big issue, but intuitively, code like > WeirdFoo.__init__ is vanishingly rare, and this is just a best-effort > optimization anyway. Catching 98.9% of cases is good enough. > While I think it's less than 98.9% (see below examples), I agree with you. 1) not shared even current implementation class A: n = 0 def __init__(self, a, b, c): self.a, self.b, self.c = a, b, c def add(self): self.n += 1 a = A() b = A() a.add(1) 2) not shared if strict ordering rule class A: file = None def __init__(self, a, **, filename=None): if filename is not None: self.file = open(filename, 'w') self.a = a a = A(42, filename="logfile.txt") b = B(43) 3) Web application's model objects class User(Model): id = IntColumn() name = StringColumn() age = IntColumn() # When creating new instance, (name, age) is initialized, and id is filled after insert. user = User(name="methane", age=32) db.add(user) # When instances fetched from DB, ORM populate attributes in (id, name, age) order. # 100 instances doesn't share keys under "strict ordering rule". users = User.query.limit(100).all() > Is there something I'm missing here? Is this your option #3? Yes. It may works well, but "one special instance disables key-sharing for all instances created after" may cause long time increasing memory usage. People seeing monitoring graph will think their application have memory leak. My new idea may have more stable memory usage, without decreasing memory efficiency so much. See https://mail.python.org/pipermail/python-dev/2016-June/145391.html Compact ordered dict is more efficient than key-sharing dict in case of Sphinx. It means, instance __dict__ is not dominance. I'll implement POC of my new idea and compare it with Sphinx. If you know another good *real application*, which is easy to benchmark, please tell me it. -- INADA Naoki ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com