[Python-Dev] Impact of Namedtuple on startup time
Hello,

Cost of creating a namedtuple has been identified as a contributor to Python startup time. Not only Python core and the stdlib, but any third-party library creating namedtuple classes (there are many of them). An issue was created for this: https://bugs.python.org/issue28638

Raymond decided to close the issue because:

1) the proposed resolution makes the "_source" attribute empty (or, at least, something else than it currently is). Raymond claims the "_source" attribute is an essential feature of namedtuples.

2) optimizing startup cost is supposedly not worth the effort.

To this, I will counter-argue:

As for 1), a search for "namedtuple" and "_source" in a code search engine (*) brings *only* false positives of different kinds:
* clones of the CPython repo
* copies of the namedtuple class instantiation source code with slight tweaks (*not* reading the _source attribute of an existing namedtuple)
* modules using namedtuples and also using a "_source" attribute on unrelated objects

(*) https://searchcode.com/?q=namedtuple+_source

As for 2), startup time is actually a very important consideration nowadays, both for small scripts *and* for interactive use with the now very wide-spread use of Jupyter Notebooks. A 1 ms. cost when importing a single module can translate into a large slowdown when your library imports (directly or indirectly) hundreds of modules, many of which may create their own namedtuple classes.

Nick pointed out that one alternative is to make the C-written "struct sequence" class user-visible. My opinion is that, while better than nothing, this would complicate things by exposing two very similar primitives in the stdlib, without there being a clear choice for users. Should I use the well-known namedtuple? Should I use the new-ish "struct sequence", with similar characteristics and better performance, but worse compatibility (now I have to write fallback code for Python versions where the "struct sequence" isn't exposed)? Not to mention that all third-party libraries would have to be migrated to the newly-exposed "struct sequence" + compatibility fallback code...

So my take is:

1) Usage of "_source" in open source code (as per the search above) seems non-existent.

2) If the primary intent of "_source" is to show-case how to write a tuple subclass, well, why not write a recipe or tutorial somewhere? The Python stdlib is generally not a place where we reify tutorials or educational snippets as public APIs.

3) The well-known namedtuple would really benefit from a performance boost, without asking all maintainers of dependent code (that's a *ton*) to migrate to a new idiom + compatibility fallback.

Regards

Antoine.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, 17 Jul 2017 14:43:19 +0200 Antoine Pitrou wrote:
> Hello,
>
> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time.

Imprecise wording: that's the cost of creating a namedtuple *class*, i.e. anytime someone writes `MyClass = namedtuple('MyClass', ...)`.

Regards

Antoine.
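The distinction between creating a namedtuple class and creating an instance of one can be checked with the stdlib `timeit` module. The following is only a rough sketch: the helper names `make_class` and `make_instance` and the loop count are choices made for this illustration, and the numbers it prints will vary by machine and Python version.

```python
import timeit
from collections import namedtuple


def make_class():
    # Creating the *class*: this is the cost discussed in this thread.
    return namedtuple('MyClass', ['a', 'b', 'c'])


MyClass = make_class()


def make_instance():
    # Creating an *instance* of an already existing namedtuple class.
    return MyClass(1, 2, 3)


LOOPS = 200  # arbitrary loop count chosen for this sketch
class_time = timeit.timeit(make_class, number=LOOPS) / LOOPS
instance_time = timeit.timeit(make_instance, number=LOOPS) / LOOPS

print(f"class creation:    {class_time * 1e6:.1f} usec per call")
print(f"instance creation: {instance_time * 1e6:.1f} usec per call")
```

Only the first figure is paid at import time, once per `namedtuple(...)` call in every imported module.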
Re: [Python-Dev] Impact of Namedtuple on startup time
Interesting coincidence: just two days ago I heard that a team at one large company completely abandoned namedtuple because of the creation time problem.

Concerning _source, why is it not possible to make it a property so that all the string formatting happens on request, thus saving some time for users who don't need it? (Of course this will not be an actual source, but it can be made practically equivalent to the no-compile version.)

--
Ivan
Re: [Python-Dev] Impact of Namedtuple on startup time
On 17 July 2017 at 08:43, Antoine Pitrou wrote:
> Hello,
>
> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time. Not only Python core and the stdlib, but any
> third-party library creating namedtuple classes (there are many of
> them). An issue was created for this:
> https://bugs.python.org/issue28638
>
> Raymond decided to close the issue because:
>
> 1) the proposed resolution makes the "_source" attribute empty (or, at
> least, something else than it currently is). Raymond claims the
> "_source" attribute is an essential feature of namedtuples.

I think I understand well enough to say something intelligent…

While actual references to _source are likely rare (certainly I’ve never used it), my understanding is that the way namedtuple works is to construct _source, and then exec it to create the class. Once that is done, there is no significant saving to be had by throwing away the constructed _source value.

When namedtuple was being considered for inclusion, I actually went so far as to write a proof-of-concept version that worked by creating a class, creating attributes on it, etc. I don’t remember how far I got but the exec version is the version included in the stdlib. I come from a non-Pythonic background so use of exec still feels a bit weird to me but I absolutely love namedtuple and use it constantly.

I don't know whether a polished and completed version of my idea could be faster than using exec, but I wouldn't expect a major saving: a whole bunch of code has to run either way.
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, 17 Jul 2017 15:03:26 +0200 Ivan Levkivskyi wrote:
> Interesting coincidence, just two days ago I have heard that a team at one
> large company completely abandoned namedtuple because of the creation time
> problem.
>
> Concerning _source, why it is not possible to make it a property so that
> all the string formatting will happen on request, thus saving some time for
> users who doesn't need it?

It was proposed in https://bugs.python.org/issue19640 but rejected.

Regards

Antoine.
Re: [Python-Dev] Impact of Namedtuple on startup time
On 17/07/2017 at 15:26, Isaac Morland wrote:
> I think I understand well enough to say something intelligent…
>
> While actual references to _source are likely rare (certainly I’ve never
> used it), my understanding is that the way namedtuple works is to
> construct _source, and then exec it to create the class. Once that is
> done, there is no significant saving to be had by throwing away the
> constructed _source value.

The proposed resolution on https://bugs.python.org/issue28638 is to avoid exec() on most parts of the namedtuple class, hence speeding up the class creation.

> I come from
> a non-Pythonic background so use of exec still feels a bit weird to me
> but I absolutely love namedtuple and use it constantly.

I think for most Python programmers, it still feels a bit un-Pythonic. While exec() is part of Python, it's generally only used in fringe cases where nothing else works.

Regards

Antoine.
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 9:43 AM, Antoine Pitrou wrote:
> As for 2), startup time is actually a very important consideration
> nowadays, both for small scripts *and* for interactive use with the
> now very wide-spread use of Jupyter Notebooks. A 1 ms. cost when
> importing a single module can translate into a large slowdown when your
> library imports (directly or indirectly) hundreds of modules, many of
> which may create their own namedtuple classes.

My experience inside Canonical is that golang stole a lot of "codebase share" from Python, and the talks about it (mine and others') mainly hit two walls: one is memory consumption, and the other is startup time.

So yes, startup time is important for user-facing scripts and services.

Regards,

--
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Twitter: @facundobatista
Re: [Python-Dev] Impact of Namedtuple on startup time
> On Jul 17, 2017, at 6:31 AM, Antoine Pitrou wrote:
>
>> I think I understand well enough to say something intelligent…
>>
>> While actual references to _source are likely rare (certainly I’ve never
>> used it), my understanding is that the way namedtuple works is to
>> construct _source, and then exec it to create the class. Once that is
>> done, there is no significant saving to be had by throwing away the
>> constructed _source value.

There are considerable benefits to namedtuple being able to generate and match its own source.

* It makes it really easy for a user to generate the code, drop it into another module, and customize it.

* It makes the named tuple factory function completely self-documenting.

* The verbose/_source option teaches you exactly what named tuple does. That makes the tool relatively easy to learn, understand, and debug.

I really don't want to throw away these benefits to save a couple of milliseconds. As Nick Coghlan recently posted, "Speed isn't everything, and it certainly isn't adequate justification for breaking public APIs that have been around for years."

FWIW, the template/exec implementation has had excellent benefits for maintainability, making it very easy to fix and update. As other parts of Python have changed (limitations on number of arguments, what is allowed as an identifier, etc), it mostly automatically stays in sync with the rest of the language.

ISTM this issue is being pressed by micro-optimizers who are being very aggressive and not responding to actual user needs (it is more an invented issue than a real one). Named tuple has been around for a long time and users have been somewhat happy with it.

If someone truly cares about the exec time for a particular named tuple, the _source option makes it trivially easy to just replace the generator call with the expanded code in that particular circumstance.

Raymond

P.S. I'm fully supportive of Victor's efforts to build-out structseq to make it sufficiently expressive to do more of what collections.namedtuple() does. That is a perfectly reasonable path to optimization. We've wanted that for a long time and no one has had the spare clock cycles to make it come true.
Re: [Python-Dev] Impact of Namedtuple on startup time
On Jul 17, 2017, at 10:59, Raymond Hettinger wrote:
>
> ISTM this issue is being pressed by micro-optimizers who are being very
> aggressive and not responding to actual user needs (it is more an invented
> issue than a real one). Named tuple has been around for a long time and
> users have been somewhat happy with it.

Regardless of whether this particular optimization is a good idea or not, startup time *is* a serious challenge in many environments for CPython in particular and the perception of Python’s applicability to many problems. I think we’re better off trying to identify and address such problems than ignoring or minimizing them.

Cheers,
-Barry
Re: [Python-Dev] Impact of Namedtuple on startup time
2017-07-17 16:56 GMT+02:00 Facundo Batista:
> My experience inside Canonical is that golang stole a lot of "codebase
> share" from Python, and (others and mine) talks hit two walls, mainly:
> one is memory consumption, and the other is startup time.
>
> So yes, startup time is important for user-faced scripts and services.

Removing the _source attribute would make it possible to:

(1) Reduce memory consumption
http://bugs.python.org/issue19640#msg213949

(2) Reduce Python startup time
https://bugs.python.org/issue28638#msg280277

Victor
Re: [Python-Dev] Impact of Namedtuple on startup time
On 2017-07-17 14:43, Antoine Pitrou wrote:
> So my take is:
>
> 1) Usage of "_source" in open source code (as per the search above)
> seems non-existent.
>
> 2) If the primary intent of "_source" is to show-case how to write a
> tuple subclass, well, why not write a recipe or tutorial somewhere?
> The Python stdlib is generally not a place where we reify tutorials or
> educational snippets as public APIs.
>
> 3) The well-known namedtuple would really benefit from a performance
> boost, without asking all maintainers of dependent code (that's a
> *ton*) to migrate to a new idiom + compatibility fallback.

I have an additional take on named tuples:

4) The current approach uses exec() to generate the namedtuple class on the fly. The exec() function isn't necessarily evil and the use of exec() in namedtuple is safe. However, I would appreciate it if the Python interpreter could be started without requiring the exec() function. It would make it easier to harden the interpreter for embedding and system integration use cases. It's not about sandboxing Python. My goal is to make it harder to abuse Python. See Steve's lightning talk "Python as a security vulnerability" at the language summit, https://lwn.net/Articles/723823/ .

Christian
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 3:59 PM, Raymond Hettinger <raymond.hettin...@gmail.com> wrote:
> I really don't want to throw away these benefits to save a couple of
> milliseconds. As Nick Coghlan recently posted, "Speed isn't everything,
> and it certainly isn't adequate justification for breaking public APIs that
> have been around for years."

My only question is "what's a variable called _source doing in the public API?"

regards
Steve

Steve Holden
Re: [Python-Dev] Impact of Namedtuple on startup time
> On Jul 17, 2017, at 8:22 AM, Steve Holden wrote:
>
> My only question is "what's a variable called _source doing in the public
> API?"

The convention for named tuple has been for all the methods and attributes to be prefixed with an underscore so that the names won't conflict with field names in the named tuple itself. For example, we want to allow Path=namedtuple('Path', ['source', 'destination']).

If I had it all to do over again, it might have been better to have had a different convention like source_ with a trailing underscore, but that ship sailed long ago :-)

Raymond
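The naming convention described above can be seen with the `Path` example from the message. This is a small sketch; the path strings and the `Bad` class name are made up for the illustration.

```python
from collections import namedtuple

# The example from the message: 'source' is an ordinary field name here.
Path = namedtuple('Path', ['source', 'destination'])
p = Path(source='/tmp/a', destination='/tmp/b')

# Fields are reached through their plain names...
print(p.source, p.destination)

# ...while the namedtuple API itself lives under underscore-prefixed names,
# so it can never collide with a user-chosen field.
print(Path._fields)
moved = p._replace(destination='/tmp/c')
as_mapping = dict(p._asdict())

# User fields may not start with an underscore, which keeps that
# namespace reserved for the API.
try:
    namedtuple('Bad', ['_source'])
    underscore_field_rejected = False
except ValueError:
    underscore_field_rejected = True
```

Because field names cannot begin with an underscore, a field called `source` and an API attribute called `_source` can coexist on the same class.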
Re: [Python-Dev] Impact of Namedtuple on startup time
Makes sense. Thanks.

S

Steve Holden
Re: [Python-Dev] Impact of Namedtuple on startup time
I am firmly with Antoine here. The cumulative startup time of large Python programs is a serious problem and namedtuple is one of the major contributors -- especially because it is so convenient that it is ubiquitous. The approach of generating source code and exec()ing it, is a cool demonstration of Python's expressive power, but it's always been my sense that whenever we encounter a popular idiom that uses exec() and eval(), we should augment the language (or the builtins) to avoid these calls -- that's for example how we ended up with getattr().

One of the reasons to be wary of exec()/eval() other than the usual security concerns is that in some Python implementations they have a high overhead to initialize the parser and compiler. (Even in CPython it's not that fast.)

Regarding the argument that it's easier to learn what namedtuple does if the generated source is available, while I don't feel this is important, supposedly it is important to Raymond. But surely there are other approaches possible that work just as well in an educational setting while being more efficient in production use. (E.g. the approach taken by itertools, where the docs show equivalent Python code.)

Concluding, I think we should move on from the original implementation and optimize the heck out of namedtuple. The original has served us well. The world is constantly changing. Python should adapt to the (happy) fact that it's being used for systems larger than any of us could imagine 15 years ago.

--Guido

--
--Guido van Rossum (python.org/~guido)
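Guido's remark about getattr() can be illustrated with a small comparison between building an expression for eval() and calling the builtin directly. This is only an illustration: the `Settings` class and its attribute names are hypothetical and do not come from the thread.

```python
class Settings:
    """Hypothetical object used only for this illustration."""

    def __init__(self):
        self.host = "localhost"
        self.port = 8080


settings = Settings()
attribute_name = "port"

# The eval() idiom: assemble source text, then parse, compile and run it.
via_eval = eval("settings." + attribute_name, {"settings": settings})

# The builtin that makes the idiom unnecessary: no parsing or compiling,
# and no risk of executing arbitrary text held in attribute_name.
via_getattr = getattr(settings, attribute_name)

# getattr() also accepts a default for names that do not exist.
missing = getattr(settings, "timeout", None)

print(via_eval, via_getattr, missing)
```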
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 8:00 AM Raymond Hettinger <raymond.hettin...@gmail.com> wrote:
> [...]
> I really don't want to throw away these benefits to save a couple of
> milliseconds. As Nick Coghlan recently posted, "Speed isn't everything,
> and it certainly isn't adequate justification for breaking public APIs that
> have been around for years."
>
> [...]
> ISTM this issue is being pressed by micro-optimizers who are being very
> aggressive and not responding to actual user needs (it is more an invented
> issue than a real one). Named tuple has been around for a long time and
> users have been somewhat happy with it.

Raymond, you keep repeating statements similar to "only a millisecond" and "aggressive micro-optimizers who don't care about user needs" in your comments on issues like this. That simply isn't true. These issues come up in the first place *because of* users who need fast startup.

Please don't be so dismissive. The reason people care about this has been stated many times. It isn't just "a millisecond", it's 100s or 1000s of milliseconds in any application of reasonable size where namedtuples were adopted as a design pattern in various libraries.

Real world use cases for startup time mattering exist: interactive command line tools are the most obvious one people keep citing. I'll toss in another where Python startup time has raised eyebrows at work: unittest startup and completion time. When the bulk of a process's time is spent in startup before hitting unittest.main(), people take notice and consider it a problem. Developer productivity is reduced. The hacks individual developers come up with to try to work around things like this are not pretty.

> If someone truly cares about the exec time for a particular named tuple,
> the _source option makes it trivially easy to just replace the generator
> call with the expanded code in that particular circumstance.

In real world applications you do not control the bulk of the code that has chosen to use namedtuple. It is scattered through 100-1000s of other transitive dependency libraries (not just the standard library), the modification of each of which faces hurdles both technical and non-technical in nature.

To me the desired resolution to this is clear: optimize the default use case of namedtuple and everybody wins. This isn't just about the stdlib's namedtuple uses being fast; those are a small portion of all uses in any application where startup time matters. This is about making Python better for the world. I.e., what Antoine's original write-up suggested in his #3.

I get that namedtuple ._source is a public API. We may need to keep it. If so, that just means revisiting lazily generating it as a property - issue19640.

-gps

PS - Good call on the naming hindsight! A trailing underscore would've been nice. Oh well, too late for that.
Re: [Python-Dev] Impact of Namedtuple on startup time
2017-07-17 18:13 GMT+02:00 Gregory P. Smith:
> I get that namedtuple ._source is a public API. We may need to keep it. If
> so, that just means revisiting lazily generating it as a property -
> issue19640.

I agree. Technically speaking, optimizing namedtuple doesn't have to mean "remove the _source attribute". I won't discuss here whether _source should be kept or not, but even if we rewrite the namedtuple implementation, I agree that we *can* technically keep a _source property which would create the same Python code.

That would allow us to speed up namedtuple, reduce the memory footprint, and have a smooth deprecation policy (*if* we decide to deprecate this attribute).

Victor
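One way a lazily generated `_source` could work is a non-data descriptor that builds the text only when the attribute is read. The sketch below is an assumption about how such a property might look, not the design proposed in issue19640: `LazySource` and `render_source` are hypothetical names, and the rendered text is a short stand-in rather than the real namedtuple template.

```python
from collections import namedtuple


def render_source(typename, fields):
    # Hypothetical renderer: a stand-in for whatever text a real
    # implementation would produce for the equivalent class definition.
    lines = [
        f"class {typename}(tuple):",
        "    __slots__ = ()",
        f"    _fields = {fields!r}",
    ]
    return "\n".join(lines) + "\n"


class LazySource:
    """Non-data descriptor: formats the source text only when accessed."""

    def __init__(self):
        self.calls = 0  # counts how often the text was actually built

    def __get__(self, instance, owner):
        self.calls += 1
        return render_source(owner.__name__, owner._fields)


class Point(namedtuple('Point', 'x y')):
    __slots__ = ()
    _source = LazySource()  # no string formatting happens at class creation


descriptor = Point.__dict__['_source']  # fetch without triggering __get__
calls_before = descriptor.calls
text = Point._source                    # formatting happens here, on request
calls_after = descriptor.calls
print(text)
```

Class creation pays nothing for the source text in this arrangement; only code that actually reads `_source` pays for the formatting.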
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 02:43:19PM +0200, Antoine Pitrou wrote:
>
> Hello,
>
> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time. Not only Python core and the stdlib, but any
> third-party library creating namedtuple classes (there are many of
> them). An issue was created for this:
> https://bugs.python.org/issue28638

Some time ago, I needed to backport a version of namedtuple to Python 2.4, so I started with Raymond's recipe on Activestate and modified it to only exec the code needed for __new__. The rest of the class is an ordinary inner class:

    # a short sketch
    def namedtuple(...):
        class Inner(tuple):
            ...
        exec(source, ns)
        Inner.__new__ = ns['__new__']
        return Inner

Here's my fork of Raymond's recipe:

https://code.activestate.com/recipes/578918-yet-another-namedtuple/

Out of curiosity, I took that recipe, updated it to work in Python 3, and compared it to the std lib version. Here are some representative timings:

[steve@ando ~]$ python3.5 -m timeit -s "from collections import namedtuple" "K = namedtuple('K', 'a b c')"
1000 loops, best of 3: 1.02 msec per loop

[steve@ando ~]$ python3.5 -m timeit -s "from nt3 import namedtuple" "K = namedtuple('K', 'a b c')"
1000 loops, best of 3: 255 usec per loop

I think that proves that this approach is viable and can lead to a big speed up.

I don't think that merely dropping the _source attribute will save much time. It might save a bit of memory, but in my experiments dropping it only saves about 10µs more. I think the real bottleneck is the cost of exec'ing the entire class.

--
Steve
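The sketch in the message above can be fleshed out into something runnable. What follows is an illustrative reconstruction under stated assumptions, not the ActiveState recipe and not the stdlib implementation: the name `namedtuple_sketch`, the validation rules, and the small set of methods on `Inner` are all choices made for this example. It shows the core idea only: the class body is ordinary static code, and exec() is used for `__new__` alone.

```python
from keyword import iskeyword
from operator import itemgetter


def namedtuple_sketch(typename, field_names):
    """Build a tuple subclass, using exec() only for __new__."""
    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()
    fields = tuple(field_names)
    for name in (typename,) + fields:
        if not name.isidentifier() or iskeyword(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if any(name.startswith('_') for name in fields):
        raise ValueError("field names cannot start with an underscore")
    if len(set(fields)) != len(fields):
        raise ValueError("duplicate field name")

    # Everything that does not depend on the field names is ordinary,
    # statically compiled code.
    class Inner(tuple):
        __slots__ = ()
        _fields = fields

        def __repr__(self):
            pairs = ', '.join(f'{n}={v!r}' for n, v in zip(self._fields, self))
            return f'{type(self).__name__}({pairs})'

        def _asdict(self):
            return dict(zip(self._fields, self))

    # Only __new__ needs a signature naming each field, so only this small
    # function is generated as text and passed to exec().
    params = ', '.join(('_cls',) + fields)
    values = ''.join(f'{name}, ' for name in fields)
    source = f'def __new__({params}):\n    return _tuple_new(_cls, ({values}))\n'
    namespace = {'_tuple_new': tuple.__new__}
    exec(source, namespace)
    # Wrapped explicitly, since the automatic staticmethod wrapping of
    # __new__ only happens inside a class body.
    Inner.__new__ = staticmethod(namespace['__new__'])

    # Field accessors are plain properties created in a loop.
    for index, name in enumerate(fields):
        setattr(Inner, name, property(itemgetter(index)))
    Inner.__name__ = Inner.__qualname__ = typename
    return Inner


K = namedtuple_sketch('K', 'a b c')
k = K(1, 2, c=3)
print(k, k.b, k._asdict())
```

Swapping this function into the `timeit` commands from the message would be one way to compare it against `collections.namedtuple` on a given interpreter; no timing claim is made here.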
Re: [Python-Dev] Impact of Namedtuple on startup time
2017-07-17 9:45 GMT-07:00 Steven D'Aprano:
> Some time ago, I needed to backport a version of namedtuple to Python
> 2.4, so I started with Raymond's recipe on Activestate and modified it
> to only exec the code needed for __new__. The rest of the class is an
> ordinary inner class:
>
> [...]
>
> I think that proves that this approach is viable and can lead to a big
> speed up.

I have an open pull request implementing this approach: https://github.com/python/cpython/pull/2736. We can discuss the exact form the code should take there (Ivan already added some good suggestions).
Re: [Python-Dev] Impact of Namedtuple on startup time
> On Jul 17, 2017, at 8:49 AM, Guido van Rossum wrote:
>
> The approach of generating source code and exec()ing it, is a cool
> demonstration of Python's expressive power, but it's always been my sense
> that whenever we encounter a popular idiom that uses exec() and eval(), we
> should augment the language (or the builtins) to avoid these calls -- that's
> for example how we ended up with getattr().

FYI, the proposal (from Jelle) isn't to remove exec. It is to only exec a smaller piece of code and make the rest of it static. It isn't a bad idea, it just complicates the implementation (generating _source lazily) and the subsequent maintenance (which is currently really easy).

> Concluding, I think we should move on from the original implementation and
> optimize the heck out of namedtuple. The original has served us well. The
> world is constantly changing. Python should adapt to the (happy) fact that
> it's being used for systems larger than any of us could imagine 15 years ago.

Okay, then Nick and I are overruled. I'll move Jelle's patch forward. We'll also need to lazily generate _source but I don't think that will be hard.

One minor grumble: I think we need to give careful cost/benefit consideration to optimizations that complicate the implementation. Over the last several years, the source for Python has grown increasingly complicated. Fewer people understand it now. It is much harder for newcomers to on-ramp. The old-timers (myself included) find that their knowledge is out of date. And complexity leads to bugs (the C optimization of random number seeding caused a major bug in the 3.6.0 release; the C optimization of the lru_cache resulted in multiple releases having hard-to-find threading bugs, etc.). It is becoming increasingly difficult to look at code and tell whether it is correct (I still don't fully understand the implications of the recursive constant folding in the peephole optimizer, for example).

In the case of this named tuple proposal, the complexity is manageable, but the overall trend isn't good and I get the feeling the aggressive optimization is causing us to forget key parts of the zen-of-python.

Cheers,

Raymond

P.S. Ironically, a lot of my consulting work comes from people who have created something complex out of something that could have been simple. So, in a strange way, I should be happy about these trends -- just saying ;-)
Re: [Python-Dev] Impact of Namedtuple on startup time
> On Jul 17, 2017, at 8:49 AM, Guido van Rossum wrote:
>
> One of the reasons to be wary of exec()/eval() other than the usual security
> concerns is that in some Python implementations they have a high overhead to
> initialize the parser and compiler. (Even in CPython it's not that fast.)

BTW, if getting rid of the template/exec pair is a goal, Joe Jevnik proposed a patch a couple of years ago that completely reimplemented namedtuple() in C. The patch was somewhat complex and hard to verify for semantic equivalence, but we could resurrect it and clean it up. That way, we could leave the existing namedtuple() code in place and do a subsequent import from the C version.

This path won't be fun (whenever we have both a C version and a Python version, we get years of trying to sync up tiny differences); however, it will give you the fastest startup times, the fastest lookups at runtime, and eliminate use of exec.

> On Jul 17, 2017, at 8:13 AM, Barry Warsaw wrote:
> Regardless of whether this particular optimization is a good idea or not,
> start up time *is* a serious challenge in many environments for CPython in
> particular and the perception of Python’s applicability to many problems. I
> think we’re better off trying to identify and address such problems than
> ignoring or minimizing them.

I agree with that sentiment but think we ought to look at places where the payoffs would actually matter, such as minimizing the number of disk accesses (Python performs a lot of I/O on startup).

Whenever I've addressed start-up time for my clients, named tuples were never the issue. Also, it would have been trivially easy to replace the factory function call with the generated code, but that never proved necessary or beneficial.

IMO, we're about to turn the named tuple code into a mess but will find that most users, most of the time, will get nearly zero benefit.

Raymond
Re: [Python-Dev] Impact of Namedtuple on startup time
I completely agree. I love namedtuples but I've never been too happy about the additional overhead vs. plain tuples (both for creation and attribute access times), to the point that I explicitly avoid using them in certain circumstances (e.g. a busy loop) and only use them for public end-user APIs returning multiple values.

To be entirely honest, I'm not even sure why they need to be forcefully declared upfront in the first place, instead of just having a first-class function (builtin?) written in C:

>>> ntuple(x=1, y=0)
(x=1, y=0)

...or even a literal as in:

>>> (x=1, y=0)
(x=1, y=0)

Most of the time this is what I really want: quickly returning an anonymous tuple with named attributes and nothing else, similarly to os.times() & others. I believe that if something like this existed we would witness a big transition from tuple() to ntuple() for all those functions returning more than 1 value. We witnessed a similar transition in many parts of the stdlib when collections.namedtuple was first introduced, but not everywhere, probably because declaring a namedtuple is more work, it's more expensive, and it still feels like you're dealing with some kind of too high-level second-class citizen with too much overhead and too much sugar in terms of API (e.g. "verbose", "rename", "module" and "_source").

If something like this were to happen I expect collections.namedtuple to be used only by those who want to subclass it in order to attach methods, whereas the rest would stick with ntuple() pretty much everywhere (both in "private" and "public" functions).

On Mon, Jul 17, 2017 at 5:49 PM, Guido van Rossum wrote:

> I am firmly with Antoine here. The cumulative startup time of large Python
> programs is a serious problem and namedtuple is one of the major
> contributors -- especially because it is so convenient that it is
> ubiquitous. The approach of generating source code and exec()ing it, is a
> cool demonstration of Python's expressive power, but it's always been my
> sense that whenever we encounter a popular idiom that uses exec() and
> eval(), we should augment the language (or the builtins) to avoid these
> calls -- that's for example how we ended up with getattr().
>
> One of the reasons to be wary of exec()/eval() other than the usual
> security concerns is that in some Python implementations they have a high
> overhead to initialize the parser and compiler. (Even in CPython it's not
> that fast.)
>
> Regarding the argument that it's easier to learn what namedtuple does if
> the generated source is available, while I don't feel this is important,
> supposedly it is important to Raymond. But surely there are other
> approaches possible that work just as well in an educational setting while
> being more efficient in production use. (E.g. the approach taken by
> itertools, where the docs show equivalent Python code.)
>
> Concluding, I think we should move on from the original implementation and
> optimize the heck out of namedtuple. The original has served us well. The
> world is constantly changing. Python should adapt to the (happy) fact that
> it's being used for systems larger than any of us could imagine 15 years
> ago.
>
> --Guido
>
> On Mon, Jul 17, 2017 at 7:59 AM, Raymond Hettinger <raymond.hettin...@gmail.com> wrote:
>
>> > On Jul 17, 2017, at 6:31 AM, Antoine Pitrou wrote:
>> >
>> >> I think I understand well enough to say something intelligent…
>> >>
>> >> While actual references to _source are likely rare (certainly I’ve never
>> >> used it), my understanding is that the way namedtuple works is to
>> >> construct _source, and then exec it to create the class. Once that is
>> >> done, there is no significant saving to be had by throwing away the
>> >> constructed _source value.
>>
>> There are considerable benefits to namedtuple being able to generate and
>> match its own source.
>>
>> * It makes it really easy for a user to generate the code, drop it
>> into another module, and customize it.
>>
>> * It makes the named tuple factory function completely self-documenting.
>>
>> * The verbose/_source option teaches you exactly what named tuple does.
>> That makes the tool relatively easy to learn, understand, and debug.
>>
>> I really don't want to throw away these benefits to save a couple of
>> milliseconds. As Nick Coghlan recently posted, "Speed isn't everything,
>> and it certainly isn't adequate justification for breaking public APIs that
>> have been around for years."
>>
>> FWIW, the template/exec implementation has had excellent benefits for
>> maintainability making it very easy to fix and update. As other parts of
>> Python have changed (limitations on number of arguments, what is allowed as
>> an identifier, etc), it mostly automatically stays in sync with the rest of
>> the language.
>>
>> ISTM this issue is being pressed by micro-optimizers who are being very
>> aggressive and not responding to actual user needs (it is more an invented
>> issue than a rea
Re: [Python-Dev] Impact of Namedtuple on startup time
My apologies, I misunderstood what had been proposed (and rejected). So it sounds like the _source is a pre-requisite for the current exec-based implementation, but the proposal is to replace it with a non-exec-based implementation, meaning _source would no longer be needed for the module to work and might be eliminated.

But _source could continue to be generated lazily (and cached if thought helpful) using an @property, so even the (apparently rare) uses of _source would continue to work.

This would in some sense be a DRY violation, but of a very pragmatic Pythonic sort, where we have two implementations, one for documentation and one for efficiency. How different would this be from all those modules that have both Python and C implementations?

On 17 July 2017 at 09:31, Antoine Pitrou wrote:

> Le 17/07/2017 à 15:26, Isaac Morland a écrit :
> >
> > I think I understand well enough to say something intelligent…
> >
> > While actual references to _source are likely rare (certainly I’ve never
> > used it), my understanding is that the way namedtuple works is to
> > construct _source, and then exec it to create the class. Once that is
> > done, there is no significant saving to be had by throwing away the
> > constructed _source value.
>
> The proposed resolution on https://bugs.python.org/issue28638 is to
> avoid exec() on most parts of the namedtuple class, hence speeding up
> the class creation.
>
> > I come from
> > a non-Pythonic background so use of exec still feels a bit weird to me
> > but I absolutely love namedtuple and use it constantly.
>
> I think for most Python programmers, it still feels a bit un-Pythonic.
> While exec() is part of Python, it's generally only used in fringe cases
> where nothing else works.
>
> Regards
>
> Antoine.
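One way the lazy `_source` idea could look is sketched below. It is not taken from any patch: `LazySource` and `attach_lazy_source` are hypothetical names, and the generated text is a simplified stand-in for the real template. Because `_source` is read from the class (`Point._source`), a plain `@property` in the class body would not fire on class access, so the sketch uses a small non-data descriptor instead.

```python
from collections import namedtuple


class LazySource:
    """Hypothetical descriptor: builds the source text on first access only."""

    def __init__(self, build):
        self._build = build    # zero-argument callable returning the text
        self._cached = None

    def __get__(self, instance, owner=None):
        # Works for both Point._source and Point(1, 2)._source.
        if self._cached is None:
            self._cached = self._build()
        return self._cached


def attach_lazy_source(cls, typename, fields):
    """Attach a lazily generated, cached _source attribute to cls."""
    def build():
        args = ', '.join(fields)
        return (f'class {typename}(tuple):\n'
                f'    __slots__ = ()\n'
                f'    _fields = {tuple(fields)!r}\n'
                f'    def __new__(_cls, {args}):\n'
                f'        return tuple.__new__(_cls, ({args},))\n')
    cls._source = LazySource(build)
    return cls


Point = attach_lazy_source(namedtuple('Point', 'x y'), 'Point', ('x', 'y'))
print(Point._source)   # the text is generated here, not at class creation
```

Class creation pays nothing for the source text; the cost is deferred to the (apparently rare) first read of `_source`.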
Re: [Python-Dev] Impact of Namedtuple on startup time
On 2017-07-17 21:31, Giampaolo Rodola' wrote:
> I completely agree. I love namedtuples but I've never been too happy
> about the additional overhead vs. plain tuples (both for creation and
> attribute access times), to the point that I explicitly avoid using them
> in certain circumstances (e.g. a busy loop) and only for public end-user
> APIs returning multiple values.
>
> To be entirely honest, I'm not even sure why they need to be forcefully
> declared upfront in the first place, instead of just having a first-class
> function (builtin?) written in C:
>
> >>> ntuple(x=1, y=0)
> (x=1, y=0)
>
> ...or even a literal as in:
>
> >>> (x=1, y=0)
> (x=1, y=0)
>
[snip]

I know it's a bit early to bikeshed, but shouldn't that be:

>>> (x: 1, y: 0)
(x: 1, y: 0)

instead if it's a display/literal?
Re: [Python-Dev] Impact of Namedtuple on startup time
On 2017-07-17 21:46, MRAB wrote:
> On 2017-07-17 21:31, Giampaolo Rodola' wrote:
>> ...or even a literal as in:
>>
>> >>> (x=1, y=0)
>> (x=1, y=0)
>>
> [snip]
>
> I know it's a bit early to bikeshed, but shouldn't that be:
>
> >>> (x: 1, y: 0)
> (x: 1, y: 0)
>
> instead if it's a display/literal?

Actually, come to think of it, a dict's keys would be quoted, so there would be a slight inconsistency there...
Re: [Python-Dev] Impact of Namedtuple on startup time
On 07/17/2017 10:31 PM, Giampaolo Rodola' wrote:
> I completely agree. I love namedtuples but I've never been too happy
> about the additional overhead vs. plain tuples (both for creation and
> attribute access times), to the point that I explicitly avoid using them
> in certain circumstances (e.g. a busy loop) and only for public end-user
> APIs returning multiple values.
>
> To be entirely honest, I'm not even sure why they need to be forcefully
> declared upfront in the first place, instead of just having a first-class
> function (builtin?) written in C:
>
> >>> ntuple(x=1, y=0)
> (x=1, y=0)
>
> ...or even a literal as in:
>
> >>> (x=1, y=0)
> (x=1, y=0)
>
> Most of the time this is what I really want: quickly returning an
> anonymous tuple with named attributes and nothing else, similarly to
> os.times() & others. [...]

It seems that you want `types.SimpleNamespace(x=1, y=0)`.
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 11:07 PM, Petr Viktorin wrote:

> On 07/17/2017 10:31 PM, Giampaolo Rodola' wrote:
>
>> To be entirely honest, I'm not even sure why they need to be forcefully
>> declared upfront in the first place, instead of just having a first-class
>> function (builtin?) written in C:
>>
>> >>> ntuple(x=1, y=0)
>> (x=1, y=0)
>>
>> ...or even a literal as in:
>>
>> >>> (x=1, y=0)
>> (x=1, y=0)
>>
>> Most of the time this is what I really want: quickly returning an
>> anonymous tuple with named attributes and nothing else, similarly to
>> os.times() & others. [...]
>
> It seems that you want `types.SimpleNamespace(x=1, y=0)`.

That doesn't support indexing (obj[0]).

--
Giampaolo - http://grodola.blogspot.com
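The difference raised in this exchange is easy to check. The snippet below is only a small illustration; `supports_indexing` is a hypothetical helper, not part of any library.

```python
from collections import namedtuple
from types import SimpleNamespace

ns = SimpleNamespace(x=1, y=0)
Point = namedtuple('Point', 'x y')
pt = Point(x=1, y=0)


def supports_indexing(obj):
    """Return True if obj[0] works, False if it raises TypeError."""
    try:
        obj[0]
    except TypeError:
        return False
    return True


# Both give attribute access by name ...
print(ns.x, pt.x)
# ... but only the namedtuple is still a sequence.
print(supports_indexing(ns), supports_indexing(pt))
```

So SimpleNamespace covers the "named attributes" half of the request, while unpacking and `obj[0]` still need a tuple subclass.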
Re: [Python-Dev] Impact of Namedtuple on startup time
namedtuple is great and clever, but it’s also a bit clunky. It has a weird signature and requires a made up type name. It’s also rather unPythonic if you want to support default arguments when creating namedtuple instances. Maybe as you say, a lot of the typical use cases for namedtuples could be addressed by a better builtin, but I fear we’ll end up down the bikeshedding hole for that.

-Barry

> On Jul 17, 2017, at 16:31, Giampaolo Rodola' wrote:
>
> I completely agree. I love namedtuples but I've never been too happy about
> the additional overhead vs. plain tuples (both for creation and attribute
> access times), to the point that I explicitly avoid using them in certain
> circumstances (e.g. a busy loop) and only for public end-user APIs returning
> multiple values.
>
> To be entirely honest, I'm not even sure why they need to be forcefully
> declared upfront in the first place, instead of just having a first-class
> function (builtin?) written in C:
>
> >>> ntuple(x=1, y=0)
> (x=1, y=0)
>
> ...or even a literal as in:
>
> >>> (x=1, y=0)
> (x=1, y=0)
>
> Most of the time this is what I really want: quickly returning an anonymous
> tuple with named attributes and nothing else, similarly to os.times() &
> others. I believe that if something like this existed we would witness a
> big transition from tuple() to ntuple() for all those functions returning
> more than 1 value. We witnessed a similar transition in many parts of the
> stdlib when collections.namedtuple was first introduced, but not everywhere,
> probably because declaring a namedtuple is more work, it's more expensive,
> and it still feels like you're dealing with some kind of too high-level
> second-class citizen with too much overhead and too much sugar in terms of
> API (e.g. "verbose", "rename", "module" and "_source").
Re: [Python-Dev] Impact of Namedtuple on startup time
[Giampaolo Rodola']
> To be entirely honest, I'm not even sure why they need to be forcefully
> declared upfront in the first place, instead of just having a first-class
> function (builtin?) written in C:
>
> >>> ntuple(x=1, y=0)
> (x=1, y=0)
>
> ...or even a literal as in:
>
> >>> (x=1, y=0)
> (x=1, y=0)

How do you propose that the resulting object T know that T.x is 1, T.y is 0, and T.z doesn't make sense? Declaring a namedtuple up front allows the _class_ to know that all of its instances map attribute "x" to index 0 and attribute "y" to index 1. The instances know nothing about that on their own, and consume no more memory than a plain tuple. If your `ntuple()` returns an object implementing its own mapping, it loses a primary advantage (0 memory overhead) of namedtuples.
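The point about where the name-to-index mapping lives can be observed directly. This is just an illustrative check on a recent CPython; `sys.getsizeof` results are implementation details and other interpreters may report different numbers.

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(1, 0)

# The name -> index mapping is stored once, on the class.
print(Point._fields)            # field names in index order
print('x' in vars(Point))       # the descriptor for .x is in the class dict

# Instances carry no per-instance __dict__ ...
print(hasattr(p, '__dict__'))

# ... so on CPython an instance reports the same size as a plain tuple.
print(sys.getsizeof(p), sys.getsizeof((1, 0)))

# A name the class does not declare is rejected.
try:
    p.z
except AttributeError as exc:
    print('AttributeError:', exc)
```

An `ntuple(x=1, y=0)` that needed no declaration would have to find or create an equivalent class (or carry the names itself) on every call, which is the cost being weighed here.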
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, 17 Jul 2017 at 13:28 Raymond Hettinger wrote:

> > On Jul 17, 2017, at 8:49 AM, Guido van Rossum wrote:
> >
> > One of the reasons to be wary of exec()/eval() other than the usual
> > security concerns is that in some Python implementations they have a high
> > overhead to initialize the parser and compiler. (Even in CPython it's not
> > that fast.)
>
> BTW, if getting rid of the template/exec pair is a goal, Joe Jevnik
> proposed a patch a couple of years ago that completely reimplemented
> namedtuple() in C. The patch was somewhat complex and hard to verify for
> semantic equivalence, but we could resurrect it and clean it up. That way,
> we could leave the existing namedtuple() code in place and do a subsequent
> import from the C version.
>
> This path won't be fun (whenever we have both a C version and a Python
> version, we get years of trying to sync up tiny differences); however, it
> will give you the fastest startup times, the fastest lookups at runtime,
> and eliminate use of exec.

I vaguely remember some years ago someone proposing a patch that used metaclasses to avoid using exec() (I think it was to benefit PyPy or one of the JIT-backed interpreters). Would that work to remove the need for exec() while keeping the code in pure Python?

As for removing exec() as a goal, I'll back up Christian's point and the one Steve made at the language summit that removing the use of exec() from the critical path in Python is a laudable goal from a security perspective.

-Brett
Re: [Python-Dev] Impact of Namedtuple on startup time
Just for the sake of completeness, since people are talking about a namedtuple overhaul, I have a couple of implementations here: https://github.com/jsbueno/extradict/blob/master/extradict/extratuple.py

If any idea there can help inspire someone, I will be happy.

js
-><-

On 17 July 2017 at 18:26, Barry Warsaw wrote:

> namedtuple is great and clever, but it’s also a bit clunky. It has a
> weird signature and requires a made up type name. It’s also rather
> unPythonic if you want to support default arguments when creating
> namedtuple instances. Maybe as you say, a lot of the typical use cases for
> namedtuples could be addressed by a better builtin, but I fear we’ll end up
> down the bikeshedding hole for that.
>
> -Barry
Re: [Python-Dev] Impact of Namedtuple on startup time
Barry Warsaw wrote:
> namedtuple is great and clever, but it’s also a bit clunky. It has a weird
> signature and requires a made up type name.

Maybe a metaclass could be used to make something like this possible:

    class Foo(NamedTuple, fields = 'x,y,z'):
        ...

Then the name is explicit and you get to add methods etc. if you want.

--
Greg
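A minimal sketch of how such a spelling could be made to work follows. It is only one possible reading of the suggestion: `NamedTupleMeta` and this `NamedTuple` base are hypothetical names defined here (not typing.NamedTuple and not aenum's class), and the sketch simply delegates to the existing collections.namedtuple, so it does nothing for startup time.

```python
import collections


class NamedTupleMeta(type):
    """Hypothetical metaclass enabling `class Foo(NamedTuple, fields=...)`."""

    def __new__(mcls, name, bases, namespace, fields=None):
        if fields is None:
            # Only used to create the NamedTuple marker base below.
            return super().__new__(mcls, name, bases, namespace)
        base = collections.namedtuple(name, fields)
        body = dict(namespace)
        body.setdefault('__slots__', ())   # keep instances free of a __dict__
        # Return an ordinary tuple subclass; the metaclass is not kept.
        return type(name, (base,), body)


class NamedTuple(metaclass=NamedTupleMeta):
    pass


class Foo(NamedTuple, fields='x,y,z'):
    def total(self):
        return self.x + self.y + self.z


foo = Foo(1, 2, z=3)
print(foo, foo.total())  # Foo(x=1, y=2, z=3) 6
```

The type name comes from the class statement instead of being repeated as a string, and methods are added in the class body as usual.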
Re: [Python-Dev] Impact of Namedtuple on startup time
On 07/17/2017 02:26 PM, Barry Warsaw wrote:
> namedtuple is great and clever, but it’s also a bit clunky. It has a weird
> signature and requires a made up type name. It’s also rather unPythonic if
> you want to support default arguments when creating namedtuple instances.
> Maybe as you say, a lot of the typical use cases for namedtuples could be
> addressed by a better builtin, but I fear we’ll end up down the bikeshedding
> hole for that.

My aenum library [1] has a metaclass-based NamedTuple that allows for default arguments as well as other goodies (which would probably not make it to the stdlib since they are mostly fluff).

--
~Ethan~

[1] https://pypi.python.org/pypi/aenum
Re: [Python-Dev] Impact of Namedtuple on startup time
On 07/17/2017 02:31 PM, Brett Cannon wrote:
> I vaguely remember some years ago someone proposing a patch that used
> metaclasses to avoid using exec() (I think it was to benefit PyPy or one of
> the JIT-backed interpreters). Would that work to remove the need for exec()
> while keeping the code in pure Python?

The aenum library [1] uses the same techniques as Enum for a metaclass-based namedtuple. I don't expect it to be faster, but somebody could do the benchmarks and then we'd know for sure. ;)

--
~Ethan~

[1] https://pypi.python.org/pypi/aenum
Re: [Python-Dev] Impact of Namedtuple on startup time
On 07/17/2017 03:27 PM, Greg Ewing wrote:
> Barry Warsaw wrote:
>> namedtuple is great and clever, but it’s also a bit clunky. It has a weird
>> signature and requires a made up type name.
>
> Maybe a metaclass could be used to make something like this possible:
>
>     class Foo(NamedTuple, fields = 'x,y,z'):
>         ...
>
> Then the name is explicit and you get to add methods etc. if you want.

From the NamedTuple tests from my aenum library [1]:

    LifeForm = NamedTuple('LifeForm', 'branch genus species', module=__name__)

    class DeathForm(NamedTuple):
        color = 0
        rigidity = 1
        odor = 2

    class WhatsIt(NamedTuple):
        def what(self):
            return self[0]

    class ThatsIt(WhatsIt):
        blah = 0
        bleh = 1

    class Character(NamedTuple):
        # second argument is doc string
        name = 0
        gender = 1, None, 'male'
        klass = 2, None, 'fighter'

    class Point(NamedTuple):
        x = 0, 'horizondal coordinate', 0
        y = 1, 'vertical coordinate', 0

    class Point(NamedTuple):
        x = 0, 'horizontal coordinate', 1
        y = 1, 'vertical coordinate', -1

    class Color(NamedTuple):
        r = 0, 'red component', 11
        g = 1, 'green component', 29
        b = 2, 'blue component', 37

    Pixel1 = NamedTuple('Pixel', Point+Color, module=__name__)

    class Pixel2(Point, Color):
        "a colored dot"

    class Pixel3(Point):
        r = 2, 'red component', 11
        g = 3, 'green component', 29
        b = 4, 'blue component', 37

--
~Ethan~

[1] https://pypi.python.org/pypi/aenum
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 11:24 PM, Tim Peters wrote:

> [Giampaolo Rodola']
> >
> > To be entirely honest, I'm not even sure why they need to be forcefully
> > declared upfront in the first place, instead of just having a first-class
> > function (builtin?) written in C:
> >
> > >>> ntuple(x=1, y=0)
> > (x=1, y=0)
> >
> > ...or even a literal as in:
> >
> > >>> (x=1, y=0)
> > (x=1, y=0)
>
> How do you propose that the resulting object T know that T.x is 1, T.y
> is 0, and T.z doesn't make sense?

I'm not sure I understand your concern. That's pretty much what PyStructSequence already does.

> Declaring a namedtuple up front
> allows the _class_ to know that all of its instances map attribute "x"
> to index 0 and attribute "y" to index 1. The instances know nothing
> about that on their own

Hence why I was talking about a "(lightweight) anonymous tuple with named attributes". The primary use case for namedtuples is accessing values by name (obj.x). Personally I've always considered the upfront module-level declaration only an annoyance which unnecessarily pollutes the API and adds extra overhead. I typically end up putting all namedtuples in a private module:

https://github.com/giampaolo/psutil/blob/8b8da39e0c62432504fb5f67c418715aad35b291/psutil/_common.py#L156-L225

...then import them from elsewhere and make sure they are not exposed publicly because the intermediate object returned by collections.namedtuple() is basically useless for the end-user. Also picking up a sensible name for the namedtuple is an annoyance and kinda weird. Consider this:

    from collections import namedtuple

    Coordinates = namedtuple('coordinates', ['x', 'y'])

    def get_coordinates():
        return Coordinates(10, 20)

...vs. this:

    def get_coordinates():
        return ntuple(x=10, y=20)

...or this:

    def get_coordinates():
        return (x=10, y=20)

> If your `ntuple()` returns an object implementing its own
> mapping, it loses a primary advantage (0 memory overhead) of
> namedtuples.

The extra memory overhead is a price I would be happy to pay considering that collections.namedtuple is considerably slower than a plain tuple. Other than the additional overhead on startup / import time, instantiation is 4.5x slower than a plain tuple:

$ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y'))" "nt(1, 2)"
100 loops, best of 5: 313 nsec per loop

$ python3.7 -m timeit "tuple((1, 2))"
500 loops, best of 5: 68.4 nsec per loop

...and name access is 2x slower than index access:

$ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y')); x = nt(1, 2)" "x.x"
500 loops, best of 5: 41.9 nsec per loop

$ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y')); x = nt(1, 2)" "x[0]"
1000 loops, best of 5: 20.2 nsec per loop

$ python3.7 -m timeit -s "x = (1, 2)" "x[0]"
1000 loops, best of 5: 20.5 nsec per loop

--
Giampaolo - http://grodola.blogspot.com
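For anyone wanting to repeat these measurements in one go, the command lines above can be approximated with the timeit module. This is a rough sketch, not the method used in the message: `per_call_ns` is a hypothetical helper, the iteration counts are arbitrary, and the absolute numbers depend heavily on the Python version and machine.

```python
import timeit
from collections import namedtuple

NT = namedtuple('NT', ('x', 'y'))
nt = NT(1, 2)
plain = (1, 2)


def per_call_ns(stmt, number=50_000, **names):
    """Best-of-5 cost of one execution of stmt, in nanoseconds."""
    best = min(timeit.repeat(stmt, globals=names, number=number, repeat=5))
    return best / number * 1e9


results = {
    'namedtuple creation  NT(1, 2)':      per_call_ns('NT(1, 2)', NT=NT),
    'plain tuple creation tuple((1, 2))': per_call_ns('tuple((1, 2))'),
    'attribute access     nt.x':          per_call_ns('nt.x', nt=nt),
    'namedtuple index     nt[0]':         per_call_ns('nt[0]', nt=nt),
    'plain tuple index    plain[0]':      per_call_ns('plain[0]', plain=plain),
}

for label, ns in results.items():
    print(f'{label:<38} {ns:8.1f} ns')
```

Comparing ratios between rows is more meaningful than the raw figures, since loop overhead is included in every measurement.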
Re: [Python-Dev] Impact of Namedtuple on startup time
Raymond agreed to reopen the issue. Everyone who's eager to redesign namedtuple, please go to python-ideas.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 6:27 PM, Greg Ewing wrote:

> Maybe a metaclass could be used to make something
> like this possible:
>
>     class Foo(NamedTuple, fields = 'x,y,z'):
>         ...

If you think of it, collections.namedtuple *is* a metaclass. A simple wrapper will make it usable as such:

    import collections

    def namedtuple(name, bases, attrs, fields=()):
        # Override __init_subclass__ for Python 3.6
        return collections.namedtuple(name, fields)

    class Foo(metaclass=namedtuple, fields='x,y'):
        pass

    print(Foo(1, 2))
    # ---> Foo(x=1, y=2)
Re: [Python-Dev] Impact of Namedtuple on startup time
On Jul 17, 2017, at 18:27, Greg Ewing wrote:
>
> Barry Warsaw wrote:
>> namedtuple is great and clever, but it’s also a bit clunky. It has a weird
>> signature and requires a made up type name.
>
> Maybe a metaclass could be used to make something
> like this possible:
>
>
>    class Foo(NamedTuple, fields = 'x,y,z'):
>        ...
>
> Then the name is explicit and you get to add methods
> etc. if you want.

Yes, I like how that reads.

-Barry
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 09:31:20PM +, Brett Cannon wrote:

> As for removing exec() as a goal, I'll back up Christian's point and the
> one Steve made at the language summit that removing the use of exec() from
> the critical path in Python is a laudable goal from a security perspective.

I'm sorry, I don't understand this point. What do you mean by "critical path"?

Is the intention to remove exec from builtins? From the entire language? If not, how does its use in namedtuple introduce a security problem?

--
Steve
Re: [Python-Dev] Impact of Namedtuple on startup time
On Mon, Jul 17, 2017 at 8:16 PM, Barry Warsaw wrote:

> ..
> > class Foo(NamedTuple, fields = 'x,y,z'):
> >     ...
> >
> > Then the name is explicit and you get to add methods
> > etc. if you want.
>
> Yes, I like how that reads.
>
>
I would prefer

class Foo(metaclass=namedtuple, fields = 'x,y,z'):
    ...

which, while slightly more verbose, does not lie about what namedtuple is - a factory of classes. This, however, is completely orthogonal to the issue of performance.

As I mentioned in my previous post, the namedtuple metaclass above can be a simple function:

def namedtuple(name, bases, attrs, fields=()):
    # Override __init_subclass__ for Python 3.6
    return collections.namedtuple(name, fields)
Re: [Python-Dev] Impact of Namedtuple on startup time
On Tue, Jul 18, 2017 at 01:17:24AM +0200, Giampaolo Rodola' wrote:

> The extra memory overhead is a price I would be happy to pay considering
> that collections.namedtuple is considerably slower than a plain tuple.
> Other than the additional overhead on startup / import time, instantiation
> is 4.5x slower than a plain tuple:
>
> $ python3.7 -m timeit -s "from collections import namedtuple; nt =
> namedtuple('xxx', ('x', 'y'))" "nt(1, 2)"
> 100 loops, best of 5: 313 nsec per loop
>
> $ python3.7 -m timeit "tuple((1, 2))"
> 500 loops, best of 5: 68.4 nsec per loop

I don't think that is a fair comparison. As far as I can tell, that gets compiled to a name lookup for "tuple" which then returns its argument unchanged, the tuple itself being constant-folded at compile time.

py> dis.dis("tuple((1, 2))")
  1           0 LOAD_NAME                0 (tuple)
              3 LOAD_CONST               2 ((1, 2))
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 RETURN_VALUE

--
Steve
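The constant-folding point can be checked directly, and a comparison that avoids it is easy to set up. The following is only a sketch, assuming CPython (other implementations may compile the expression differently); the helper `folded_constants` and the variable names are illustrative. It builds both the plain tuple and the namedtuple from variables, so neither side benefits from a precomputed constant.

```python
import timeit
from collections import namedtuple


def folded_constants(source):
    """Return the constants the compiler stored for an expression."""
    return compile(source, "<example>", "eval").co_consts


# With literals, CPython stores the whole tuple as a single constant,
# so tuple((1, 2)) only measures a call that returns its argument.
literal_consts = folded_constants("tuple((1, 2))")
# With variables, the tuple has to be built each time the code runs.
variable_consts = folded_constants("tuple((a, b))")
print((1, 2) in literal_consts, (1, 2) in variable_consts)

# Time construction from variables on both sides.
nt = namedtuple("xxx", ("x", "y"))
env = {"nt": nt, "a": 1, "b": 2}
loops = 100_000
t_tuple = timeit.timeit("(a, b)", globals=env, number=loops)
t_named = timeit.timeit("nt(a, b)", globals=env, number=loops)
print(f"plain tuple: {t_tuple / loops * 1e9:.1f} nsec per loop")
print(f"namedtuple:  {t_named / loops * 1e9:.1f} nsec per loop")
```

The absolute numbers depend on the machine and Python version, so none are quoted here; the point is only that both statements now do comparable work.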
Re: [Python-Dev] Impact of Namedtuple on startup time
On Jul 17, 2017 5:28 PM, "Steven D'Aprano" wrote:

On Mon, Jul 17, 2017 at 09:31:20PM +, Brett Cannon wrote:

> As for removing exec() as a goal, I'll back up Christian's point and the
> one Steve made at the language summit that removing the use of exec() from
> the critical path in Python is a laudable goal from a security perspective.

I'm sorry, I don't understand this point. What do you mean by "critical path"?

Is the intention to remove exec from builtins? From the entire language? If not, how does its use in namedtuple introduce a security problem?


I think the intention is to allow users with a certain kind of security requirement to opt in to a restricted version of the language that doesn't support exec. This is difficult if the stdlib is calling exec all over the place. But nobody is suggesting to change the language in regular usage, just provide another option.

-n
Re: [Python-Dev] Impact of Namedtuple on startup time
On 07/17/2017 04:45 PM, Guido van Rossum wrote:

> Raymond agreed to reopen the issue. Everyone who's eager to redesign
> namedtuple, please go to python-ideas.

Python Ideas thread started.

--
~Ethan~
Re: [Python-Dev] Impact of Namedtuple on startup time
17.07.17 15:43, Antoine Pitrou wrote:

> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time. Not only Python core and the stdlib, but any
> third-party library creating namedtuple classes (there are many of
> them). An issue was created for this:
> https://bugs.python.org/issue28638
>
> Raymond decided to close the issue because:
>
> 1) the proposed resolution makes the "_source" attribute empty (or, at
> least, something else than it currently is). Raymond claims the
> "_source" attribute is an essential feature of namedtuples.
>
> 2) optimizing startup cost is supposedly not worth the effort.

Implementations of namedtuple that don't use compilation were written by different developers (including me) multiple times before issue28638. I provided my patch in issue28638 as an example, but I understand Raymond's arguments, and they look weighty to me. I don't know how much the _source attribute is used, but it is a part of the public API.

The drawback of these implementations is slower __new__ and __repr__ methods. This can be avoided if compilation is used for creating __new__, but that makes the creation of a namedtuple class slower (though still faster than compiling the full namedtuple class).

The drawback of generating _source without using it to create the namedtuple class is that it complicates the code, and the two implementations could quickly become desynchronized in the future.

I think that the right solution to this issue is generalizing the import machinery and allowing it to cache not just files, but arbitrary chunks of code. We already use precompiled bytecode files for exactly the same goal -- to speed up the startup by avoiding compilation. This solution could be used for caching other generated code, not just namedtuples.
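To make the trade-off described above concrete, here is a minimal sketch of a namedtuple-like factory that uses no exec() at all. It is not the patch from issue28638, and the name `simple_namedtuple` is hypothetical. It omits most of the real API (no `_replace`, `_asdict`, `_source`, defaults, or field-name validation). The generic `__new__` and `__repr__` are the parts that would be expected to run slower than generated, field-specific code.

```python
from operator import itemgetter


def simple_namedtuple(typename, field_names):
    """Hypothetical exec-free namedtuple factory (illustration only)."""
    if isinstance(field_names, str):
        field_names = field_names.replace(",", " ").split()
    fields = tuple(field_names)

    def __new__(cls, *args, **kwargs):
        # Generic argument handling: no per-class generated signature.
        if len(args) > len(fields):
            raise TypeError(f"expected at most {len(fields)} arguments")
        values = list(args)
        for name in fields[len(args):]:
            try:
                values.append(kwargs.pop(name))
            except KeyError:
                raise TypeError(f"missing argument: {name!r}") from None
        if kwargs:
            raise TypeError(f"unexpected arguments: {sorted(kwargs)}")
        return tuple.__new__(cls, values)

    def __repr__(self):
        # Built in a loop rather than from a precompiled format.
        body = ", ".join(f"{n}={v!r}" for n, v in zip(fields, self))
        return f"{type(self).__name__}({body})"

    namespace = {
        "__slots__": (),
        "_fields": fields,
        "__new__": __new__,
        "__repr__": __repr__,
    }
    # One read-only property per field, backed by tuple indexing.
    for index, name in enumerate(fields):
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)


Pair = simple_namedtuple("Pair", "x y")
pair = Pair(1, y=2)
print(pair, pair.x, pair.y)
```

Class creation here is just a `type()` call plus a few closures, which is where the startup saving would come from; the cost moves to every instantiation and repr call instead.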