[Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Antoine Pitrou

Hello,

The cost of creating a namedtuple has been identified as a contributor to
Python startup time.  This affects not only Python core and the stdlib, but
also any third-party library that creates namedtuple classes (and there are
many of them).  An issue was created for this:
https://bugs.python.org/issue28638

Raymond decided to close the issue because:

1) the proposed resolution makes the "_source" attribute empty (or, at
least, something other than what it currently is).  Raymond claims the
"_source" attribute is an essential feature of namedtuples.

2) optimizing startup cost is supposedly not worth the effort.


To this, I would counter:

As for 1), a search for "namedtuple" and "_source" in a code search
engine (*) turns up *only* false positives, of several kinds:

* clones of the CPython repo
* copies of the namedtuple class instantiation source code with slight
  tweaks (*not* reading the _source attribute of an existing namedtuple)
* modules using namedtuples and also using a "_source" attribute on
  unrelated objects

(*) https://searchcode.com/?q=namedtuple+_source


As for 2), startup time is actually a very important consideration
nowadays, both for small scripts *and* for interactive use, given the
now widespread use of Jupyter Notebooks.  A 1 ms cost when
importing a single module can translate into a large slowdown when your
library imports (directly or indirectly) hundreds of modules, many of
which may create their own namedtuple classes.


Nick pointed out that one alternative is to make the C-written "struct
sequence" class user-visible.

My opinion is that, while better than nothing, this would complicate
things by exposing two very similar primitives in the stdlib, without
there being a clear choice for users.  Should I use the well-known
namedtuple?  Should I use the new-ish "struct sequence", with similar
characteristics and better performance, but worse compatibility (now I
have to write fallback code for Python versions where the "struct
sequence" isn't exposed)?

Not to mention that all third-party libraries would have to be migrated to
the newly-exposed "struct sequence" + compatibility fallback code...


So my take is:

1) Usage of "_source" in open source code (as per the search above)
seems non-existent.

2) If the primary intent of "_source" is to show-case how to write a
tuple subclass, well, why not write a recipe or tutorial somewhere?
The Python stdlib is generally not a place where we reify tutorials or
educational snippets as public APIs.

3) The well-known namedtuple would really benefit from a performance
boost, without asking all maintainers of dependent code (that's a
*ton*) to migrate to a new idiom + compatibility fallback.


Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Antoine Pitrou
On Mon, 17 Jul 2017 14:43:19 +0200
Antoine Pitrou  wrote:
> Hello,
> 
> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time.

Imprecise wording: that's the cost of creating a namedtuple *class*,
i.e. anytime someone writes `MyClass = namedtuple('MyClass', ...)`.
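To see the distinction, the sketch below times the creation of a namedtuple *class* separately from the creation of an *instance* of an existing class, using only the standard `timeit` module. The helper names are made up for this example, the numbers it prints depend entirely on the Python version and machine, and the 300-class figure is a hypothetical application size, not a measurement.

```python
import timeit
from collections import namedtuple


def time_class_creation(number=200):
    """Average seconds needed to create one new namedtuple *class*."""
    total = timeit.timeit(lambda: namedtuple('K', 'a b c'), number=number)
    return total / number


def time_instance_creation(number=200_000):
    """Average seconds needed to create one *instance* of an existing class."""
    K = namedtuple('K', 'a b c')
    total = timeit.timeit(lambda: K(1, 2, 3), number=number)
    return total / number


if __name__ == '__main__':
    per_class = time_class_creation()
    per_instance = time_instance_creation()
    print(f'class creation:    {per_class * 1e6:10.1f} usec')
    print(f'instance creation: {per_instance * 1e6:10.3f} usec')
    # Hypothetical: an application whose imports create 300 namedtuple classes.
    print(f'300 classes:       {per_class * 300 * 1e3:10.1f} msec')
```

Only the class-creation cost is paid at import time, once per `namedtuple(...)` call, which is why it adds up across many modules.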

Regards

Antoine.




Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Ivan Levkivskyi
Interesting coincidence: just two days ago I heard that a team at one
large company had completely abandoned namedtuple because of the creation
time problem.

Concerning _source, why is it not possible to make it a property, so that
all the string formatting happens on request, saving some time for
users who don't need it?
(Of course this would not be the actual source, but it could be made
practically equivalent to the no-compile version.)
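A minimal sketch of the lazy idea, assuming a hypothetical descriptor `LazySource` and a deliberately simplified `TEMPLATE` (not the real template from `collections`): nothing is formatted when the class is created, and the text is built, then cached on the class, the first time `_source` is read from the class or from an instance.

```python
from collections import namedtuple

# Hypothetical, heavily simplified template; the real one is much longer.
TEMPLATE = '''\
class {typename}(tuple):
    __slots__ = ()
    _fields = {fields!r}

    def __new__(_cls, {args}):
        return tuple.__new__(_cls, ({args},))
'''


class LazySource:
    """Non-data descriptor that builds the source text on first access."""

    def __get__(self, obj, objtype=None):
        cls = objtype if objtype is not None else type(obj)
        text = TEMPLATE.format(typename=cls.__name__,
                               fields=cls._fields,
                               args=', '.join(cls._fields))
        # Cache on the class: later lookups find a plain string instead
        # of this descriptor, so the formatting happens at most once.
        setattr(cls, '_source', text)
        return text


def lazy_namedtuple(typename, field_names):
    """Hypothetical wrapper: a normal namedtuple whose _source is lazy."""
    cls = namedtuple(typename, field_names)
    cls._source = LazySource()   # nothing is formatted yet
    return cls
```

Because the descriptor defines only `__get__`, it works for both `Point._source` and `Point(1, 2)._source`, even though namedtuple instances have no `__dict__`.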

--
Ivan




Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Isaac Morland
On 17 July 2017 at 08:43, Antoine Pitrou  wrote:

>
> Hello,
>
> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time.  Not only Python core and the stdlib, but any
> third-party library creating namedtuple classes (there are many of
> them).  An issue was created for this:
> https://bugs.python.org/issue28638
>
> Raymond decided to close the issue because:
>
> 1) the proposed resolution makes the "_source" attribute empty (or, at
> least, something else than it currently is).  Raymond claims the
> "_source" attribute is an essential feature of namedtuples.
>

I think I understand well enough to say something intelligent…

While actual references to _source are likely rare (certainly I’ve never
used it), my understanding is that the way namedtuple works is to construct
_source, and then exec it to create the class. Once that is done, there is
no significant saving to be had by throwing away the constructed _source
value.
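Roughly, the approach described above works like the sketch below: fill in a source template, exec() it in a private namespace, and keep the string. The factory name `exec_namedtuple` and both templates are hypothetical simplifications; the stdlib template of that era also generated `_make`, `_replace`, `_asdict` and more.

```python
from operator import itemgetter

# Hypothetical, simplified templates (the real ones are much longer).
_CLASS_TEMPLATE = '''\
class {typename}(tuple):
    __slots__ = ()
    _fields = {fields!r}

    def __new__(_cls, {args}):
        return _tuple.__new__(_cls, ({args},))

    def __repr__(self):
        return '{typename}({repr_fmt})' % self
{properties}'''

_PROPERTY_TEMPLATE = '    {name} = _property(_itemgetter({index}))\n'


def exec_namedtuple(typename, field_names):
    fields = tuple(field_names.replace(',', ' ').split())
    source = _CLASS_TEMPLATE.format(
        typename=typename,
        fields=fields,
        args=', '.join(fields),
        repr_fmt=', '.join(f'{name}=%r' for name in fields),
        properties=''.join(_PROPERTY_TEMPLATE.format(name=name, index=i)
                           for i, name in enumerate(fields)),
    )
    # Compile and run the generated class statement in a private namespace.
    namespace = {'_tuple': tuple, '_property': property,
                 '_itemgetter': itemgetter}
    exec(source, namespace)
    cls = namespace[typename]
    cls._source = source   # the string already exists, so keeping it is cheap
    return cls
```

This is why dropping `_source` alone saves little: the template formatting and the exec() dominate.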

When namedtuple was being considered for inclusion, I actually went so far
as to write a proof-of-concept version that worked by creating a class,
creating attributes on it, etc. I don’t remember how far I got but the exec
version is the version included in the stdlib. I come from a non-Pythonic
background so use of exec still feels a bit weird to me but I absolutely
love namedtuple and use it constantly. I don't know whether a polished and
completed version of my idea could be faster than using exec, but I
wouldn't expect a major saving — a whole bunch of code has to run either
way.
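For comparison, here is a rough sketch of an exec-free factory along the lines of that proof of concept: build the class with `type()` and attach `property(itemgetter(i))` accessors. The name `plain_namedtuple` is hypothetical. The visible trade-off is that, without generated code, `__new__` cannot have a real per-field signature, so this version takes `*args, **kwargs` and maps them onto the fields by hand.

```python
from operator import itemgetter


def plain_namedtuple(typename, field_names):
    fields = tuple(field_names.replace(',', ' ').split())

    def __new__(cls, *args, **kwargs):
        if len(args) > len(fields):
            raise TypeError(f'{typename} takes {len(fields)} arguments')
        values = list(args)
        # Fill the remaining fields from keyword arguments, in order.
        for name in fields[len(args):]:
            if name not in kwargs:
                raise TypeError(f'missing argument: {name!r}')
            values.append(kwargs.pop(name))
        if kwargs:
            raise TypeError(f'unexpected arguments: {sorted(kwargs)}')
        return tuple.__new__(cls, values)

    def __repr__(self):
        pairs = ', '.join(f'{n}={v!r}' for n, v in zip(fields, self))
        return f'{typename}({pairs})'

    namespace = {'__slots__': (), '_fields': fields,
                 '__new__': __new__, '__repr__': __repr__}
    # One read-only property per field, indexing into the tuple.
    for index, name in enumerate(fields):
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)
```

The argument handling that exec() gets for free from a generated `def __new__(_cls, x, y)` has to be written out manually here.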


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Antoine Pitrou
On Mon, 17 Jul 2017 15:03:26 +0200
Ivan Levkivskyi  wrote:
> Interesting coincidence, just two days ago I have heard that a team at one
> large company completely abandoned namedtuple because of the creation time
> problem.
> 
> Concerning _source, why it is not possible to make it a property so that
> all the string formatting will happen on request, thus saving some time for
> users who doesn't need it?

It was proposed in https://bugs.python.org/issue19640 but rejected.

Regards

Antoine.


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Antoine Pitrou

Le 17/07/2017 à 15:26, Isaac Morland a écrit :
> 
> I think I understand well enough to say something intelligent…
> 
> While actual references to _source are likely rare (certainly I’ve never
> used it), my understanding is that the way namedtuple works is to
> construct _source, and then exec it to create the class. Once that is
> done, there is no significant saving to be had by throwing away the
> constructed _source value.

The proposed resolution on https://bugs.python.org/issue28638 is to
avoid exec() on most parts of the namedtuple class, hence speeding up
the class creation.

> I come from
> a non-Pythonic background so use of exec still feels a bit weird to me
> but I absolutely love namedtuple and use it constantly.

I think for most Python programmers, it still feels a bit un-Pythonic.
While exec() is part of Python, it's generally only used in fringe cases
where nothing else works.

Regards

Antoine.


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Facundo Batista
On Mon, Jul 17, 2017 at 9:43 AM, Antoine Pitrou  wrote:

> As for 2), startup time is actually a very important consideration
> nowadays, both for small scripts *and* for interactive use with the
> now very wide-spread use of Jupyter Notebooks.  A 1 ms. cost when
> importing a single module can translate into a large slowdown when your
> library imports (directly or indirectly) hundreds of modules, many of
> which may create their own namedtuple classes.

My experience inside Canonical is that golang stole a lot of "codebase
share" from Python, and the talks (mine and others') kept hitting two main
walls: one is memory consumption, and the other is startup time.

So yes, startup time is important for user-facing scripts and services.

Regards,

-- 
.Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Twitter: @facundobatista


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Raymond Hettinger

> On Jul 17, 2017, at 6:31 AM, Antoine Pitrou  wrote:
> 
>> I think I understand well enough to say something intelligent…
>> 
>> While actual references to _source are likely rare (certainly I’ve never
>> used it), my understanding is that the way namedtuple works is to
>> construct _source, and then exec it to create the class. Once that is
>> done, there is no significant saving to be had by throwing away the
>> constructed _source value.

There are considerable benefits to namedtuple being able to generate and match 
its own source.

* It makes it really easy for a user to generate the code, drop it into 
another module, and customize it.

* It makes the named tuple factory function completely self-documenting. 

* The verbose/_source option teaches you exactly what named tuple does.  That 
makes the tool relatively easy to learn, understand, and debug.

I really don't want to throw away these benefits to save a couple of 
milliseconds.   As Nick Coghlan recently posted, "Speed isn't everything, and 
it certainly isn't adequate justification for breaking public APIs that have 
been around for years."

FWIW, the template/exec implementation has had excellent benefits for 
maintainability, making it very easy to fix and update.  As other parts of 
Python have changed (limitations on number of arguments, what is allowed as an 
identifier, etc.), it mostly automatically stays in sync with the rest of the 
language.

ISTM this issue is being pressed by micro-optimizers who are being very 
aggressive and not responding to actual user needs (it is more an invented 
issue than a real one).  Named tuple has been around for a long time and users 
have been somewhat happy with it.

If someone truly cares about the exec time for a particular named tuple, the 
_source option makes it trivially easy to just replace the generator call with 
the expanded code in that particular circumstance.
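As an illustration of "replacing the generator call with the expanded code", here is a trimmed-down, hand-written equivalent of `Point = namedtuple('Point', 'x y')`. It is not the exact text that `_source` produced; it keeps only `__new__`, `__repr__`, `_replace` and the field properties, and it runs no exec() at import time.

```python
from operator import itemgetter


class Point(tuple):
    'Point(x, y)'
    __slots__ = ()
    _fields = ('x', 'y')

    def __new__(_cls, x, y):
        'Create new instance of Point(x, y)'
        return tuple.__new__(_cls, (x, y))

    def __repr__(self):
        return 'Point(x=%r, y=%r)' % self

    def _replace(self, **kwds):
        'Return a new Point replacing specified fields with new values'
        # kwds.pop(name, current_value) takes the override if one was given.
        result = self.__class__(*map(kwds.pop, self._fields, self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % list(kwds))
        return result

    x = property(itemgetter(0), doc='Alias for field number 0')
    y = property(itemgetter(1), doc='Alias for field number 1')
```

The cost of this approach is that the copy no longer tracks future changes to namedtuple itself.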


Raymond


P.S. I'm fully supportive of Victor's efforts to build-out structseq to make it 
sufficiently expressive to do more of what collections.namedtuple() does.  That 
is a perfectly reasonable path to optimization. We've wanted that for a long 
time and no one has had the spare clock cycles to make it come true.

  


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Barry Warsaw
On Jul 17, 2017, at 10:59, Raymond Hettinger  
wrote:
> 
> ISTM this issue is being pressed by micro-optimizers who are being very 
> aggressive and not responding to actual user needs (it is more an invented 
> issue than a real one).  Named tuple has been around for a long time and 
> users have been somewhat happy with it.

Regardless of whether this particular optimization is a good idea or not, start 
up time *is* a serious challenge in many environments for CPython in particular 
and the perception of Python’s applicability to many problems.  I think we’re 
better off trying to identify and address such problems than ignoring or 
minimizing them.

Cheers,
-Barry





Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Victor Stinner
2017-07-17 16:56 GMT+02:00 Facundo Batista :
> My experience inside Canonical is that golang stole a lot of "codebase
> share" from Python, and (others and mine) talks hit two walls, mainly:
> one is memory consumption, and the other is startup time.
>
> So yes, startup time is important for user-faced scripts and services.

Removing the _source attribute would allow us to:

(1) Reduce memory consumption

http://bugs.python.org/issue19640#msg213949

(2) Reduce Python startup time

https://bugs.python.org/issue28638#msg280277

Victor


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Christian Heimes
On 2017-07-17 14:43, Antoine Pitrou wrote:
> So my take is:
> 
> 1) Usage of "_source" in open source code (as per the search above)
> seems non-existent.
> 
> 2) If the primary intent of "_source" is to show-case how to write a
> tuple subclass, well, why not write a recipe or tutorial somewhere?
> The Python stdlib is generally not a place where we reify tutorials or
> educational snippets as public APIs.
> 
> 3) The well-known namedtuple would really benefit from a performance
> boost, without asking all maintainers of dependent code (that's a
> *ton*) to migrate to a new idiom + compatibility fallback.

I have an additional take on named tuples:

4) The current approach uses exec() to generate the namedtuple class on
the fly. The exec() function isn't necessarily evil, and the use of
exec() in namedtuple is safe. However, I would appreciate it if the Python
interpreter could be started without requiring the exec() function. It
would make it easier to harden the interpreter for embedding and system
integration use cases.

It's not about sandboxing Python. My goal is to make it harder to abuse
Python. See Steve's lightning talk "Python as a security vulnerability"
at the language summit, https://lwn.net/Articles/723823/ .

Christian




Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Steve Holden
On Mon, Jul 17, 2017 at 3:59 PM, Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:

> I really don't want to throw away these benefits to save a couple of
> milliseconds.   As Nick Coghlan recently posted, "Speed isn't everything,
> and it certainly isn't adequate justification for breaking public APIs that
> have been around for years."


My only question is "what's a variable called _source doing in the public
API?"

regards
Steve


Steve Holden


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Raymond Hettinger

> On Jul 17, 2017, at 8:22 AM, Steve Holden  wrote:
> 
> My only question is "what's a variable called _source doing in the public 
> API?"

The convention for named tuple has been for all the methods and attributes to 
be prefixed with an underscore so that the names won't conflict with field 
names in the named tuple itself.  For example, we want to allow 
Path=namedtuple('Path', ['source', 'destination']).
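A quick check of that convention with the standard `collections.namedtuple`: a field can be named `source` without clashing with the underscore-prefixed helpers, and field names that start with an underscore are rejected with a `ValueError` unless `rename=True` is passed, in which case they are replaced by positional names such as `_0`.

```python
from collections import namedtuple

Path = namedtuple('Path', ['source', 'destination'])
p = Path('a.txt', 'b.txt')

print(p.source)                          # the field, not a helper
print(p._fields)                         # helpers use leading underscores
print(p._replace(destination='c.txt'))

try:
    namedtuple('Bad', ['_source'])       # would shadow the helper namespace
except ValueError as exc:
    print('rejected:', exc)

# With rename=True, the invalid name becomes a positional name instead.
print(namedtuple('Renamed', ['_source', 'x'], rename=True)._fields)
```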

If I had it all to do over again, it might have been better to have had a 
different convention like source_ with a trailing underscore, but that ship 
sailed long ago :-)


Raymond


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Steve Holden
Makes sense. Thanks.  S

Steve Holden



Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Guido van Rossum
I am firmly with Antoine here. The cumulative startup time of large Python
programs is a serious problem and namedtuple is one of the major
contributors -- especially because it is so convenient that it is
ubiquitous. The approach of generating source code and exec()ing it is a
cool demonstration of Python's expressive power, but it's always been my
sense that whenever we encounter a popular idiom that uses exec() and
eval(), we should augment the language (or the builtins) to avoid these
calls -- that's for example how we ended up with getattr().

One of the reasons to be wary of exec()/eval() other than the usual
security concerns is that in some Python implementations they have a high
overhead to initialize the parser and compiler. (Even in CPython it's not
that fast.)

Regarding the argument that it's easier to learn what namedtuple does if
the generated source is available, while I don't feel this is important,
supposedly it is important to Raymond. But surely there are other
approaches possible that work just as well in an educational setting while
being more efficient in production use. (E.g. the approach taken by
itertools, where the docs show equivalent Python code.)

Concluding, I think we should move on from the original implementation and
optimize the heck out of namedtuple. The original has served us well. The
world is constantly changing. Python should adapt to the (happy) fact that
it's being used for systems larger than any of us could imagine 15 years
ago.

--Guido

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Gregory P. Smith
On Mon, Jul 17, 2017 at 8:00 AM Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:

>
> I really don't want to throw away these benefits to save a couple of
> milliseconds.   As Nick Coghlan recently posted, "Speed isn't everything,
> and it certainly isn't adequate justification for breaking public APIs that
> have been around for years."
>
> FWIW, the template/exec implementation has had excellent benefits for
> maintainability making it very easy to fix and update.  As other parts of
> Python have changed (limitations on number of arguments, what is allowed as
> an identifier, etc), it mostly automatically stays in sync with the rest of
> the language.
>
> ISTM this issue is being pressed by micro-optimizers who are being very
> aggressive and not responding to actual user needs (it is more an invented
> issue than a real one).  Named tuple has been around for a long time and
> users have been somewhat happy with it.
>

Raymond, you keep repeating statements similar to "only a millisecond" and
"aggressive micro-optimizers who don't care about user needs" in your
comments on issues like this. That simply isn't true. These issues come up
in the first place *because of* users who need fast startup. Please don't
be so dismissive.

The reason people care about this has been stated many times. It isn't just
"a millisecond", it's 100s or 1000s of milliseconds in any application of
reasonable size where namedtuples were adopted as a design pattern in
various libraries.

Real world use cases for startup time mattering exist: interactive command
line tools are the most obvious one people keep citing. I'll toss in another
where Python startup time has raised eyebrows at work: unittest startup and
completion time. When the bulk of a process's time is spent in startup
before hitting unittest.main(), people take notice and consider it a
problem. Developer productivity is reduced. The hacks individual developers
come up with to try to work around things like this are not pretty.

> If someone truly cares about the exec time for a particular named tuple,
> the _source option makes it trivially easy to just replace the generator
> call with the expanded code in that particular circumstance.
>

In real world applications you do not control the bulk of the code that has
chosen to use namedtuple. They're scattered through 100-1000s of other
transitive dependency libraries (not just the standard library), the
modification of each of which faces hurdles both technical and
non-technical in nature.

To me the desired resolution to this is clear: Optimize the default use
case of namedtuple and everybody wins. This isn't just about the stdlib's
namedtuple uses being fast; those are a small portion of all uses in any
application where startup time matters. This is about making Python better
for the world, i.e. what Antoine's original write-up suggested in his #3.

I get that namedtuple ._source is a public API. We may need to keep it. If
so, that just means revisiting lazily generating it as a property -
issue19640.

-gps

PS - Good call on the naming hindsight! A trailing underscore would've been
nice. Oh well, too late for that.




Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Victor Stinner
2017-07-17 18:13 GMT+02:00 Gregory P. Smith :
> I get that namedtuple ._source is a public API. We may need to keep it. If
> so, that just means revisiting lazily generating it as a property -
> issue19640.

I agree. Technically speaking, optimizing namedtuple doesn't have to
mean "remove the _source attribute".

I won't discuss here whether _source should be kept or not, but even if
we rewrite the namedtuple implementation, I agree that we *can*
technically keep a _source property that generates the same Python
code. That would allow us to speed up namedtuple, reduce the memory
footprint, and have a smooth deprecation policy (*if* we decide to
deprecate this attribute).
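A minimal sketch of that path, where every name is hypothetical (`DeprecatedSource`, `_generate_source`) and the generated text is a simplified stand-in for the real template: `_source` is produced only when read, and each read emits a `DeprecationWarning`, so nothing is computed or stored when the class is created.

```python
import warnings
from collections import namedtuple


def _generate_source(cls):
    # Hypothetical, simplified stand-in for the full class template.
    args = ', '.join(cls._fields)
    return (f'class {cls.__name__}(tuple):\n'
            f'    __slots__ = ()\n'
            f'    _fields = {cls._fields!r}\n\n'
            f'    def __new__(_cls, {args}):\n'
            f'        return tuple.__new__(_cls, ({args},))\n')


class DeprecatedSource:
    """Descriptor: regenerates _source on every access and warns."""

    def __get__(self, obj, objtype=None):
        cls = objtype if objtype is not None else type(obj)
        warnings.warn(f'{cls.__name__}._source is deprecated',
                      DeprecationWarning, stacklevel=2)
        return _generate_source(cls)


Point = namedtuple('Point', 'x y')
Point._source = DeprecatedSource()
```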

Victor


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Steven D'Aprano
On Mon, Jul 17, 2017 at 02:43:19PM +0200, Antoine Pitrou wrote:
> 
> Hello,
> 
> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time.  Not only Python core and the stdlib, but any
> third-party library creating namedtuple classes (there are many of
> them).  An issue was created for this:
> https://bugs.python.org/issue28638

Some time ago, I needed to backport a version of namedtuple to Python 
2.4, so I started with Raymond's recipe on Activestate and modified it 
to only exec the code needed for __new__. The rest of the class is an 
ordinary inner class:

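# a short sketch
def namedtuple(typename, field_names):
    fields = tuple(field_names.replace(',', ' ').split())
    class Inner(tuple):
        __slots__ = ()
        ...  # properties, __repr__, _asdict, etc. are ordinary methods
    # only __new__ is exec'd, since its signature depends on the field names
    args = ', '.join(fields)
    source = 'def __new__(_cls, %s):\n    return tuple.__new__(_cls, (%s,))' % (args, args)
    ns = {}
    exec(source, ns)
    Inner.__new__ = ns['__new__']
    Inner.__name__ = typename
    return Inner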


Here's my fork of Raymond's recipe:

https://code.activestate.com/recipes/578918-yet-another-namedtuple/


Out of curiosity, I took that recipe, updated it to work in Python 3, 
and compared it to the std lib version. Here are some representative 
timings:

[steve@ando ~]$ python3.5 -m timeit -s "from collections import namedtuple" "K = namedtuple('K', 'a b c')"
1000 loops, best of 3: 1.02 msec per loop

[steve@ando ~]$ python3.5 -m timeit -s "from nt3 import namedtuple" "K = namedtuple('K', 'a b c')"
1000 loops, best of 3: 255 usec per loop


I think that proves that this approach is viable and can lead to a big 
speed up.

I don't think that merely dropping the _source attribute will save much 
time. It might save a bit of memory, but in my experiments dropping it 
only saves about 10µs more. I think the real bottleneck is the cost of 
exec'ing the entire class.


 
-- 
Steve


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Jelle Zijlstra
2017-07-17 9:45 GMT-07:00 Steven D'Aprano :

> I think that proves that this approach is viable and can lead to a big
> speed up.
>

I have an open pull request implementing this approach:
https://github.com/python/cpython/pull/2736. We can discuss the exact form
the code should take there (Ivan already added some good suggestions).


> I don't think that merely dropping the _source attribute will save much
> time. It might save a bit of memory, but in my experiments dropping it
> saves only about 10 µs more. I think the real bottleneck is the cost of
> exec'ing the entire class.
>
>
>
> --
> Steve


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Raymond Hettinger

> On Jul 17, 2017, at 8:49 AM, Guido van Rossum  wrote:
> 
>  The approach of generating source code and exec()ing it, is a cool 
> demonstration of Python's expressive power, but it's always been my sense 
> that whenever we encounter a popular idiom that uses exec() and eval(), we 
> should augment the language (or the builtins) to avoid these calls -- that's 
> for example how we ended up with getattr().

FYI, the proposal (from Jelle) isn't to remove exec.  It is to only exec a 
smaller piece of code and make the rest of it static.  

It isn't a bad idea; it just complicates the implementation (generating _source 
lazily) and the subsequent maintenance (which is currently really easy).

> Concluding, I think we should move on from the original implementation and 
> optimize the heck out of namedtuple. The original has served us well. The 
> world is constantly changing. Python should adapt to the (happy) fact that 
> it's being used for systems larger than any of us could imagine 15 years ago.

Okay, then Nick and I are overruled.  I'll move Jelle's patch forward.  We'll 
also need to lazily generate _source but I don't think that will be hard.

One minor grumble:  I think we need to give careful cost/benefit consideration 
to optimizations that complicate the implementation.  Over the last several 
years, the source for Python has grown increasingly complicated.  Fewer people 
understand it now. It is much harder for newcomers to on-ramp.  The old-timers 
(myself included) find that their knowledge is out of date.  And complexity 
leads to bugs (the C optimization of random number seeding caused a major bug 
in the 3.6.0 release; the C optimization of the lru_cache resulted in multiple 
releases having hard-to-find threading bugs, etc.).  It is becoming 
increasingly difficult to look at code and tell whether it is correct (I still 
don't fully understand the implications of the recursive constant folding in 
the peephole optimizer, for example).  In the case of this named tuple 
proposal, the complexity is manageable, but the overall trend isn't good and I 
get the feeling the aggressive optimization is causing us to forget key parts 
of the Zen of Python.

Cheers,


Raymond


P.S.  Ironically, a lot of my consulting work comes from people who have 
created something complex out of something that could have been simple.  So, 
in a strange way, I should be happy about these trends -- just saying ;-)





Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Raymond Hettinger

> On Jul 17, 2017, at 8:49 AM, Guido van Rossum  wrote:
> 
> One of the reasons to be wary of exec()/eval() other than the usual security 
> concerns is that in some Python implementations they have a high overhead to 
> initialize the parser and compiler. (Even in CPython it's not that fast.)

BTW, if getting rid of the template/exec pair is a goal, Joe Jevnik proposed a 
patch a couple of years ago that completely reimplemented namedtuple() in C.   
The patch was somewhat complex and its semantic equivalence was hard to verify, 
but we could resurrect it and clean it up.   That way, we could leave the 
existing namedtuple() code in place and do a subsequent import from the C version.

This path won't be fun (whenever we have both a C version and Python version, 
we get years of trying to sync-up tiny differences); however, it will give you 
the fastest startup times, the fastest lookups at runtime, and eliminate use 
of exec.
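The "keep the Python code, then import from the C version" arrangement Raymond mentions is a pattern the stdlib already uses (heapq.py, for example, ends by trying `from _heapq import *`). A minimal sketch of its shape, where `_namedtuple_accel` is a hypothetical module name that does not exist in CPython, so the import fails and the pure-Python definition stays in effect:

```python
import collections


def namedtuple(typename, field_names):
    """Pure-Python version; here it just stands in for the existing code."""
    return collections.namedtuple(typename, field_names)


# Same shape as the end of heapq.py: prefer an optional C implementation.
try:
    # Hypothetical accelerator module; it does not exist in CPython.
    from _namedtuple_accel import namedtuple  # noqa: F811
except ImportError:
    pass  # not available: the pure-Python namedtuple above stays in use
```

The maintenance cost Raymond describes comes from keeping both definitions behaviorally identical, since callers cannot tell which one they got.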

> On Jul 17, 2017, at 8:13 AM, Barry Warsaw  wrote:


> Regardless of whether this particular optimization is a good idea or not, 
> start up time *is* a serious challenge in many environments for CPython in 
> particular and the perception of Python’s applicability to many problems.  I 
> think we’re better off trying to identify and address such problems than 
> ignoring or minimizing them.

I agree with that sentiment but think we ought to look at places where the 
payoffs would actually matter, such as minimizing the number of disk accesses 
(Python performs a lot of I/O on startup).  Whenever I've addressed start-up 
time for my clients, named tuples were never the issue.  Also, it would have been 
trivially easy to replace the factory function call with the generated code, 
but that never proved necessary or beneficial.   IMO, we're about to turn the 
named tuple code into a mess but will find that most users, most of the time, 
will get nearly zero benefit.


Raymond


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Giampaolo Rodola'
I completely agree. I love namedtuples but I've never been too happy about
the additional overhead vs. plain tuples (both for creation and attribute
access times), to the point that I explicitly avoid using them in certain
circumstances (e.g. a busy loop) and use them only for public end-user APIs
returning multiple values.

To be entirely honest, I'm not even sure why they need to be forcefully
declared upfront in the first place, instead of just having a first-class
function (builtin?) written in C:

>>> ntuple(x=1, y=0)
(x=1, y=0)

...or even a literal as in:

>>> (x=1, y=0)
(x=1, y=0)

Most of the time this is what I really want: quickly returning an
anonymous tuple with named attributes and nothing else, similarly to
os.times() & others. I believe that if something like this existed we
would witness a big transition from tuple() to ntuple() for all those
functions returning more than one value. We witnessed a similar transition in
many parts of the stdlib when collections.namedtuple was first introduced,
but not everywhere, probably because declaring a namedtuple is more work,
it's more expensive, and it still feels like you're dealing with some kind
of too-high-level, second-class citizen with too much overhead and too much
sugar in terms of API (e.g. "verbose", "rename", "module" and "_source").

If something like this were to happen, I'd expect collections.namedtuple to be
used only by those who want to subclass it in order to attach methods,
whereas the rest would stick to ntuple() pretty much everywhere (both
in "private" and "public" functions).
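Something close to the proposed `ntuple(x=1, y=0)` can be approximated in pure Python today by caching one namedtuple class per distinct, ordered set of field names. A minimal sketch, assuming Python 3.6+ (where keyword-argument order is preserved, PEP 468); `ntuple`, `_ntuple_class` and the shared type name `'ntuple'` are hypothetical, not an existing stdlib API:

```python
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _ntuple_class(field_names):
    # One class per distinct, ordered tuple of field names.
    return namedtuple('ntuple', field_names)


def ntuple(**fields):
    """Return an anonymous named tuple, e.g. ntuple(x=1, y=0)."""
    cls = _ntuple_class(tuple(fields))
    return cls(*fields.values())
```

Because all results with the same field names share one cached class, instances keep a plain namedtuple's footprint and the class-creation cost is paid once per shape. The repr reads `ntuple(x=1, y=0)` rather than `(x=1, y=0)`, and field names starting with an underscore are rejected by namedtuple.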


On Mon, Jul 17, 2017 at 5:49 PM, Guido van Rossum  wrote:

> I am firmly with Antoine here. The cumulative startup time of large Python
> programs is a serious problem and namedtuple is one of the major
> contributors -- especially because it is so convenient that it is
> ubiquitous. The approach of generating source code and exec()ing it, is a
> cool demonstration of Python's expressive power, but it's always been my
> sense that whenever we encounter a popular idiom that uses exec() and
> eval(), we should augment the language (or the builtins) to avoid these
> calls -- that's for example how we ended up with getattr().
>
> One of the reasons to be wary of exec()/eval() other than the usual
> security concerns is that in some Python implementations they have a high
> overhead to initialize the parser and compiler. (Even in CPython it's not
> that fast.)
>
> Regarding the argument that it's easier to learn what namedtuple does if
> the generated source is available, while I don't feel this is important,
> supposedly it is important to Raymond. But surely there are other
> approaches possible that work just as well in an educational setting while
> being more efficient in production use. (E.g. the approach taken by
> itertools, where the docs show equivalent Python code.)
>
> Concluding, I think we should move on from the original implementation and
> optimize the heck out of namedtuple. The original has served us well. The
> world is constantly changing. Python should adapt to the (happy) fact that
> it's being used for systems larger than any of us could imagine 15 years
> ago.
>
> --Guido
>
> On Mon, Jul 17, 2017 at 7:59 AM, Raymond Hettinger <
> raymond.hettin...@gmail.com> wrote:
>
>>
>> > On Jul 17, 2017, at 6:31 AM, Antoine Pitrou  wrote:
>> >
>> >> I think I understand well enough to say something intelligent…
>> >>
>> >> While actual references to _source are likely rare (certainly I’ve never
>> >> used it), my understanding is that the way namedtuple works is to
>> >> construct _source, and then exec it to create the class. Once that is
>> >> done, there is no significant saving to be had by throwing away the
>> >> constructed _source value.
>>
>> There are considerable benefits to namedtuple being able to generate and
>> match its own source.
>>
>> * It makes it really easy for a user to generate the code, drop it
>> into another module, and customize it.
>>
>> * It makes the named tuple factory function completely self-documenting.
>>
>> * The verbose/_source option teaches you exactly what named tuple does.
>> That makes the tool relatively easy to learn, understand, and debug.
>>
>> I really don't want to throw away these benefits to save a couple of
>> milliseconds.   As Nick Coghlan recently posted, "Speed isn't everything,
>> and it certainly isn't adequate justification for breaking public APIs that
>> have been around for years."
>>
>> FWIW, the template/exec implementation has had excellent benefits for
>> maintainability making it very easy to fix and update.  As other parts of
>> Python have changed (limitations on number of arguments, what is allowed as
>> an identifier, etc), it mostly automatically stays in sync with the rest of
>> the language.
>>
>> ISTM this issue is being pressed by micro-optimizers who are being very
>> aggressive and not responding to actual user needs (it is more an invented
>> issue than a rea

Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Isaac Morland
My apologies, I misunderstood what had been proposed (and rejected).

So it sounds like _source is a prerequisite for the current exec-based
implementation, but the proposal is to replace it with a non-exec-based
implementation, meaning _source would no longer be needed for the module to
work and might be eliminated. But _source could continue to be generated
lazily (and cached if thought helpful) using an @property, so even the
(apparently rare) uses of _source would continue to work.

This would in some sense be a DRY violation, but of a very pragmatic
Pythonic sort, where we have two implementations, one for documentation and
one for efficiency. How different would this be from all those modules that
have both Python and C implementations?
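Lazily regenerating `_source` could look roughly like the sketch below. Because `_source` is read from the class (`Point._source`) as well as from instances, a plain `@property` would not cover class access, so this uses a small non-data descriptor that builds the text on first access and then caches it on the class. `_LazySource` and its abbreviated template are hypothetical and do not reproduce the source the stdlib actually generated; recent Python versions no longer provide `_source` at all, so the sketch attaches it explicitly.

```python
from collections import namedtuple


class _LazySource:
    """Non-data descriptor that builds `_source` on first access."""

    def __get__(self, instance, owner):
        fields = owner._fields
        args = ', '.join(fields)
        values = args + (',' if len(fields) == 1 else '')  # "(x,)" for one field
        lines = [
            f'class {owner.__name__}(tuple):',
            '    __slots__ = ()',
            f'    _fields = {fields!r}',
            f'    def __new__(_cls, {args}):',
            f'        return tuple.__new__(_cls, ({values}))',
        ]
        for index, name in enumerate(fields):
            lines.append(f'    {name} = property(lambda self: self[{index}])')
        source = '\n'.join(lines) + '\n'
        owner._source = source  # cache on the class, replacing the descriptor
        return source


Point = namedtuple('Point', 'x y')
Point._source = _LazySource()  # attach lazily generated source (illustration)
```

Nothing is generated until someone actually reads `_source`, so the common path pays no cost for it.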


On 17 July 2017 at 09:31, Antoine Pitrou  wrote:

>
> On 17/07/2017 at 15:26, Isaac Morland wrote:
> >
> > I think I understand well enough to say something intelligent…
> >
> > While actual references to _source are likely rare (certainly I’ve never
> > used it), my understanding is that the way namedtuple works is to
> > construct _source, and then exec it to create the class. Once that is
> > done, there is no significant saving to be had by throwing away the
> > constructed _source value.
>
> The proposed resolution on https://bugs.python.org/issue28638 is to
> avoid exec() on most parts of the namedtuple class, hence speeding up
> the class creation.
>
> > I come from
> > a non-Pythonic background so use of exec still feels a bit weird to me
> > but I absolutely love namedtuple and use it constantly.
>
> I think for most Python programmers, it still feels a bit un-Pythonic.
> While exec() is part of Python, it's generally only used in fringe cases
> where nothing else works.
>
> Regards
>
> Antoine.


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread MRAB

On 2017-07-17 21:31, Giampaolo Rodola' wrote:
> I completely agree. I love namedtuples but I've never been too happy
> about the additional overhead vs. plain tuples (both for creation and
> attribute access times), to the point that I explicitly avoid using
> them in certain circumstances (e.g. a busy loop) and use them only for
> public end-user APIs returning multiple values.
>
> To be entirely honest, I'm not even sure why they need to be forcefully
> declared upfront in the first place, instead of just having a
> first-class function (builtin?) written in C:
>
> >>> ntuple(x=1, y=0)
> (x=1, y=0)
>
> ...or even a literal as in:
>
> >>> (x=1, y=0)
> (x=1, y=0)

[snip]

I know it's a bit early to bikeshed, but shouldn't that be:

>>> (x: 1, y: 0)
(x: 1, y: 0)

instead if it's a display/literal?


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread MRAB

On 2017-07-17 21:46, MRAB wrote:
> On 2017-07-17 21:31, Giampaolo Rodola' wrote:
>> I completely agree. I love namedtuples but I've never been too happy
>> about the additional overhead vs. plain tuples (both for creation and
>> attribute access times), to the point that I explicitly avoid using
>> them in certain circumstances (e.g. a busy loop) and use them only for
>> public end-user APIs returning multiple values.
>>
>> To be entirely honest, I'm not even sure why they need to be forcefully
>> declared upfront in the first place, instead of just having a
>> first-class function (builtin?) written in C:
>>
>> >>> ntuple(x=1, y=0)
>> (x=1, y=0)
>>
>> ...or even a literal as in:
>>
>> >>> (x=1, y=0)
>> (x=1, y=0)
>
> [snip]
>
> I know it's a bit early to bikeshed, but shouldn't that be:
>
> >>> (x: 1, y: 0)
> (x: 1, y: 0)
>
> instead if it's a display/literal?

Actually, come to think of it, a dict's keys would be quoted, so there
would be a slight inconsistency there...



Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Petr Viktorin

On 07/17/2017 10:31 PM, Giampaolo Rodola' wrote:
> I completely agree. I love namedtuples but I've never been too happy
> about the additional overhead vs. plain tuples (both for creation and
> attribute access times), to the point that I explicitly avoid using
> them in certain circumstances (e.g. a busy loop) and use them only for
> public end-user APIs returning multiple values.
>
> To be entirely honest, I'm not even sure why they need to be forcefully
> declared upfront in the first place, instead of just having a
> first-class function (builtin?) written in C:
>
> >>> ntuple(x=1, y=0)
> (x=1, y=0)
>
> ...or even a literal as in:
>
> >>> (x=1, y=0)
> (x=1, y=0)
>
> Most of the time this is what I really want: quickly returning an
> anonymous tuple with named attributes and nothing else, similarly to
> os.times() & others. [...]

It seems that you want `types.SimpleNamespace(x=1, y=0)`.


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Giampaolo Rodola'
On Mon, Jul 17, 2017 at 11:07 PM, Petr Viktorin  wrote:

> On 07/17/2017 10:31 PM, Giampaolo Rodola' wrote:
>
>> I completely agree. I love namedtuples but I've never been too happy
>> about the additional overhead vs. plain tuples (both for creation and
>> attribute access times), to the point that I explicitly avoid using them
>> in certain circumstances (e.g. a busy loop) and use them only for public
>> end-user APIs returning multiple values.
>>
>> To be entirely honest, I'm not even sure why they need to be forcefully
>> declared upfront in the first place, instead of just having a first-class
>> function (builtin?) written in C:
>>
>>  >>> ntuple(x=1, y=0)
>> (x=1, y=0)
>>
>> ...or even a literal as in:
>>
>>  >>> (x=1, y=0)
>> (x=1, y=0)
>>
>> Most of the time this is what I really want: quickly returning an
>> anonymous tuple with named attributes and nothing else, similarly to
>> os.times() & others. [...]
>>
>
> It seems that you want `types.SimpleNamespace(x=1, y=0)`.
>

That doesn't support indexing (obj[0]).
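The difference is easy to check: `types.SimpleNamespace` offers attribute access (backed by a per-instance `__dict__`), but it is not a sequence, so it supports neither indexing nor tuple unpacking, whereas a namedtuple instance supports both. A small demonstration using only the stdlib:

```python
from collections import namedtuple
from types import SimpleNamespace

ns = SimpleNamespace(x=1, y=0)
print(ns.x)                         # 1: attribute access works
print(hasattr(ns, '__getitem__'))   # False: so ns[0] raises TypeError

Point = namedtuple('Point', 'x y')
p = Point(x=1, y=0)
x, y = p                            # unpacking works because Point is a tuple subclass
print(p.x, p[0], x)                 # 1 1 1
```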

-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Barry Warsaw
namedtuple is great and clever, but it’s also a bit clunky.  It has a weird 
signature and requires a made up type name.  It’s also rather unPythonic if you 
want to support default arguments when creating namedtuple instances.  Maybe as 
you say, a lot of the typical use cases for namedtuples could be addressed by a 
better builtin, but I fear we’ll end up down the bikeshedding hole for that.
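For reference, this is what supplying defaults looks like with `collections.namedtuple`: the long-standing workaround of assigning to the generated `__new__`'s `__defaults__`, and the `defaults` parameter that was added later, in Python 3.7. In both cases the defaults apply to the rightmost fields.

```python
from collections import namedtuple

# Workaround that also works on older versions: patch the generated __new__.
Point3D = namedtuple('Point3D', 'x y z')
Point3D.__new__.__defaults__ = (0, 0)      # defaults for y and z

# Python 3.7+: pass the defaults directly.
Point3D_37 = namedtuple('Point3D_37', 'x y z', defaults=(0, 0))

print(Point3D(1))          # Point3D(x=1, y=0, z=0)
print(Point3D_37(1, 2))    # Point3D_37(x=1, y=2, z=0)
```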

-Barry

> On Jul 17, 2017, at 16:31, Giampaolo Rodola'  wrote:
> 
> I completely agree. I love namedtuples but I've never been too happy about 
> the additional overhead vs. plain tuples (both for creation and attribute 
> access times), to the point that I explicitly avoid using them in certain 
> circumstances (e.g. a busy loop) and use them only for public end-user APIs 
> returning multiple values.
> 
> To be entirely honest, I'm not even sure why they need to be forcefully 
> declared upfront in the first place, instead of just having a first-class 
> function (builtin?) written in C:
> 
> >>> ntuple(x=1, y=0)
> (x=1, y=0)
> 
> ...or even a literal as in:
> 
> >>> (x=1, y=0)
> (x=1, y=0)
> 
> Most of the time this is what I really want: quickly returning an anonymous 
> tuple with named attributes and nothing else, similarly to os.times() & 
> others. I believe that if something like this existed we would witness a 
> big transition from tuple() to ntuple() for all those functions returning 
> more than one value. We witnessed a similar transition in many parts of the 
> stdlib when collections.namedtuple was first introduced, but not everywhere, 
> probably because declaring a namedtuple is more work, it's more expensive, 
> and it still feels like you're dealing with some kind of too-high-level, 
> second-class citizen with too much overhead and too much sugar in terms of 
> API (e.g. "verbose", "rename", "module" and "_source").





Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Tim Peters
[Giampaolo Rodola' ]
> 
> To be entirely honest, I'm not even sure why they need to be forcefully
> declared upfront in the first place, instead of just having a first-class
> function (builtin?) written in C:
>
> >>> ntuple(x=1, y=0)
> (x=1, y=0)
>
> ...or even a literal as in:
>
> >>> (x=1, y=0)
> (x=1, y=0)

How do you propose that the resulting object T know that T.x is 1, T.y
is 0, and T.z doesn't make sense?  Declaring a namedtuple up front
allows the _class_ to know that all of its instances map attribute "x"
to index 0 and attribute "y" to index 1.  The instances know nothing
about that on their own, and consume no more memory than a plain
tuple.  If your `ntuple()` returns an object implementing its own
mapping, it loses a primary advantage (0 memory overhead) of
namedtuples.
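The zero-overhead point can be checked directly in CPython: the field-to-index mapping lives on the class (as `_fields` plus one descriptor per field), and because the generated class sets `__slots__ = ()`, instances carry no `__dict__` and report the same `sys.getsizeof` as a plain tuple of the same length. A small check; the size comparison is CPython-specific:

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(1, 0)

print(Point._fields)              # ('x', 'y'): stored once, on the class
print(hasattr(p, '__dict__'))     # False: no per-instance namespace
print(sys.getsizeof(p) == sys.getsizeof((1, 0)))
```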


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Brett Cannon
On Mon, 17 Jul 2017 at 13:28 Raymond Hettinger 
wrote:

>
> > On Jul 17, 2017, at 8:49 AM, Guido van Rossum  wrote:
> >
> > One of the reasons to be wary of exec()/eval() other than the usual
> security concerns is that in some Python implementations they have a high
> overhead to initialize the parser and compiler. (Even in CPython it's not
> that fast.)
>
> BTW, if getting rid of the template/exec pair is a goal, Joe Jevnik
> proposed a patch a couple of years ago that completely reimplemented
> namedtuple() in C.   The patch was somewhat complex and its semantic
> equivalence was hard to verify, but we could resurrect it and clean it
> up.   That way, we could leave the existing namedtuple() code in place
> and do a subsequent import from the C version.
>
> This path won't be fun (whenever we have both a C version and Python
> version, we get years of trying to sync-up tiny differences); however, it
> will give you the fastest startup times, the fastest lookups at runtime,
> and eliminate use of exec.
>

I vaguely remember some years ago someone proposing a patch that used
metaclasses to avoid using exec() (I think it was to benefit PyPy or one of
the JIT-backed interpreters). Would that work to remove the need for exec()
while keeping the code in pure Python?

As for removing exec() as a goal, I'll back up Christian's point and the
one Steve made at the language summit that removing the use of exec() from
the critical path in Python is a laudable goal from a security perspective.

-Brett


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Joao S. O. Bueno
Just for the sake of completeness, since people are talking about a namedtuple
overhaul, I have a couple of implementations here:

https://github.com/jsbueno/extradict/blob/master/extradict/extratuple.py

If any idea there can help inspire someone, I will be happy.

  js
 -><-

On 17 July 2017 at 18:26, Barry Warsaw  wrote:

> namedtuple is great and clever, but it’s also a bit clunky.  It has a
> weird signature and requires a made up type name.  It’s also rather
> unPythonic if you want to support default arguments when creating
> namedtuple instances.  Maybe as you say, a lot of the typical use cases for
> namedtuples could be addressed by a better builtin, but I fear we’ll end up
> down the bikeshedding hole for that.
>
> -Barry
>
> > On Jul 17, 2017, at 16:31, Giampaolo Rodola'  wrote:
> >
> > I completely agree. I love namedtuples but I've never been too happy
> about the additional overhead vs. plain tuples (both for creation and
> attribute access times), to the point that I explicitly avoid using them
> in certain circumstances (e.g. a busy loop) and use them only for public
> end-user APIs returning multiple values.
> >
> > To be entirely honest, I'm not even sure why they need to be forcefully
> declared upfront in the first place, instead of just having a first-class
> function (builtin?) written in C:
> >
> > >>> ntuple(x=1, y=0)
> > (x=1, y=0)
> >
> > ...or even a literal as in:
> >
> > >>> (x=1, y=0)
> > (x=1, y=0)
> >
> > Most of the time this is what I really want: quickly returning an
> anonymous tuple with named attributes and nothing else, similarly to
> os.times() & others. I believe that if something like this existed we
> would witness a big transition from tuple() to ntuple() for all those
> functions returning more than one value. We witnessed a similar transition in
> many parts of the stdlib when collections.namedtuple was first introduced,
> but not everywhere, probably because declaring a namedtuple is more work,
> it's more expensive, and it still feels like you're dealing with some kind
> of too-high-level, second-class citizen with too much overhead and too much
> sugar in terms of API (e.g. "verbose", "rename", "module" and "_source").
>


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Greg Ewing

Barry Warsaw wrote:

> namedtuple is great and clever, but it’s also a bit clunky.  It has a weird
> signature and requires a made up type name.

Maybe a metaclass could be used to make something
like this possible:

    class Foo(NamedTuple, fields = 'x,y,z'):
        ...

Then the name is explicit and you get to add methods
etc. if you want.
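Greg's spelling can be made to work without exec(): a metaclass receives class keywords such as `fields=...` and can attach one read-only property per field, while injecting `__slots__ = ()` so instances stay as small as plain tuples. A minimal sketch, assuming Python 3.6+; `NamedTuple` and `_NamedTupleMeta` here are hypothetical names unrelated to `typing.NamedTuple` or aenum, and keyword construction, `_replace()` and pickling support are omitted.

```python
import operator


class _NamedTupleMeta(type):
    """Hypothetical metaclass: turns a `fields=` class keyword into properties."""

    def __new__(mcls, name, bases, namespace, fields=None):
        namespace.setdefault('__slots__', ())  # keep instances free of __dict__
        cls = super().__new__(mcls, name, bases, namespace)
        if fields is not None:
            cls._fields = tuple(fields.replace(',', ' ').split())
            for index, field in enumerate(cls._fields):
                setattr(cls, field, property(operator.itemgetter(index),
                                             doc=f'Alias for field number {index}'))
        return cls


class NamedTuple(tuple, metaclass=_NamedTupleMeta):
    """Base class; subclasses name their fields in the class statement."""

    _fields = ()

    def __new__(cls, *values):
        if len(values) != len(cls._fields):
            raise TypeError(f'{cls.__name__} expects {len(cls._fields)} '
                            f'values, got {len(values)}')
        return super().__new__(cls, values)

    def __repr__(self):
        pairs = ', '.join(f'{n}={v!r}' for n, v in zip(self._fields, self))
        return f'{type(self).__name__}({pairs})'


class Foo(NamedTuple, fields='x,y,z'):
    def total(self):  # ordinary methods can be added directly
        return sum(self)
```

Whether this beats the exec()-based factory on class-creation time has not been measured here.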

--
Greg


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Ethan Furman

On 07/17/2017 02:26 PM, Barry Warsaw wrote:

> namedtuple is great and clever, but it’s also a bit clunky.  It has a weird
> signature and requires a made up type name.  It’s also rather unPythonic if
> you want to support default arguments when creating namedtuple instances.
> Maybe as you say, a lot of the typical use cases for namedtuples could be
> addressed by a better builtin, but I fear we’ll end up down the bikeshedding
> hole for that.

My aenum library [1] has a metaclass-based NamedTuple that allows for default arguments as well as other goodies (which 
would probably not make it to the stdlib since they are mostly fluff).


--
~Ethan~

[1] https://pypi.python.org/pypi/aenum


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Ethan Furman

On 07/17/2017 02:31 PM, Brett Cannon wrote:

> I vaguely remember some years ago someone proposing a patch that used
> metaclasses to avoid using exec() (I think it was to benefit PyPy or one of
> the JIT-backed interpreters). Would that work to remove the need for exec()
> while keeping the code in pure Python?


The aenum library [1] uses the same techniques as Enum for a metaclass-based namedtuple.  I don't expect it to be 
faster, but somebody could do the benchmarks and then we'd know for sure.  ;)


--
~Ethan~

[1] https://pypi.python.org/pypi/aenum



Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Ethan Furman

On 07/17/2017 03:27 PM, Greg Ewing wrote:

> Barry Warsaw wrote:
>
>> namedtuple is great and clever, but it’s also a bit clunky.  It has a weird
>> signature and requires a made up type name.
>
> Maybe a metaclass could be used to make something
> like this possible:
>
>     class Foo(NamedTuple, fields = 'x,y,z'):
>         ...
>
> Then the name is explicit and you get to add methods
> etc. if you want.


From the NamedTuple tests from my aenum library [1]:

from aenum import NamedTuple

LifeForm = NamedTuple('LifeForm', 'branch genus species', module=__name__)

class DeathForm(NamedTuple):
    color = 0
    rigidity = 1
    odor = 2

class WhatsIt(NamedTuple):
    def what(self):
        return self[0]
class ThatsIt(WhatsIt):
    blah = 0
    bleh = 1

class Character(NamedTuple):
    # second argument is doc string
    name = 0
    gender = 1, None, 'male'
    klass = 2, None, 'fighter'

class Point(NamedTuple):
    x = 0, 'horizontal coordinate', 0
    y = 1, 'vertical coordinate', 0

class Point(NamedTuple):
    x = 0, 'horizontal coordinate', 1
    y = 1, 'vertical coordinate', -1
class Color(NamedTuple):
    r = 0, 'red component', 11
    g = 1, 'green component', 29
    b = 2, 'blue component', 37

Pixel1 = NamedTuple('Pixel', Point+Color, module=__name__)
class Pixel2(Point, Color):
    "a colored dot"
class Pixel3(Point):
    r = 2, 'red component', 11
    g = 3, 'green component', 29
    b = 4, 'blue component', 37

--
~Ethan~

[1] https://pypi.python.org/pypi/aenum


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Giampaolo Rodola'
On Mon, Jul 17, 2017 at 11:24 PM, Tim Peters  wrote:

> [Giampaolo Rodola' ]
> > 
> > To be entirely honest, I'm not even sure why they need to be forcefully
> > declared upfront in the first place, instead of just having a first-class
> > function (builtin?) written in C:
> >
> > >>> ntuple(x=1, y=0)
> > (x=1, y=0)
> >
> > ...or even a literal as in:
> >
> > >>> (x=1, y=0)
> > (x=1, y=0)
>
> How do you propose that the resulting object T know that T.x is 1. T.y
> is 0, and T.z doesn't make sense?


I'm not sure I understand your concern. That's pretty much what
PyStructSequence already does.


> Declaring a namedtuple up front
> allows the _class_ to know that all of its instances map attribute "x"
> to index 0 and attribute "y" to index 1.  The instances know nothing
> about that on their own


Hence why I was talking about a "(lightweight) anonymous tuple with named
attributes". The primary use case for namedtuples is accessing values by
name (obj.x). Personally I've always considered the upfront module-level
declaration only an annoyance which unnecessarily pollutes the API and adds
extra overhead. I typically end up putting all namedtuples in a private
module:
https://github.com/giampaolo/psutil/blob/8b8da39e0c62432504fb5f67c418715aad35b291/psutil/_common.py#L156-L225
...then import them from elsewhere and make sure they are not exposed
publicly because the intermediate object returned by
collections.namedtuple() is basically useless for the end-user. Also
picking up a sensible name for the namedtuple is an annoyance and kinda
weird. Consider this:

from collections import namedtuple

Coordinates = namedtuple('coordinates', ['x', 'y'])

def get_coordinates():
    return Coordinates(10, 20)

...vs. this:

def get_coordinates():
    return ntuple(x=10, y=20)

...or this:

def get_coordinates():
    return (x=10, y=20)

> If your `ntuple()` returns an object implementing its own
> mapping, it loses a primary advantage (0 memory overhead) of
> namedtuples.
>

The extra memory overhead is a price I would be happy to pay considering
that collections.namedtuple is considerably slower than a plain tuple.
Other than the additional overhead on startup / import time, instantiation
is 4.5x slower than a plain tuple:

$ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y'))" "nt(1, 2)"
100 loops, best of 5: 313 nsec per loop

$ python3.7 -m timeit "tuple((1, 2))"
500 loops, best of 5: 68.4 nsec per loop

...and name access is 2x slower than index access:

$ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y')); x = nt(1, 2)" "x.x"
500 loops, best of 5: 41.9 nsec per loop

$ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y')); x = nt(1, 2)" "x[0]"
1000 loops, best of 5: 20.2 nsec per loop

$ python3.7 -m timeit -s "x = (1, 2)" "x[0]"
1000 loops, best of 5: 20.5 nsec per loop
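The same measurements can be repeated from inside Python with the standard `timeit` module. This is only a sketch: `best_nsec` is a hypothetical helper, and absolute numbers will vary by machine and Python version. `timeit.repeat` returns total seconds for `number` executions, so dividing the best run by `number` gives a per-loop figure comparable to the command line's "best of 5".

```python
import timeit

SETUP = (
    "from collections import namedtuple\n"
    "nt = namedtuple('xxx', ('x', 'y'))\n"
    "x = nt(1, 2)\n"
    "t = (1, 2)"
)


def best_nsec(stmt, number=50_000, repeat=5):
    # Best-of-`repeat` timing, converted to nanoseconds per execution.
    runs = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(runs) / number * 1e9


results = {
    "nt(1, 2)": best_nsec("nt(1, 2)"),
    "tuple((1, 2))": best_nsec("tuple((1, 2))"),
    "x.x": best_nsec("x.x"),     # namedtuple attribute access
    "x[0]": best_nsec("x[0]"),   # namedtuple index access
    "t[0]": best_nsec("t[0]"),   # plain tuple index access
}
for stmt, ns in results.items():
    print(f"{stmt:14} {ns:8.1f} nsec per loop")
```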

-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Guido van Rossum
Raymond agreed to reopen the issue. Everyone who's eager to redesign
namedtuple, please go to python-ideas.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Alexander Belopolsky
On Mon, Jul 17, 2017 at 6:27 PM, Greg Ewing 
wrote:
>
>
> Maybe a metaclass could be used to make something
> like this possible:
>
>
>class Foo(NamedTuple, fields = 'x,y,z'):
>   ...
>
>
If you think about it, collections.namedtuple *is* a metaclass.  A simple
wrapper will make it usable as such:

import collections

def namedtuple(name, bases, attrs, fields=()):
    # Override __init_subclass__ for Python 3.6
    # (bases and attrs, i.e. the class body, are ignored here)
    return collections.namedtuple(name, fields)

class Foo(metaclass=namedtuple, fields='x,y'):
    pass

print(Foo(1, 2))   # ---> Foo(x=1, y=2)


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Barry Warsaw
On Jul 17, 2017, at 18:27, Greg Ewing  wrote:
> 
> Barry Warsaw wrote:
>> namedtuple is great and clever, but it’s also a bit clunky.  It has a weird
>> signature and requires a made up type name.
> 
> Maybe a metaclass could be used to make something
> like this possible:
> 
> 
>   class Foo(NamedTuple, fields = 'x,y,z'):
>  ...
> 
> Then the name is explicit and you get to add methods
> etc. if you want.

Yes, I like how that reads.
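Greg's `class Foo(NamedTuple, fields='x,y,z')` spelling can be approximated with a small metaclass on a tuple base class. The sketch below is only an illustration: `NamedTupleBase` and `_NamedTupleMeta` are hypothetical names (this is not `typing.NamedTuple`, which takes field annotations rather than a `fields` keyword), it uses no exec, and it skips most of the namedtuple API (`_make`, `_replace`, `_asdict`, keyword arguments to the constructor).

```python
from operator import itemgetter


class _NamedTupleMeta(type):
    """Hypothetical metaclass that turns a `fields=` class keyword into accessors."""

    def __new__(mcls, name, bases, namespace, fields=''):
        namespace = dict(namespace)
        # Empty __slots__ keeps instances free of a per-instance __dict__.
        namespace.setdefault('__slots__', ())
        cls = super().__new__(mcls, name, bases, namespace)
        if fields:
            cls._fields = tuple(fields.replace(',', ' ').split())
            for index, field in enumerate(cls._fields):
                # The class, not the instance, maps each name to a tuple index.
                setattr(cls, field, property(itemgetter(index),
                                             doc=f'Alias for field {index}'))
        return cls

    def __init__(cls, name, bases, namespace, fields=''):
        # Swallow the `fields` keyword so type.__init__ never sees it.
        super().__init__(name, bases, namespace)


class NamedTupleBase(tuple, metaclass=_NamedTupleMeta):
    _fields = ()

    def __new__(cls, *values):
        if len(values) != len(cls._fields):
            raise TypeError(f'{cls.__name__} expected {len(cls._fields)} '
                            f'arguments, got {len(values)}')
        return super().__new__(cls, values)

    def __repr__(self):
        pairs = ', '.join(f'{f}={v!r}' for f, v in zip(self._fields, self))
        return f'{type(self).__name__}({pairs})'


class Foo(NamedTupleBase, fields='x,y,z'):
    def total(self):
        return self.x + self.y + self.z


foo = Foo(1, 2, 3)
print(foo, foo.total())  # Foo(x=1, y=2, z=3) 6
```

The name is explicit and methods live in the class body, as Greg suggests; the price is a generic `*values` constructor instead of one with a real `(x, y, z)` signature.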

-Barry





Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Steven D'Aprano
On Mon, Jul 17, 2017 at 09:31:20PM +, Brett Cannon wrote:

> As for removing exec() as a goal, I'll back up Christian's point and the
> one Steve made at the language summit that removing the use of exec() from
> the critical path in Python is a laudable goal from a security perspective.

I'm sorry, I don't understand this point. What do you mean by "critical 
path"?

Is the intention to remove exec from builtins? From the entire language? 
If not, how does its use in namedtuple introduce a security problem?



-- 
Steve


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Alexander Belopolsky
On Mon, Jul 17, 2017 at 8:16 PM, Barry Warsaw  wrote:

> ..
> >   class Foo(NamedTuple, fields = 'x,y,z'):
> >  ...
> >
> > Then the name is explicit and you get to add methods
> > etc. if you want.
>
> Yes, I like how that reads.
>
>
I would prefer

    class Foo(metaclass=namedtuple, fields='x,y,z'):
        ...

which, while slightly more verbose, does not lie about what namedtuple is:
a factory of classes.  This, however, is completely orthogonal to the issue
of performance.  As I mentioned in my previous post, the namedtuple metaclass
above can be a simple function:

def namedtuple(name, bases, attrs, fields=()):
    # Override __init_subclass__ for Python 3.6
    return collections.namedtuple(name, fields)
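As written, the function metaclass discards `attrs`, so any methods defined in the class body are lost. A minimal variation, sketched here under the assumption that subclassing the generated namedtuple is acceptable, keeps the body by building a subclass. `namedtuple_class` is a hypothetical name chosen to avoid shadowing `collections.namedtuple`.

```python
import collections


def namedtuple_class(name, bases, attrs, fields=()):
    # Hypothetical function metaclass: build the tuple class first...
    base = collections.namedtuple(name, fields)
    # ...then subclass it with the class body; empty __slots__ keeps
    # instances __dict__-free. Any explicit `bases` are ignored in this sketch.
    attrs = dict(attrs)
    attrs.setdefault('__slots__', ())
    return type(name, (base,), attrs)


class Point(metaclass=namedtuple_class, fields='x y'):
    def manhattan(self):
        return abs(self.x) + abs(self.y)


p = Point(3, -4)
print(p, p.manhattan())  # Point(x=3, y=-4) 7
```

The cost is one extra class per declaration: `Point.__mro__` contains both the subclass and the generated namedtuple base.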


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Steven D'Aprano
On Tue, Jul 18, 2017 at 01:17:24AM +0200, Giampaolo Rodola' wrote:

> The extra memory overhead is a price I would be happy to pay considering
> that collections.namedtuple is considerably slower than a plain tuple.
> Other than the additional overhead on startup / import time, instantiation
> is 4.5x slower than a plain tuple:
> 
> $ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y'))" "nt(1, 2)"
> 100 loops, best of 5: 313 nsec per loop
> 
> $ python3.7 -m timeit "tuple((1, 2))"
> 500 loops, best of 5: 68.4 nsec per loop

I don't think that is a fair comparison. As far as I can tell, that 
gets compiled to a name lookup for "tuple" which then returns its 
argument unchanged, the tuple itself being constant-folded at compile 
time.

py> dis.dis("tuple((1, 2))")
  1   0 LOAD_NAME0 (tuple)
  3 LOAD_CONST   2 ((1, 2))
  6 CALL_FUNCTION1 (1 positional, 0 keyword pair)
  9 RETURN_VALUE
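Steven's point can be checked without timing anything: compiling the expression shows `(1, 2)` already sitting in the code object's constants, so `tuple((1, 2))` never builds a tuple at run time. A fairer baseline, sketched below on the assumption that building a tuple from two variables is the intended comparison, is `(a, b)` versus `nt(a, b)`; exact numbers will vary by machine and version.

```python
import dis
import timeit

# The literal (1, 2) is folded into a constant at compile time.
folded = compile("tuple((1, 2))", "<expr>", "eval")
print(folded.co_consts)
dis.dis(folded)

# With variables, a new tuple really has to be built on every execution.
unfolded = compile("(a, b)", "<expr>", "eval")

setup = ("from collections import namedtuple\n"
         "nt = namedtuple('xxx', ('x', 'y'))\n"
         "a, b = 1, 2")
number = 100_000
for stmt in ("(a, b)", "tuple((a, b))", "nt(a, b)"):
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=5))
    print(f"{stmt:14} {best / number * 1e9:7.1f} nsec per loop")
```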


-- 
Steve


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Nathaniel Smith
On Jul 17, 2017 5:28 PM, "Steven D'Aprano"  wrote:

> On Mon, Jul 17, 2017 at 09:31:20PM +, Brett Cannon wrote:
>
> > As for removing exec() as a goal, I'll back up Christian's point and the
> > one Steve made at the language summit that removing the use of exec() from
> > the critical path in Python is a laudable goal from a security perspective.
>
> I'm sorry, I don't understand this point. What do you mean by "critical
> path"?
>
> Is the intention to remove exec from builtins? From the entire language?
> If not, how does its use in namedtuple introduce a security problem?


I think the intention is to allow users with a certain kind of security
requirement to opt in to a restricted version of the language that doesn't
support exec. This is difficult if the stdlib is calling exec all over the
place. But nobody is suggesting changing the language in regular usage,
just providing another option.

-n


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Ethan Furman

On 07/17/2017 04:45 PM, Guido van Rossum wrote:

> Raymond agreed to reopen the issue. Everyone who's eager to redesign 
> namedtuple, please go to python-ideas.

Python Ideas thread started.

--
~Ethan~


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Serhiy Storchaka

17.07.17 15:43, Antoine Pitrou wrote:

> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time.  Not only Python core and the stdlib, but any
> third-party library creating namedtuple classes (there are many of
> them).  An issue was created for this:
> https://bugs.python.org/issue28638
>
> Raymond decided to close the issue because:
>
> 1) the proposed resolution makes the "_source" attribute empty (or, at
> least, something else than it currently is).  Raymond claims the
> "_source" attribute is an essential feature of namedtuples.
>
> 2) optimizing startup cost is supposedly not worth the effort.


The implementations of namedtuple that don't use compilation were 
written by different developers (including me) multiple times before 
issue28638. I provided my patch in issue28638 as an example, but I 
understand Raymond's arguments, and they look weighty to me. I don't 
know how much the _source attribute is used, but it is part of the public API.


The drawback of these implementations is slower __new__ and __repr__ 
methods. This can be avoided by using compilation to create __new__, 
but this makes the creation of a namedtuple class slower (though still 
faster than compiling the full namedtuple class). The drawback of generating 
_source without using it to create the namedtuple class is that it complicates 
the code, and the two implementations could quickly get out of sync 
in the future.
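To illustrate the trade-off described above, here is a rough sketch (not the patch from issue28638 and not the stdlib code) of a namedtuple-like class factory, `make_tuple_class`, a hypothetical name. Without compilation, `__new__` must accept `*args` and check the count at run time; with `compile_new=True`, only `__new__` is generated from source so it gets an explicit signature, the middle ground mentioned above. Field names are assumed to be given as a single string.

```python
from operator import itemgetter


def make_tuple_class(typename, field_names, compile_new=False):
    """Hypothetical namedtuple-like factory; a sketch, not collections.namedtuple."""
    fields = tuple(field_names.replace(',', ' ').split())

    if compile_new:
        # Compile just __new__, so it has a real signature like (_cls, x, y).
        args = ', '.join(fields)
        source = f'lambda _cls, {args}: _tuple_new(_cls, ({args},))'
        __new__ = eval(source, {'_tuple_new': tuple.__new__, '__builtins__': {}})
    else:
        # No compilation: a generic __new__ that validates the argument count.
        def __new__(_cls, *args):
            if len(args) != len(fields):
                raise TypeError(f'{typename}() takes {len(fields)} '
                                f'arguments ({len(args)} given)')
            return tuple.__new__(_cls, args)

    def __repr__(self):
        inner = ', '.join(f'{f}={v!r}' for f, v in zip(fields, self))
        return f'{type(self).__name__}({inner})'

    # type() turns a plain __new__ function into a static method automatically.
    namespace = {'__slots__': (), '_fields': fields,
                 '__new__': __new__, '__repr__': __repr__}
    for index, field in enumerate(fields):
        namespace[field] = property(itemgetter(index), doc=f'Alias for field {index}')
    return type(typename, (tuple,), namespace)


Slow = make_tuple_class('Point', 'x y')
Fast = make_tuple_class('Point', 'x y', compile_new=True)
print(Slow(1, 2), Fast(1, 2))  # Point(x=1, y=2) Point(x=1, y=2)
```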


I think that the right solution to this issue is generalizing the import 
machinery and allowing it to cache not just files, but arbitrary chunks 
of code. We already use precompiled bytecode files for exactly the same goal: 
speeding up startup by avoiding compilation. This solution could be 
used for caching other generated code, not just namedtuples.
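The idea treats generated source the way the import system treats modules: compile it once, store the bytecode, and reuse it later. A minimal sketch, assuming a hypothetical on-disk cache directory and a hypothetical `cached_compile` helper (this is not an existing CPython API), keys each entry by a hash of the interpreter's bytecode magic number plus the source text.

```python
import hashlib
import importlib.util
import marshal
import os
import tempfile

# Hypothetical cache directory for this sketch.
CACHE_DIR = os.path.join(tempfile.gettempdir(), 'generated-code-cache')


def cached_compile(source, filename='<generated>'):
    """Return (code, hit): reuse marshalled bytecode for `source` if present."""
    # Including MAGIC_NUMBER invalidates entries when the bytecode format changes.
    key = hashlib.sha256(importlib.util.MAGIC_NUMBER + source.encode()).hexdigest()
    path = os.path.join(CACHE_DIR, key + '.bin')
    try:
        with open(path, 'rb') as f:
            return marshal.load(f), True        # cache hit: no compilation
    except (OSError, EOFError, ValueError, TypeError):
        pass                                    # missing or unreadable entry
    code = compile(source, filename, 'exec')    # cache miss: compile and store
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, 'wb') as f:
        marshal.dump(code, f)
    return code, False


source = "def __new__(_cls, x, y):\n    return tuple.__new__(_cls, (x, y))\n"
code, hit = cached_compile(source)
namespace = {}
exec(code, namespace)
```

A real implementation would also need atomic writes and a policy for where the cache lives, much as the `__pycache__` machinery has.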

