Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-18 Thread INADA Naoki
>
> I think that the right solution of this issue is generalizing the import
> machinery and allowing it to cache not just files, but arbitrary chunks of
> code. We already use precompiled bytecode files for exactly same goal --
> speed up the startup by avoiding compilation. This solution could be used
> for caching other generated code, not just namedtuples.
>

I thought about adding a C implementation based on PyStructSequence.
But I like Jelle's approach because it may improve performance on all
Python implementations.

It reduces the amount of source passed to eval() and shares code objects
for most methods.
(See https://github.com/python/cpython/pull/2736#issuecomment-316014866
for a quick-and-dirty benchmark on PyPy.)

I agree that the template + eval pattern is nice for readability compared
with other metaprogramming magic.
And a code cache machinery could improve the template + eval pattern in CPython.
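For readers who haven't looked inside collections.namedtuple: a greatly
simplified, hypothetical sketch of the template + exec pattern being
discussed (the real implementation generates many more methods and
validates field names; `simple_namedtuple` and `_TEMPLATE` are invented
names for this illustration):

```python
# Build a class by filling in a source template and exec()-ing it --
# the pattern collections.namedtuple has historically used.
_TEMPLATE = '''\
class {name}(tuple):
    def __new__(cls, {args}):
        return tuple.__new__(cls, ({args},))
{properties}
'''

def simple_namedtuple(name, fields):
    # One read-only property per field, indexing into the tuple.
    properties = '\n'.join(
        '    {f} = property(lambda self: self[{i}])'.format(f=f, i=i)
        for i, f in enumerate(fields)
    )
    source = _TEMPLATE.format(name=name, args=', '.join(fields),
                              properties=properties)
    namespace = {}
    exec(source, namespace)   # compile + evaluate the generated source
    return namespace[name]

Point = simple_namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
```

The compile step inside exec() is what the startup-time discussion is
about: every namedtuple definition pays it at import time.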

But namedtuple is very widely used, and loved enough to get optimized on
more than just CPython.
So I prefer Jelle's approach to adding code cache machinery in this case.

Regards,

INADA Naoki  
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-18 Thread Nick Coghlan
On 18 July 2017 at 05:42, Raymond Hettinger  wrote:
> One minor grumble:  I think we need to give careful cost/benefit
> considerations to optimizations that complicate the implementation.
> Over the last several years, the source for Python has grown
> increasingly complicated.  Fewer people understand it now. It is much
> harder for newcomers to on-ramp.  The old-timers (myself included)
> find that their knowledge is out of date.  And complexity leads to
> bugs (the C optimization of random number seeding caused a major bug
> in the 3.6.0 release; the C optimization of the lru_cache resulted in
> multiple releases having hard-to-find threading bugs, etc.).  It is
> becoming increasingly difficult to look at code and tell whether it is
> correct (I still don't fully understand the implications of the
> recursive constant folding in the peephole optimizer, for example).
> In the case of this named tuple proposal, the complexity is
> manageable, but the overall trend isn't good and I get the feeling the
> aggressive optimization is causing us to forget key parts of the
> zen-of-python.

As another example of this: while trading the global import lock for
per-module locks eliminated most of the old import deadlocks, it turns
out that it *also* left us with some fairly messy race conditions and
more fragile code (I still count that particular case as a win
overall, but it definitely raises the barrier to entry for maintaining
that code).

Unfortunately, these are frequently cases where the benefits are
immediately visible (e.g. faster benchmark results, removing
longstanding limitations on user code), but the downsides can
literally take years to make themselves felt (e.g. higher defect rates
in the interpreter, subtle bugs in previously correct user code that
are eventually traced back to interpreter changes).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-18 Thread Antoine Pitrou
On Tue, 18 Jul 2017 22:13:01 +1000
Nick Coghlan  wrote:
> 
> As another example of this: while trading the global import lock for
> per-module locks eliminated most of the old import deadlocks, it turns
> out that it *also* left us with some fairly messy race conditions and
> more fragile code (I still count that particular case as a win
> overall, but it definitely raises the barrier to entry for maintaining
> that code).
> 
> Unfortunately, these are frequently cases where the benefits are
> immediately visible (e.g. faster benchmark results, removing
> longstanding limitations on user code), but the downsides can
> literally take years to make themselves felt (e.g. higher defect rates
> in the interpreter, subtle bugs in previously correct user code that
> are eventually traced back to interpreter changes).

The import deadlocks were really in the category of "subtle bugs" that
only occur in certain timing conditions (especially when combined with
PyImport_ImportModuleNoBlock and/or stdlib modules which can try to
import stuff silently, such as the codecs module). So we traded a
category of "subtle bugs" due to a core design deficiency for another
category of "subtle bugs" due to an imperfect implementation, the
latter being actually fixable incrementally :-)

Disclaimer: I wrote the initial per-module lock implementation, which
was motivated by those long-standing "subtle bugs" in multi-threaded
applications.

Regards

Antoine.




Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-18 Thread Guido van Rossum
There are some weighty things being said in this subthread that shouldn't
be hidden under the heading of improving NamedTuple. For continued
discussion of our development philosophy let's open a new thread. (I have
an opinion but I expect I'm not the only one with that opinion, so I'll let
others express theirs first.)

--Guido

On Tue, Jul 18, 2017 at 5:13 AM, Nick Coghlan  wrote:

> On 18 July 2017 at 05:42, Raymond Hettinger 
> wrote:
> > One minor grumble:  I think we need to give careful cost/benefit
> considerations to optimizations that complicate the implementation.  Over
> the last several years, the source for Python has grown increasingly
> complicated.  Fewer people understand it now. It is much harder to
> newcomers to on-ramp.  The old-timers (myself included) find that their
> knowledge is out of date.  And complexity leads to bugs (the C optimization
> of random number seeding caused a major bug in the 3.6.0 release; the C
> optimization of the lru_cache resulted in multiple releases having a hard
> to find threading bugs, etc.).  It is becoming increasingly difficult to
> look at code and tell whether it is correct (I still don't fully understand
> the implications of the recursive constant folding in the peephole
> optimizer for example).In the case of this named tuple proposal, the
> complexity is manageable, but the overall trend isn't good and I get the
> > feeling the aggressive optimization is causing us to forget key parts
> > of the zen-of-python.
>
> As another example of this: while trading the global import lock for
> per-module locks eliminated most of the old import deadlocks, it turns
> out that it *also* left us with some fairly messy race conditions and
> more fragile code (I still count that particular case as a win
> overall, but it definitely raises the barrier to entry for maintaining
> that code).
>
> Unfortunately, these are frequently cases where the benefits are
> immediately visible (e.g. faster benchmark results, removing
> longstanding limitations on user code), but the downsides can
> literally take years to make themselves felt (e.g. higher defect rates
> in the interpreter, subtle bugs in previously correct user code that
> are eventually traced back to interpreter changes).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] Program runs in 12s on Python 2.7, but 5s on Python 3.5 -- why so much difference?

2017-07-18 Thread Ben Hoyt
Hi folks,

(Not entirely sure this is the right place for this question, but hopefully
it's of interest to several folks.)

A few days ago I posted a note in response to Victor Stinner's articles on
his CPython contributions, noting that I wrote a program that ran in 11.7
seconds on Python 2.7, but only takes 5.1 seconds on Python 3.5 (on my 2.5
GHz macOS i7), more than 2x as fast. Obviously this is a Good Thing, but
I'm curious as to why there's so much difference.

The program is a pentomino puzzle solver, and it works via code generation,
generating a ton of nested "if" statements, so I believe it's exercising
the Python bytecode interpreter heavily. Obviously there have been some big
optimizations to make this happen, but I'm curious what the main
improvements are that are causing this much difference.

There's a writeup about my program here, with benchmarks at the bottom:
http://benhoyt.com/writings/python-pentomino/

This is the generated Python code that's being exercised:
https://github.com/benhoyt/python-pentomino/blob/master/generated_solve.py

For reference, on Python 3.6 it runs in 4.6 seconds (same on Python 3.7
alpha). This smallish increase from Python 3.5 to Python 3.6 was more
expected to me due to the bytecode changing to wordcode in 3.6.

I tried using cProfile on both Python versions, but that didn't say much,
because the functions being called aren't taking the majority of the time.
How does one benchmark at a lower level, or otherwise explain what's going
on here?

Thanks,
Ben


[Python-Dev] Design Philosophy: Performance vs Robustness/Maintainability

2017-07-18 Thread Ethan Furman

Raymond Hettinger:
-
> One minor grumble:  I think we need to give careful cost/benefit
> considerations to optimizations that complicate the implementation.
> Over the last several years, the source for Python has grown
> increasingly complicated.  Fewer people understand it now. It is much
> harder for newcomers to on-ramp.  The old-timers (myself included)
> find that their knowledge is out of date.  And complexity leads to
> bugs (the C optimization of random number seeding caused a major bug
> in the 3.6.0 release; the C optimization of the lru_cache resulted in
> multiple releases having hard-to-find threading bugs, etc.).  It is
> becoming increasingly difficult to look at code and tell whether it is
> correct (I still don't fully understand the implications of the
> recursive constant folding in the peephole optimizer, for example).
> In the case of this named tuple proposal, the complexity is manageable,
> but the overall trend isn't good and I get the feeling the aggressive
> optimization is causing us to forget key parts of the zen-of-python.

Nick Coughlan:
-

As another example of this: while trading the global import lock for
per-module locks eliminated most of the old import deadlocks, it turns
out that it *also* left us with some fairly messy race conditions and
more fragile code (I still count that particular case as a win
overall, but it definitely raises the barrier to entry for maintaining
that code).

Unfortunately, these are frequently cases where the benefits are
immediately visible (e.g. faster benchmark results, removing
longstanding limitations on user code), but the downsides can
literally take years to make themselves felt (e.g. higher defect rates
in the interpreter, subtle bugs in previously correct user code that
are eventually traced back to interpreter changes).


Barry Warsaw:

> Regardless of whether [namedtuple] optimization is a good idea or not,
> start up time *is* a serious challenge in many environments for CPython
> in particular and the perception of Python’s applicability to many
> problems.  I think we’re better off trying to identify and address such
> problems than ignoring or minimizing them.

Ethan Furman:

Speed is not the only factor, and certainly shouldn't be the first concern,
but once we have correct code we need to follow our own advice:  find the
bottlenecks and optimize them.  Optimized code will never be as pretty or
maintainable as simple, unoptimized code, but real-world applications often
require as much performance as can be obtained.
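As a concrete illustration of "find the bottlenecks and optimize them":
the stdlib's cProfile is usually the first tool to reach for. The workload
below is invented purely for the example.

```python
import cProfile
import io
import pstats

def bottleneck():
    # Deliberately expensive inner function for the demonstration.
    return sum(i * i for i in range(10000))

def workload():
    return [bottleneck() for _ in range(50)]

# Profile the workload and print the functions with the highest
# cumulative time -- the bottleneck should float to the top.
pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(10)
print(s.getvalue())
```

Only after a report like this fingers a hot spot does the
cost/benefit question of a more complex optimized implementation arise.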

[My apologies if I missed any points from the namedtuple thread.]

--
~Ethan~


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-18 Thread Ethan Furman

On 07/18/2017 08:12 AM, Guido van Rossum wrote:

There are some weighty things being said in this subthread that shouldn't be
hidden under the heading of improving NamedTuple. For continued discussion of
our development philosophy let's open a new thread. (I have an opinion but I
expect I'm not the only one with that opinion, so I'll let others express
theirs first.)


New thread created.

--
~Ethan~



Re: [Python-Dev] Design Philosophy: Performance vs Robustness/Maintainability

2017-07-18 Thread Victor Stinner

2017-07-18 18:08 GMT+02:00 Ethan Furman :
> Nick Coughlan:
> -
>>
>> As another example of this: while trading the global import lock for
>> per-module locks eliminated most of the old import deadlocks, (...)

Minor remark: the email subject is inaccurate; this change was not
related to performance. I would rather say it was about correctness.

Python 3 doesn't hang on deadlocks in "legit" imports anymore ;-)

Victor


Re: [Python-Dev] deque implementation question

2017-07-18 Thread Joao S. O. Bueno
On 15 July 2017 at 04:01, Max Moroz  wrote:

> What would be the disadvantage of implementing collections.deque as a
> circular array (rather than a doubly linked list of blocks)? My naive
> thinking was that a circular array would maintain the current O(1) append/pop
> from either side, and would improve index lookup in the middle from O(n) to
> O(1). What am I missing?
>
> The insertion/removal of an arbitrary item specified by a pointer would
> increase from constant time to linear, but since we don't have pointers
> this is a moot point.
>
> Of course when the circular array is full, it will need to be reallocated,
> but the amortized cost of that is still O(1). (Moreover, for a bounded
> deque, there's even an option of preallocation, which would completely
> eliminate reallocations.)
>

Now, since you are at it, you could possibly mine PyPI for interesting,
efficient data structures that could cover use cases for which lists and
deques do not suffice. I am pretty sure one could find a couple, and once
we get a few that are well behaved and somewhat popular, they could be
made candidates for inclusion in collections, I guess.
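For illustration, Max's circular-array idea can be sketched as a toy
bounded deque (a hypothetical sketch, not CPython's actual block-list
implementation; an unbounded variant would additionally need amortized
resizing, as he notes):

```python
class RingDeque:
    """Bounded deque backed by a circular array: O(1) append/pop at
    both ends *and* O(1) indexing in the middle."""

    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._head = 0          # index of the first element
        self._size = 0

    def append(self, value):
        if self._size == len(self._buf):
            raise IndexError('deque full')
        self._buf[(self._head + self._size) % len(self._buf)] = value
        self._size += 1

    def appendleft(self, value):
        if self._size == len(self._buf):
            raise IndexError('deque full')
        self._head = (self._head - 1) % len(self._buf)
        self._buf[self._head] = value
        self._size += 1

    def pop(self):
        if not self._size:
            raise IndexError('pop from empty deque')
        self._size -= 1
        i = (self._head + self._size) % len(self._buf)
        value, self._buf[i] = self._buf[i], None
        return value

    def popleft(self):
        if not self._size:
            raise IndexError('pop from empty deque')
        value, self._buf[self._head] = self._buf[self._head], None
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return value

    def __getitem__(self, i):   # O(1), unlike a linked list of blocks
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._buf[(self._head + i) % len(self._buf)]

    def __len__(self):
        return self._size
```

The trade-off relative to the block list is exactly the one discussed
above: indexing becomes O(1), while a full unbounded deque must
occasionally reallocate and copy.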



> Thanks
>
> Max
>


Re: [Python-Dev] Design Philosophy: Performance vs Robustness/Maintainability

2017-07-18 Thread Antoine Pitrou
On Tue, 18 Jul 2017 09:08:08 -0700
Ethan Furman  wrote:
> 
> Nick Coughlan:
> -

It is "Nick Coghlan" not "Coughlan".

> > As another example of this: while trading the global import lock for
> > per-module locks eliminated most of the old import deadlocks, it turns
> > out that it *also* left us with some fairly messy race conditions and
> > more fragile code (I still count that particular case as a win
> > overall, but it definitely raises the barrier to entry for maintaining
> > that code).
> >
> > Unfortunately, these are frequently cases where the benefits are
> > immediately visible (e.g. faster benchmark results, removing
> > longstanding limitations on user code), but the downsides can
> > literally take years to make themselves felt (e.g. higher defect rates
> > in the interpreter, subtle bugs in previously correct user code that
> > are eventually traced back to interpreter changes).  

I'll reply here again: the original motivation for the per-module
import lock was not performance but correctness.

The import deadlocks were really in the category of "subtle bugs" that
only occur in certain timing conditions (especially when combined with
PyImport_ImportModuleNoBlock and/or stdlib modules which can try to
import stuff silently, such as the codecs module). So we traded a
category of "subtle bugs" due to a core design deficiency for another
category of "subtle bugs" due to an imperfect implementation, the
latter being actually fixable incrementally :-)

Disclaimer: I wrote the initial per-module lock implementation.

Regards

Antoine.




Re: [Python-Dev] Design Philosophy: Performance vs Robustness/Maintainability

2017-07-18 Thread Victor Stinner
2017-07-18 18:08 GMT+02:00 Ethan Furman :
> Raymond Hettinger:
> -
>> And complexity leads to bugs
>> (the C
>> optimization of random number seeding caused a major bug in the 3.6.0
>> release

Hum, I guess that Raymond is referring to http://bugs.python.org/issue29085

This regression was not caused by an optimization at all, but by a change
to harden Python:
https://www.python.org/dev/peps/pep-0524/

Victor


Re: [Python-Dev] Program runs in 12s on Python 2.7, but 5s on Python 3.5 -- why so much difference?

2017-07-18 Thread Antoine Pitrou
On Tue, 18 Jul 2017 12:03:36 -0400
Ben Hoyt  wrote:

> Hi folks,
> 
> (Not entirely sure this is the right place for this question, but hopefully
> it's of interest to several folks.)
> 
> A few days ago I posted a note in response to Victor Stinner's articles on
> his CPython contributions, noting that I wrote a program that ran in 11.7
> seconds on Python 2.7, but only takes 5.1 seconds on Python 3.5 (on my 2.5
> GHz macOS i7), more than 2x as fast. Obviously this is a Good Thing, but
> I'm curious as to why there's so much difference.
> 
> The program is a pentomino puzzle solver, and it works via code generation,
> generating a ton of nested "if" statements, so I believe it's exercising
> the Python bytecode interpreter heavily.

A first step would be to see if the generated bytecode has changed
substantially.

Otherwise, you can try to comment out parts of the function until the
performance difference has been nullified.

Regards

Antoine.




Re: [Python-Dev] Design Philosophy: Performance vs Robustness/Maintainability

2017-07-18 Thread Ethan Furman

On 07/18/2017 09:16 AM, Antoine Pitrou wrote:

On Tue, 18 Jul 2017 09:08:08 -0700
Ethan Furman  wrote:


Nick Coughlan:
-


It is "Nick Coghlan" not "Coughlan".


Argh.  Sorry, Nick, and thank you, Antoine!


As another example of this: while trading the global import lock for
per-module locks eliminated most of the old import deadlocks, it turns
out that it *also* left us with some fairly messy race conditions and
more fragile code (I still count that particular case as a win
overall, but it definitely raises the barrier to entry for maintaining
that code).

Unfortunately, these are frequently cases where the benefits are
immediately visible (e.g. faster benchmark results, removing
longstanding limitations on user code), but the downsides can
literally take years to make themselves felt (e.g. higher defect rates
in the interpreter, subtle bugs in previously correct user code that
are eventually traced back to interpreter changes).


I'll reply here again: the original motivation for the per-module
import lock was not performance but correctness.


I meant that as an example of the dangers of increased code complexity.

--
~Ethan~



Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-18 Thread Brett Cannon
On Mon, 17 Jul 2017 at 19:26 Nathaniel Smith  wrote:

> On Jul 17, 2017 5:28 PM, "Steven D'Aprano"  wrote:
>
> On Mon, Jul 17, 2017 at 09:31:20PM +, Brett Cannon wrote:
>
> > As for removing exec() as a goal, I'll back up Christian's point and the
> > one Steve made at the language summit that removing the use of exec()
> from
> > the critical path in Python is a laudable goal from a security
> perspective.
>
> I'm sorry, I don't understand this point. What do you mean by "critical
> path"?
>
> Is the intention to remove exec from builtins? From the entire language?
> If not, how does its use in namedtuple introduce a security problem?
>
>
> I think the intention is to allow users with a certain kind of security
> requirement to opt in to a restricted version of the language that doesn't
> support exec. This is difficult if the stdlib is calling exec all over the
> place. But nobody is suggesting to change the language in regular usage,
> just provide another option.
>

What Nathaniel said. :)


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-18 Thread Larry Hastings



On 07/17/2017 07:25 PM, Nathaniel Smith wrote:
I think the intention is to allow users with a certain kind of security
requirement to opt in to a restricted version of the language that doesn't
support exec. This is difficult if the stdlib is calling exec all over the
place. But nobody is suggesting to change the language in regular usage,
just provide another option.


An anecdote about removing exec().  Back in 2012 I interviewed Kristjan
Valur Jonsson, then of CCP Games, for my podcast Radio Free Python.  He
said that due to memory constraints they'd had to remove the compiler
from the PlayStation 3 build of Python for some game project.  This
meant that namedtuple didn't work, which had knock-on effects for other
bits of the standard library.

So security concerns aren't the only reason for removing the compiler.


//arry/


Re: [Python-Dev] Design Philosophy: Performance vs Robustness/Maintainability

2017-07-18 Thread Brett Cannon
On Tue, 18 Jul 2017 at 09:07 Ethan Furman  wrote:

> Raymond Hettinger:
> -
>  > One minor grumble:  I think we need to give careful cost/benefit
> considerations to
>  > optimizations that complicate the implementation.  Over the last
> several years, the
>  > source for Python has grown increasingly complicated.  Fewer people
> understand it
>  > now. It is much harder to newcomers to on-ramp.  The old-timers (myself
> included)
>  > find that their knowledge is out of date.  And complexity leads to bugs
> (the C
>  > optimization of random number seeding caused a major bug in the 3.6.0
> release; the
>  > C optimization of the lru_cache resulted in multiple releases having a
> hard to find
>  > threading bugs, etc.).  It is becoming increasingly difficult to look
> at code and
>  > tell whether it is correct (I still don't fully understand the
> implications of the
>  > recursive constant folding in the peephole optimizer for example).
> In the case
>  > of this named tuple proposal, the complexity is manageable, but the
> overall trend
>  > isn't good and I get the feeling the aggressive optimization is causing
> us to
>  > forget key parts of the zen-of-python.
>
> Nick Coghlan:
> -
> > As another example of this: while trading the global import lock for
> > per-module locks eliminated most of the old import deadlocks, it turns
> > out that it *also* left us with some fairly messy race conditions and
> > more fragile code (I still count that particular case as a win
> > overall, but it definitely raises the barrier to entry for maintaining
> > that code).
> >
> > Unfortunately, these are frequently cases where the benefits are
> > immediately visible (e.g. faster benchmark results, removing
> > longstanding limitations on user code), but the downsides can
> > literally take years to make themselves felt (e.g. higher defect rates
> > in the interpreter, subtle bugs in previously correct user code that
> > are eventually traced back to interpreter changes).
>
> Barry Warsaw:
> 
>  > Regardless of whether [namedtuple] optimization is a good idea or not,
> start up
>  > time *is* a serious challenge in many environments for CPython in
> particular and
>  > the perception of Python’s applicability to many problems.  I think
> we’re better
>  > off trying to identify and address such problems than ignoring or
> minimizing them.
>
> Ethan Furman:
> 
> Speed is not the only factor, and certainly shouldn't be the first
> concern, but once
> we have correct code we need to follow our own advice:  find the
> bottlenecks and optimize
> them.  Optimized code will never be as pretty or maintainable as simple,
> unoptimized
> code but real-world applications often require as much performance as can
> be obtained.
>

For me it's a balance based on how critical the code is and how complicated
the code will become long-term. I think between Victor and me we maybe have
1 person/week of paid work time on CPython and the rest is volunteer time,
so there always has to be some consideration as to whether maintenance will
become untenable long-term (this is why complex is better than complicated
pretty much no matter what).

In namedtuple's case, Raymond designed something that was useful with an
elegant solution. Unfortunately namedtuple is a victim of its own success
and became a bottleneck when it came to startup time in apps that used it
extensively as well as being a sticking point for anyone who wanted to
askew exec(). So now we're keeping the usefulness/API design aspect and are
being pragmatic about the fact that we want to rework the elegant design to
be computationally cheaper so it's no longer an obvious performance penalty
at app startup for people who use it a lot. And so now the work is trying
to balance the pragmatic performance aspect with the long-term maintenance
aspect.


Re: [Python-Dev] Program runs in 12s on Python 2.7, but 5s on Python 3.5 -- why so much difference?

2017-07-18 Thread Nick Coghlan
On 19 July 2017 at 02:18, Antoine Pitrou  wrote:
> On Tue, 18 Jul 2017 12:03:36 -0400
> Ben Hoyt  wrote:
>> The program is a pentomino puzzle solver, and it works via code generation,
>> generating a ton of nested "if" statements, so I believe it's exercising
>> the Python bytecode interpreter heavily.
>
> A first step would be to see if the generated bytecode has changed
> substantially.

Scanning over them, the Python 2.7 bytecode appears to have many more
JUMP_FORWARD and JUMP_ABSOLUTE opcodes than appear in the 3.6 version
(I didn't dump them into a Counter instance to tally them properly
though, since 2.7's dis module is missing the structured opcode
iteration APIs).
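On Python 3.4+, the tally described above is a one-liner over
dis.get_instructions(). The function below is a hypothetical stand-in for
a slice of the generated solver, not Ben's actual code; the idea is to
run the same snippet under each interpreter and diff the opcode counts.

```python
import dis
from collections import Counter

def nested_ifs(a, b, c):
    # Stand-in for the generated solver's deeply nested "if" blocks.
    if a:
        if b:
            if c:
                return 1
    return 0

# Tally opcode frequencies using the structured iteration API that
# 2.7's dis module lacks.
counts = Counter(ins.opname for ins in dis.get_instructions(nested_ifs))
for opname, n in counts.most_common():
    print(opname, n)
```

Comparing the resulting Counters between versions makes opcode-level
differences (e.g. the extra jump instructions) easy to quantify.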

With the shift to wordcode, the overall size of the bytecode is also
significantly *smaller*:

>>> len(co.co_consts[0].co_code) # 2.7
14427

>>> len(co.co_consts[0].co_code) # 3.6
11850

However, I'm not aware of any Python profilers that currently offer
opcode level profiling - the closest would probably be VMProf's JIT
profiling, and that aspect of VMProf is currently PyPy specific
(although could presumably be extended to CPython 3.6+ by way of the
opcode evaluation hook).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Program runs in 12s on Python 2.7, but 5s on Python 3.5 -- why so much difference?

2017-07-18 Thread Ben Hoyt
Thanks, Nick -- that's interesting. I just saw the extra JUMP_FORWARD and
JUMP_ABSOLUTE instructions on my commute home (I guess those are something
Python 3.x optimizes away).

VERY strangely, on Windows Python 2.7 is faster! Comparing 64-bit Python
2.7.12 against Python 3.5.3 on my Windows 10 laptop:

* Python 2.7.12: 4.088s
* Python 3.5.3: 5.792s

I'm pretty sure MSVC/Windows doesn't support computed gotos, but that
doesn't explain why 3.5 is so much faster than 2.7 on Mac. I have yet to
try it on Linux.

-Ben

On Tue, Jul 18, 2017 at 9:35 PM, Nick Coghlan  wrote:

> On 19 July 2017 at 02:18, Antoine Pitrou  wrote:
> > On Tue, 18 Jul 2017 12:03:36 -0400
> > Ben Hoyt  wrote:
> >> The program is a pentomino puzzle solver, and it works via code
> generation,
> >> generating a ton of nested "if" statements, so I believe it's exercising
> >> the Python bytecode interpreter heavily.
> >
> > A first step would be to see if the generated bytecode has changed
> > substantially.
>
> Scanning over them, the Python 2.7 bytecode appears to have many more
> JUMP_FORWARD and JUMP_ABSOLUTE opcodes than appear in the 3.6 version
> (I didn't dump them into a Counter instance to tally them properly
> though, since 2.7's dis module is missing the structured opcode
> iteration APIs).
>
> With the shift to wordcode, the overall size of the bytecode is also
> significantly *smaller*:
>
> >>> len(co.co_consts[0].co_code) # 2.7
> 14427
>
> >>> len(co.co_consts[0].co_code) # 3.6
> 11850
>
> However, I'm not aware of any Python profilers that currently offer
> opcode level profiling - the closest would probably be VMProf's JIT
> profiling, and that aspect of VMProf is currently PyPy specific
> (although could presumably be extended to CPython 3.6+ by way of the
> opcode evaluation hook).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Program runs in 12s on Python 2.7, but 5s on Python 3.5 -- why so much difference?

2017-07-18 Thread Nathaniel Smith
I'd probably start with a regular C-level profiler, like perf or
callgrind. They're not very useful for comparing two versions of code
written in Python, but here the Python code is the same (modulo
changes in the stdlib), and it's changes in the interpreter's C code
that probably make the difference.

On Tue, Jul 18, 2017 at 9:03 AM, Ben Hoyt  wrote:
> Hi folks,
>
> (Not entirely sure this is the right place for this question, but hopefully
> it's of interest to several folks.)
>
> A few days ago I posted a note in response to Victor Stinner's articles on
> his CPython contributions, noting that I wrote a program that ran in 11.7
> seconds on Python 2.7, but only takes 5.1 seconds on Python 3.5 (on my 2.5
> GHz macOS i7), more than 2x as fast. Obviously this is a Good Thing, but I'm
> curious as to why there's so much difference.
>
> The program is a pentomino puzzle solver, and it works via code generation,
> generating a ton of nested "if" statements, so I believe it's exercising the
> Python bytecode interpreter heavily. Obviously there have been some big
> optimizations to make this happen, but I'm curious what the main
> improvements are that are causing this much difference.
>
> There's a writeup about my program here, with benchmarks at the bottom:
> http://benhoyt.com/writings/python-pentomino/
>
> This is the generated Python code that's being exercised:
> https://github.com/benhoyt/python-pentomino/blob/master/generated_solve.py
>
> For reference, on Python 3.6 it runs in 4.6 seconds (same on Python 3.7
> alpha). This smallish increase from Python 3.5 to Python 3.6 was more
> expected to me due to the bytecode changing to wordcode in 3.6.
>
> I tried using cProfile on both Python versions, but that didn't say much,
> because the functions being called aren't taking the majority of the time.
> How does one benchmark at a lower level, or otherwise explain what's going
> on here?
>
> Thanks,
> Ben
>



-- 
Nathaniel J. Smith -- https://vorpus.org