[Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
Hi All,

This is Alecsandru from the Server Scripting Languages Optimization team at 
Intel Corporation.

I would like to submit a request to turn on Profile Guided Optimization (PGO) 
as the default build option for Python (both 2.7 and 3.6), given its 
performance benefits on a wide variety of workloads and hardware.  For 
instance, as shown in the attached sample performance results from the Grand 
Unified Python Benchmark, a speedup of more than 20% was observed.  In 
addition, we are seeing a 2-9% performance boost on OpenStack/Swift, where more 
than 60% of the code is Python 2.7.  Our analysis indicates that the 
performance gain is mainly due to a reduction in icache misses and CPU 
front-end stalls.

Attached are the Makefile patches, which modify the "all" build target and add 
a new one called "disable-profile-opt".  We built and tested these patches for 
Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel 
Xeon Haswell/Broadwell with 18/8 cores).  We use the "regrtest" suite for 
training, as it provides the best performance improvement.  Some of the test 
programs in the suite may fail, which causes the build to fail.  One solution 
is to disable the specific failing test using the "-x" flag (as shown in the 
patch).
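As a rough illustration of the "-x" workaround described above, the sketch 
below builds a regrtest command line that skips named tests.  The helper name 
regrtest_command and the test names are hypothetical and are not taken from 
the attached patches; the only thing relied on is that regrtest treats the 
test names following "-x" as tests to exclude.

```python
import sys


def regrtest_command(excluded, python=sys.executable):
    """Return an argv list that runs regrtest minus the excluded tests."""
    cmd = [python, "-m", "test.regrtest"]
    if excluded:
        # With -x, the positional test names are the ones to skip.
        cmd += ["-x"] + sorted(excluded)
    return cmd


if __name__ == "__main__":
    # Hypothetical names; substitute whichever tests fail on your machine.
    print(" ".join(regrtest_command({"test_gdb", "test_ssl"}, python="./python")))
```

Returning a list rather than a single string keeps the command usable with 
subprocess without any shell quoting.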

Steps to apply the patch: 
1.  hg clone https://hg.python.org/cpython cpython 
2.  cd cpython 
3.  hg update 2.7 (needed for 2.7 only) 
4.  Copy *.patch to the current directory 
5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
6.  ./configure 
7.  make

To disable PGO
7b. make disable-profile-opt
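
For readers unfamiliar with what a PGO build involves, the sketch below lists 
the three phases a GCC-based PGO build typically goes through: an instrumented 
build, a training run, and a rebuild that consumes the collected profile.  It 
is not the content of the attached patches; the function name, the file names 
prog.c and prog, and the training input are illustrative assumptions.  Only 
the GCC options -fprofile-generate and -fprofile-use are standard.

```python
def gcc_pgo_commands(source="prog.c", binary="prog", training_args=()):
    """Return the three command lines of a minimal GCC PGO cycle."""
    return [
        # Phase 1: build an instrumented binary that records execution counts.
        ["gcc", "-O2", "-fprofile-generate", source, "-o", binary],
        # Phase 2: run a representative workload; this writes profile data.
        ["./" + binary] + list(training_args),
        # Phase 3: rebuild, letting the compiler use the recorded profile.
        ["gcc", "-O2", "-fprofile-use", source, "-o", binary],
    ]


if __name__ == "__main__":
    # "input.txt" is a hypothetical training input.
    for argv in gcc_pgo_commands(training_args=["input.txt"]):
        print(" ".join(argv))
```

The quality of the result depends on how representative phase 2 is, which is 
why the choice of training workload matters.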

Below are our sample performance results from our latest Xeon machine, a 
Xeon Broadwell EP.
Hardware (HW):      Intel XEON (Broadwell) 8 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU freq set at fixed: 2.6GHz by
                        echo 260 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 260 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to
                    reduce run to run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space

GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:          Grand Unified Python Benchmark (GUPB)
                    GUPB Source: https://hg.python.org/benchmarks/
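
A note on units for anyone reproducing the fixed-frequency setup: the cpufreq 
sysfs files scaling_min_freq and scaling_max_freq take values in kHz.  The 
sketch below shows the conversion for a 2.6 GHz setting and prints, without 
executing them, one write per CPU.  The helper names are hypothetical, and the 
sketch makes no claim about the exact value used in the original commands.

```python
def ghz_to_khz(ghz):
    """Convert a frequency in GHz to the kHz integer that cpufreq expects."""
    return int(round(ghz * 1000000))


def pin_frequency_commands(ghz, cpu_count):
    """Return shell commands that pin min and max frequency on each CPU."""
    khz = ghz_to_khz(ghz)
    commands = []
    for cpu in range(cpu_count):
        base = "/sys/devices/system/cpu/cpu{}/cpufreq".format(cpu)
        for name in ("scaling_min_freq", "scaling_max_freq"):
            commands.append("echo {} > {}/{}".format(khz, base, name))
    return commands


if __name__ == "__main__":
    # Dry run only: the commands are printed, not executed (they need root).
    for line in pin_frequency_commands(2.6, cpu_count=8):
        print(line)
```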
  

Python2.7 results:
Python source: hg clone https://hg.python.org/cpython cpython
Python Source: hg update 2.7
hg id: 0511b1165bb6 (2.7)
hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

Benchmarks          Speedup(%)
simple_logging      20
raytrace            20
silent_logging      19
richards            19
chaos               16
formatted_logging   16
json_dump           15
hexiom2             13
pidigits            12
slowunpickle        12
django_v2           12
unpack_sequence     11
float               11
mako                11
slowpickle          11
fastpickle          11
django              11
go                  10
json_dump_v2        10
pathlib             10
regex_compile       10
pybench             9.9
etree_process       9
regex_v8            8
bzr_startup         8
2to3                8
slowspitfire        8
telco               8
pickle_list         8
fannkuch            8
etree_iterparse     8
nqueens             8
mako_v2             8
etree_generate      8
call_method_slots   7
html5lib_warmup     7
html5lib            7
nbody               7
spectral_norm       7
spambayes           7
fastunpickle        6
meteor_contest      6
chameleon           6
rietveld            6
tornado_http        5
unpickle_list       5
pickle_dict         4
regex_effbot        3
normal_startup      3
startup_nosite      3
etree_parse         2
call_method_unknown 2
call_simple         1
json_load           1
call_method         1
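
To put the Python 2.7 table above in perspective, the following sketch 
computes a few summary statistics over the listed speedups.  The numbers are 
copied from the table; the variable names are illustrative, and the summary is 
a reading aid rather than part of the original measurements.

```python
import statistics

# Benchmark name / speedup (%) pairs, copied from the Python 2.7 table.
RESULTS = """
simple_logging 20 raytrace 20 silent_logging 19 richards 19
chaos 16 formatted_logging 16 json_dump 15 hexiom2 13
pidigits 12 slowunpickle 12 django_v2 12 unpack_sequence 11
float 11 mako 11 slowpickle 11 fastpickle 11
django 11 go 10 json_dump_v2 10 pathlib 10
regex_compile 10 pybench 9.9 etree_process 9 regex_v8 8
bzr_startup 8 2to3 8 slowspitfire 8 telco 8
pickle_list 8 fannkuch 8 etree_iterparse 8 nqueens 8
mako_v2 8 etree_generate 8 call_method_slots 7 html5lib_warmup 7
html5lib 7 nbody 7 spectral_norm 7 spambayes 7
fastunpickle 6 meteor_contest 6 chameleon 6 rietveld 6
tornado_http 5 unpickle_list 5 pickle_dict 4 regex_effbot 3
normal_startup 3 startup_nosite 3 etree_parse 2 call_method_unknown 2
call_simple 1 json_load 1 call_method 1
"""

# Tokens alternate between a benchmark name and its speedup value.
tokens = RESULTS.split()
rows = [(tokens[i], float(tokens[i + 1])) for i in range(0, len(tokens), 2)]
speedups = [value for _, value in rows]

summary = {
    "count": len(speedups),
    "min": min(speedups),
    "median": statistics.median(speedups),
    "mean": round(statistics.mean(speedups), 2),
    "max": max(speedups),
}

if __name__ == "__main__":
    for key, value in summary.items():
        print("{:<7}{}".format(key, value))
```

For the 55 values listed, the median speedup is 8% and the largest is 20%.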

Python3.6 results
Python source: hg clone https://hg.python.org/cpython cpython
hg id: 96d016f78726 tip
hg id -r 'ancestors(.) and tag()': 1a58b1227501 (3.5) v3.5.0rc1
hg --debug id -i: 96d016f78726afbf66d396f084b291ea43792af1


Benchmark       Speedup(%)
fastunpickle    22.94
fastpickle      21.67
json_load       17.64
simple_logging  17.49
meteor_cont

Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
Hello and thank you for your feedback.

We have also measured the PGO gain using other workloads. Our initial choice 
for this optimization was pybench, but the speedup obtained was lower than with 
regrtest, and it did not cover many Python scenarios. By contrast, regrtest has 
a uniform distribution of tests, and the resulting binary is overall much 
faster than the default build or one trained on other workloads, so it covers a 
larger pool of Python loads. This optimization was also tested in a production 
environment running OpenStack Swift, where it gave improvements of up to 9%.

We proposed making this target always on because the resulting optimized 
binary is better out of the box for the general case.

Alecsandru 

From: gvanros...@gmail.com [mailto:gvanros...@gmail.com] On Behalf Of Guido van 
Rossum
Sent: Saturday, August 22, 2015 7:15 PM
To: Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

How about we first add a new Makefile target that enables PGO, without turning 
it on by default? Then later we can enable it by default.
Also, I have my doubts about regrtest. How sure are we that it represents a 
typical Python load? Tests are often using a different mix of operations than 
production code.


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru

This target replaces the existing one in the CPython Makefile, which currently 
uses a quick run of pybench; the resulting binary does not perform well on 
general Python loads. I don't think it is a good idea to add a by-default 
target that does PGO on dedicated workloads, like Django, because the binary 
will then perform better on that particular load and poorly on others. 

Of course, users who have a dedicated workload for which they want to get the 
best benefit from PGO will have to run that training separately from the 
proposed one. Our proposal targets the broader audience that uses Python in 
various scenarios, who will see an overall improvement after compiling Python 
from source.

Alecsandru

From: Brett Cannon [mailto:br...@python.org] 
Sent: Saturday, August 22, 2015 7:25 PM
To: gu...@python.org; Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default


On Sat, Aug 22, 2015, 09:17 Guido van Rossum  wrote:
> How about we first add a new Makefile target that enables PGO, without turning 
> it on by default? Then later we can enable it by default.

I agree. Updating the Makefile so it's easier to use PGO is great, but we 
should do a release with it as opt-in and go from there.

> Also, I have my doubts about regrtest. How sure are we that it represents a 
> typical Python load? Tests are often using a different mix of operations than 
> production code.

That was also my question. You said that "it provides the best performance 
improvement", but compared to what; what else was tried? And what difference 
does it make to e.g. a Django app that is trained on their own simulated 
workload compared to using regrtest? IOW is regrtest displaying the best 
across-the-board performance because it stresses the largest swath of Python 
and thus catches generic patterns in the code but individuals could get better 
performance with a simulated workload?

-Brett


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
A trial period in which the provided patches are tested on numerous other 
Python loads is welcome, to make sure that they work as presented.

Yes, it is easy to change it to use a different training set, or subsets of 
regrtest, by adding parameters to the line inside the Makefile that runs it. 
Currently, the attached patches run the full regrtest suite. 
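
As a sketch of what switching the training set could look like, the snippet 
below builds a make invocation that overrides the training command.  It 
assumes that the command is exposed as a make variable, called PROFILE_TASK 
here after the variable CPython's Makefile.pre.in used for its existing 
profile-opt target; how the attached patches name it is not shown in this 
thread.  The helper name and the chosen test subset are arbitrary.

```python
def profile_opt_command(training_args, target="profile-opt"):
    """Return a make argv that runs a PGO build with a custom training task."""
    # Assumption: the Makefile reads its training command from PROFILE_TASK.
    task = " ".join(training_args)
    return ["make", target, "PROFILE_TASK=" + task]


if __name__ == "__main__":
    # Train on an arbitrary subset of regrtest instead of the full suite.
    subset = ["-m", "test.regrtest", "test_json", "test_re"]
    print(profile_opt_command(subset))
```

Passing the variable on the make command line overrides the assignment in the 
Makefile, so experimenting with training sets needs no further patching.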

Alecsandru

From: gvanros...@gmail.com [mailto:gvanros...@gmail.com] On Behalf Of Guido van 
Rossum
Sent: Saturday, August 22, 2015 7:56 PM
To: Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

I'm sorry, but we're just not going to turn this on by default without doing a 
trial period ourselves. Your (and Intel's) contribution is very welcome, but in 
order to establish trust in a feature like this, an optional trial period is 
absolutely required.

Regarding the training set, I agree that regrtest sounds to be better than 
pybench. If we make this an opt-in change, we can experiment with different 
training sets easily. (Also, I haven't seen the patch yet, but I presume it's 
easy to use a different training set? Experimentation should be encouraged.)


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
Yes, the results were measured by running the benchmarks from the repo [1].

Furthermore, this optimization is generic and can handle any kind of change in 
hardware or in the CPython 2/3 source code. We are not adding to or modifying 
regrtest, and our rule will be applied to the latest tests in the CPython 
repo. Since these tests are kept up to date and are easy to run, this 
proposal makes sure that users will always benefit from them.

[1] https://hg.python.org/benchmarks/

Alecsandru

From: Eric Snow [mailto:ericsnowcurren...@gmail.com] 
Sent: Saturday, August 22, 2015 8:26 PM
To: Patrascu, Alecsandru
Cc: Python-Dev
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default


On Aug 22, 2015 9:02 AM, "Patrascu, Alecsandru"  
wrote:
[snip] 
> For instance, as shown from attached sample performance results from the 
> Grand Unified Python Benchmark, >20% speed up was observed.
Are you referring to the tests in the benchmarks repo? [1]
How does the real-world performance improvement compare with other languages 
you are targeting for optimization?
And thanks for working on this!  I have several more questions:
What sorts of future changes in CPython's code might interfere with your 
optimizations?
What future additions might stand to benefit?
What changes in existing code might improve optimization opportunities?
What is the added maintenance burden of the optimizations on CPython, if any?
What is the performance impact on non-Intel architectures?  What about older 
Intel architectures?  ...and future ones?
What is Intel's commitment to supporting these (or other) optimizations in the 
future?  How is the practical EOL of the optimizations managed?
Finally, +1 on adding an opt-in Makefile target rather than enabling the 
optimizations by default.
Thanks again!
-eric
[1] https://hg.python.org/benchmarks/
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
Thank you, Stefan, for also pointing out the value of regrtest as a good 
training set for building Python. Indeed, Ubuntu ships Python 2/3 binaries in 
its repos that are already optimized using PGO based on regrtest.

Alecsandru 

-----Original Message-----
From: Python-Dev 
[mailto:python-dev-bounces+alecsandru.patrascu=intel@python.org] On Behalf 
Of Stefan Behnel
Sent: Saturday, August 22, 2015 8:25 PM
To: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

Guido van Rossum wrote on 22.08.2015 at 18:55:
> Regarding the training set, I agree that regrtest sounds to be better 
> than pybench. If we make this an opt-in change, we can experiment with 
> different training sets easily. (Also, I haven't seen the patch yet, 
> but I presume it's easy to use a different training set?

It's just one command in one line, yes.


> Experimentation should be encouraged.)

A well chosen training set can have a notable impact on PGO compiled code in 
general, and switching from pybench to regrtests should make such a difference. 
However, since CPython's overall performance is mostly determined by the 
interpreter loop, general object operations (getattr!) and the basic builtin 
types, of which the regression test suite makes plenty of use, it is rather 
unlikely that other training sets would provide substantially better 
performance for Python code execution.

Note also that Ubuntu has shipped PGO builds based on the regrtests for years, 
and they seemed to be quite happy with it.

Stefan




Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
I'm sorry, I forgot to mention this: I have already opened an issue and 
uploaded the patches [1].

[1] http://bugs.python.org/issue24915

From: Brett Cannon [mailto:br...@python.org] 
Sent: Saturday, August 22, 2015 9:00 PM
To: Patrascu, Alecsandru; python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

I just realized I didn't see anyone say it, but please upload the patches to 
bugs.Python.org for easier tracking and reviewing.


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-23 Thread Patrascu, Alecsandru
I removed the zip file and uploaded the patches individually. 

Alecsandru

From: Brett Cannon [mailto:br...@python.org] 
Sent: Sunday, August 23, 2015 4:47 AM
To: Patrascu, Alecsandru; python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default


On Sat, 22 Aug 2015 at 11:10 Patrascu, Alecsandru 
 wrote:
> I'm sorry, I forgot to mention this: I have already opened an issue and 
> uploaded the patches [1].
>
> [1] http://bugs.python.org/issue24915

Great, thanks Alecsandru. Do please follow Stefan's comment, though, and upload 
the patch files directly and not as a zip file. That way we can use our code 
review tool to do a proper review of the patches.

-Brett
 


From: Brett Cannon [mailto:br...@python.org]
Sent: Saturday, August 22, 2015 9:00 PM
To: Patrascu, Alecsandru; python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

I just realized I didn't see anyone say it, but please upload the patches to 
bugs.Python.org for easier tracking and reviewing.

On Sat, Aug 22, 2015, 08:01 Patrascu, Alecsandru 
 wrote:
Hi All,

This is Alecsandru from Server Scripting Languages Optimization team at Intel 
Corporation.

I would like to submit a request to turn-on Profile Guided Optimization or PGO 
as the default build option for Python (both 2.7 and 3.6), given its 
performance benefits on a wide variety of workloads and hardware.  For 
instance, as shown from attached sample performance results from the Grand 
Unified Python Benchmark, >20% speed up was observed.  In addition, we are 
seeing 2-9% performance boost from OpenStack/Swift where more than 60% of the 
codes are in Python 2.7. Our analysis indicates the performance gain was mainly 
due to reduction of icache misses and CPU front-end stalls.

Attached is the Makefile patches that modify the all build target and adds a 
new one called "disable-profile-opt". We built and tested this patch for Python 
2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel Xeon 
Haswell/Broadwell with 18/8 cores).  We use "regrtest" suite for training as it 
provides the best performance improvement.  Some of the test programs in the 
suite may fail which leads to build fail.  One solution is to disable the 
specific failed test using the "-x " flag (as shown in the patch)

Steps to apply the patch:
1.  hg clone https://hg.python.org/cpython cpython
2.  cd cpython
3.  hg update 2.7 (needed for 2.7 only)
4.  Copy *.patch to the current directory
5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
6.  ./configure
7.  make

To disable PGO
7b. make disable-profile-opt

In the following, please find our sample performance results from latest XEON 
machine, XEON Broadwell EP.
Hardware (HW):      Intel XEON (Broadwell) 8 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU freq set at fixed: 2.6GHz by
                        echo 260 > 
/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 260 > 
/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to 
reduce run to run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space

GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:          Grand Unified Python Benchmark (GUPB)
                    GUPB Source: https://hg.python.org/benchmarks/

Python2.7 results:
    Python source: hg clone https://hg.python.org/cpython cpython
    Python Source: hg update 2.7
    hg id: 0511b1165bb6 (2.7)
    hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
    hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

        Benchmarks          Speedup(%)
        simple_logging      20
        raytrace            20
        silent_logging      19
        richards            19
        chaos               16
        formatted_logging   16
        json_dump           15
        hexiom2             13
        pidigits            12
        slowunpickle        12
        django_v2           12
        unpack_sequence     11
        float               11
        mako                11
        slowpickle          11
        fastpickle          11
        django              11
        go                  10
        json_dump_v2        10
        pathlib             10
        regex_compile       10
        pybench             9.9
        etree_process       9
        regex_v8            8
        bzr_startup         8
        2to3                8
        slowspitfire        8
        telco               8
        pickle_list         8
        fannkuch            8
        etree_iterparse     8
        nqueens             8
        mako_v2             8
        etree_generate      8
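
The message does not say how the Speedup(%) column is derived. One common 
convention, shown here purely as an assumption, compares the baseline run time 
with the optimized run time. The function name and the timings in the example 
are hypothetical and are not taken from the measurements above.

```python
def speedup_percent(baseline_seconds, optimized_seconds):
    """Percentage speedup under the assumed convention (t_base / t_opt - 1) * 100."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return (baseline_seconds / optimized_seconds - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical timings: 1.2 s before, 1.0 s after.
    print(round(speedup_percent(1.2, 1.0), 1))
```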
   

Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Patrascu, Alecsandru
Indeed, as Gregory rightly mentioned, PGO is not tied to the particular CPU on 
which the profiling is done.

From: Python-Dev 
[mailto:python-dev-bounces+alecsandru.patrascu=intel@python.org] On Behalf 
Of Gregory P. Smith
Sent: Tuesday, August 25, 2015 7:44 PM
To: Xavier Combelle; python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

PGO is unrelated to the particular CPU the profiling is done on. (It is 
conceivable that it'd make a small difference but I've never observed that in 
practice)
On Tue, Aug 25, 2015, 9:28 AM Xavier Combelle wrote:
Pardon me if this is not the right place to ask the following naive question 
(tell me if that is the case).
Are the performance improvements from Profile Guided Optimization specific to 
the chip where the build is done, or do they carry over to a larger set of 
chips?


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Hash computation enhancement for {buffer, string, unicode}object

2015-09-14 Thread Patrascu, Alecsandru
Hi All,

This is Alecsandru from Server Scripting Languages Optimization team at Intel 
Corporation.

I would like to submit a patch that improves the performance of the hash 
computation code in stringobject, bufferobject and unicodeobject. As can be 
seen from the attached sample performance results from the Grand Unified Python 
Benchmark, speedups of up to 40% were observed. Furthermore, we see a 5-7% 
performance improvement on OpenStack/Swift, where most of the code is in Python 2.7.
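
As background on what is being optimized (the patch itself is in the attachment 
and on the tracker and is not reproduced here), the following is a Python 
transliteration of the byte-at-a-time loop that Python 2.7 uses to hash a str 
object. Assumptions: hash randomization is off (prefix and suffix are 0), the C 
long is 64 bits wide, and the function name is hypothetical.

```python
def py27_string_hash(data: bytes) -> int:
    """Sketch of the Python 2.7 str hash loop (64-bit, no randomization)."""
    if not data:
        return 0
    mask = (1 << 64) - 1
    x = (data[0] << 7) & mask
    for byte in data:
        # One multiply and one XOR per input byte; emulate C wraparound.
        x = ((1000003 * x) & mask) ^ byte
    x ^= len(data)
    # Reinterpret the 64-bit pattern as a signed C long.
    if x >= 1 << 63:
        x -= 1 << 64
    # -1 is reserved as an error indicator, so it is remapped.
    return -2 if x == -1 else x


if __name__ == "__main__":
    print(py27_string_hash(b"a"))
```

Every byte goes through a dependent multiply, which is why this loop shows up 
in workloads that hash many strings. Python 3 uses a different hash function, 
so this sketch applies to 2.7 only.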

Attached is the patch that modifies the Objects/stringobject.c, 
Objects/bufferobject.c and Objects/unicodeobject.c files. We built and tested 
this patch for Python 2.7 on our Linux machines (CentOS 7/Ubuntu Server 14.04, 
Intel Xeon Haswell/Broadwell with 18/8 cores). 

I've also opened an issue on the bug tracker: http://bugs.python.org/issue25106

Steps to apply the patch:
1.  hg clone https://hg.python.org/cpython cpython 
2.  cd cpython 
3.  hg update 2.7
4.  Copy hash8.patch to the current directory 
5.  hg import --no-commit hash8.patch
6.  ./configure 
7.  make



In the following, please find our sample performance results measured on a XEON 
Haswell machine.  

Hardware (HW):      Intel XEON (Haswell) 18 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU freq set at fixed: 2.0GHz by
                        echo 200 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 200 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to reduce run to run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space

GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:          Grand Unified Python Benchmark (GUPB)
                    GUPB Source: https://hg.python.org/benchmarks/

Python2.7 results:
    Python source: hg clone https://hg.python.org/cpython cpython
    Python Source: hg update 2.7

        Benchmarks          Speedup(%)
        unpack_sequence     40.32733766
        chaos               24.84002537
        chameleon           23.01392651
        silent_logging      22.27202911
        django              20.83842317
        etree_process       20.46968294
        nqueens             20.34234985
        pathlib             19.63445919
        pidigits            19.34722148
        etree_generate      19.25836634
        pybench             19.06895825
        django_v2           18.06073108
        etree_iterparse     17.3797149
        fannkuch            17.08120879
        pickle_list         16.60363602
        raytrace            16.0316265
        slowpickle          15.86611184
        pickle_dict         15.30447114
        call_simple         14.42909032
        richards            14.2949594
        simple_logging      13.6522626
        etree_parse         13.38113097
        json_dump_v2        12.2655
        float               11.88164311
        mako                11.20606516
        spectral_norm       11.04356684
        hg_startup          10.57686164
        mako_v2             10.37912648
        slowunpickle        10.24030714
        go                  10.03567319
        meteor_contest      9.956231435
        normal_startup      9.607401586
        formatted_logging   9.601244811
        html5lib            9.082603748
        2to3                8.741557816
        html5lib_warmup     8.268150981
        nbody               7.507012306
        regex_compile       7.153922724
        bzr_startup         7.140244739
        telco               6.869411927
        slowspitfire        5.746323922
        tornado_http        5.24360121
        rietveld            3.865704876
        regex_v8            3.777622219
        hexiom2             3.586305282
        json_dump           3.477551682
        spambayes           3.183991854
        fastunpickle        2.971645347
        fastpickle          0.673086656
        regex_effbot        0.127946837
        json_load           0.023727176

Thank you,
Alecsandru


hash8-v01.patch
Description: hash8-v01.patch


[Python-Dev] CPython build options for out-of-the box performance

2016-02-09 Thread Patrascu, Alecsandru
Hi all,

This is Alecsandru from the Dynamic Scripting Languages Optimization Team at 
Intel Corporation. I want to open a discussion regarding the way CPython is 
built, mainly the options that are available to programmers. Looking at the 
CPython ecosystem, we can see that many users just download the sources, run 
"./configure", "make" and "make install" once, and then go on to use the result 
with their Python scripts. One problem with this workflow is that these users 
do not benefit from all the optimization features available in the build 
system, such as PGO and LTO.

Therefore, I propose a workflow like the following. Assuming some work has to be 
done on the CPython interpreter, a developer can follow these steps:
A. Implementation and debugging phase. 
1. The command "./configure PYDIST=debug" is run once. It will enable the 
Py_DEBUG, -O0 and -g flags.
2. The command "make" is run once or multiple times.

B. Testing the implementation from step A in a pre-release environment.
1. The command "./configure PYDIST=devel" is run once. It will disable the 
Py_DEBUG flag and enable the -O3 and -g flags, which matches the current 
default behavior in CPython.
2. The command "make" is run once or multiple times.

C. Any other CPython usage, for example distributing the interpreter, 
installing it inside an operating system, or the majority of users who are 
not CPython developers and only want to compile it once and use it as-is.
1. The command "./configure" is run once. Alternatively, the command  
"./configure PYDIST=release" can be used. It will disable all debugging 
functionality, enable the -O3 flag, and enable PGO and LTO.
2. The command "make" is run once.
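
The three proposed modes can be summarized as a small lookup table. This is 
only a sketch of the proposal as worded above, not of the patch: the dictionary 
layout, field names and the resolve_pydist helper are hypothetical, and 
"release" is treated as the default because a plain "./configure" is described 
as equivalent to PYDIST=release.

```python
# Hypothetical summary of the proposed PYDIST modes.
PYDIST_PROFILES = {
    "debug":   {"py_debug": True,  "cflags": ["-O0", "-g"], "pgo": False, "lto": False},
    "devel":   {"py_debug": False, "cflags": ["-O3", "-g"], "pgo": False, "lto": False},
    "release": {"py_debug": False, "cflags": ["-O3"],       "pgo": True,  "lto": True},
}


def resolve_pydist(value=None):
    """Return the settings for a PYDIST value; no value means 'release'."""
    name = value or "release"
    try:
        return PYDIST_PROFILES[name]
    except KeyError:
        raise ValueError(f"unknown PYDIST value: {name!r}") from None


if __name__ == "__main__":
    for mode in (None, "debug", "devel"):
        print(mode, resolve_pydist(mode))
```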

If you think this benefits CPython, I can create an issue and post the patches 
that enable all of the above. 

Thank you,
Alecsandru



Re: [Python-Dev] CPython build options for out-of-the box performance

2016-02-14 Thread Patrascu, Alecsandru
I've added the patches here[1], to make the workflow and the small 
modifications to the CPython build system clearer.

[1] http://bugs.python.org/issue26359

Thank you,
Alecsandru
