Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
I've now got a patch to add new COMPARE_IS_NONE and COMPARE_IS_NOT_NONE bytecodes to CPython to attempt to speed up the very common "is None" and "is not None" expressions. It's a very simple change overall. Here is my patch: https://gist.github.com/benhoyt/e5ba19afe7b869fd743c1c39fc2afdf8 Note: The intent here is a proof of concept, not a CPython-quality implementation. I've put the code to generate these in compile.c, though it may be better off in peephole.c. Also, I'm not sure two new bytecodes are needed -- it'd be simpler and probably almost as fast to have one bytecode instruction COMPARE_NONE with an argument that determines the "is" vs "is not" part. Anyway, it looks like a nice improvement on a micro-benchmark, with a 20% speed increase for "is None" and "is not None" expressions using timeit. See results at [1] below. However, I'm not very familiar with the python/performance benchmarks and with my first try at running those I'm getting wildly varying results (most faster, but some tests say 1.7x slower, so I'm pretty sure this is an artifact of how I'm running them). See [2] below for my results. Would someone with more experience running those be able to run the performance tests and post results? Any other thoughts welcome as well. -Ben [1] Here are the micro-benchmarks using timeit against the latest master (normal build and also PGO) as well as with my patch: latest master (e613e6add5f07ff6aad5802924596b631b707d2a, June 27) ./configure make -j ../test_none.sh x = 1234; x is None -- 2000 loops, best of 5: 19.8 nsec per loop x = 1234; x is not None -- 1000 loops, best of 5: 20 nsec per loop x = None; x is None -- 1000 loops, best of 5: 20.7 nsec per loop x = None; x is not None -- 1000 loops, best of 5: 20.8 nsec per loop avg 20.3 nsec per loop ./configure --enable-optimizations make -j ../test_none.sh x = 1234; x is None -- 2000 loops, best of 5: 18.6 nsec per loop x = 1234; x is not None -- 2000 loops, best of 5: 19.2 nsec per loop x = None; x is None -- 1000 loops, best of 5: 19.5 nsec per loop x = None; x is not None -- 2000 loops, best of 5: 19.4 nsec per loop avg 19.2 nsec per loop is_none_bytecode branch (with my patch) ./configure make -j ../test_none.sh x = 1234; x is None -- 2000 loops, best of 5: 16 nsec per loop x = 1234; x is not None -- 2000 loops, best of 5: 16.4 nsec per loop x = None; x is None -- 2000 loops, best of 5: 16.6 nsec per loop x = None; x is not None -- 2000 loops, best of 5: 15.5 nsec per loop avg 16.1 nsec per loop (21% faster than master) ./configure --enable-optimizations make -j ../test_none.sh x = 1234; x is None -- 2000 loops, best of 5: 15.6 nsec per loop x = 1234; x is not None -- 2000 loops, best of 5: 16.4 nsec per loop x = None; x is None -- 2000 loops, best of 5: 15.7 nsec per loop x = None; x is not None -- 2000 loops, best of 5: 15.4 nsec per loop avg 15.8 nsec per loop (18% faster than master) [2] Benchmarks comparing master and is_none_bytecode patch (each compiled with --enable-optimizations) using python/performance: +-++--+ | Benchmark | master_opt | is_none_bytecode_opt | +=++==+ | 2to3| 617 ms | 541 ms: 1.14x faster (-12%) | +-++--+ | chameleon | 19.9 ms| 18.6 ms: 1.07x faster (-7%) | +-++--+ | crypto_pyaes| 208 ms | 201 ms: 1.04x faster (-3%) | +-++--+ | deltablue | 13.8 ms| 12.9 ms: 1.07x faster (-7%) | +-++--+ | django_template | 245 ms | 233 ms: 1.05x faster (-5%) | +-++--+ | dulwich_log | 132 ms | 126 ms: 1.05x faster (-5%) | +-++--+ | fannkuch| 898 ms | 849 ms: 1.06x faster (-5%) | +-++--+ | float | 204 ms | 191 ms: 1.07x faster (-7%) | +-++--+ | genshi_text | 57.5 ms| 54.8 ms: 1.05x faster (-5%) | +-++--+ | genshi_xml | 122 ms | 114 ms: 1.07x faster (-6%) | +-++--+ | hexiom | 18.4 ms| 26.9 ms: 1.46x slower (+46%) | +-++--+ | html5lib| 155 ms | 265 ms: 1.72x slower (+72%) | +---
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
(Victor wears his benchmark hat.) 2017-06-28 15:50 GMT+02:00 Ben Hoyt : > ../test_none.sh > x = 1234; x is None -- 2000 loops, best of 5: 19.8 nsec per loop > x = 1234; x is not None -- 1000 loops, best of 5: 20 nsec per loop > x = None; x is None -- 1000 loops, best of 5: 20.7 nsec per loop > x = None; x is not None -- 1000 loops, best of 5: 20.8 nsec per loop > avg 20.3 nsec per loop Hum, please use perf timeit instead of timeit, it's more reliable. See also: "How to get reproductible benchmark results" http://perf.readthedocs.io/en/latest/run_benchmark.html#how-to-get-reproductible-benchmark-results > [2] Benchmarks comparing master and is_none_bytecode patch (each > compiled with --enable-optimizations) using python/performance: > > +-++--+ > | Benchmark | master_opt | is_none_bytecode_opt | > +=++==+ > | 2to3| 617 ms | 541 ms: 1.14x faster (-12%) | > +-++--+ > | chameleon | 19.9 ms| 18.6 ms: 1.07x faster (-7%) | > +-++--+ > | crypto_pyaes| 208 ms | 201 ms: 1.04x faster (-3%) | > +-++--+ > | deltablue | 13.8 ms| 12.9 ms: 1.07x faster (-7%) | FYI you can add the -G option to perf compare_to to sort results by speed (group faster & slower). It gives a more readable table. I also like using --min-speed=5 to ignore changes smaller than 5%, it reduces the noise and makes the stable more readable. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
On Wed, 28 Jun 2017 09:50:10 -0400 Ben Hoyt wrote: > I've now got a patch to add new COMPARE_IS_NONE and > COMPARE_IS_NOT_NONE bytecodes to CPython to attempt to speed up the > very common "is None" and "is not None" expressions. It's a very > simple change overall. Here is my patch: > https://gist.github.com/benhoyt/e5ba19afe7b869fd743c1c39fc2afdf8 I don't want to discourage you, but this is unlikely to produce significant speedups on real-world code. The simple reason is that comparing to None is not a bottleneck for any real application -- comparing to None is probably damn fast compared to everything else the real application does. That said, it would be nice if you could get stable benchmark results to validate that theory (or not!) ;-) Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
Thanks for the pointers -- I wasn't aware of perf timeit. I'll try to get some more stable benchmark results. -Ben On Wed, Jun 28, 2017 at 10:20 AM, Victor Stinner wrote: > (Victor wears his benchmark hat.) > > 2017-06-28 15:50 GMT+02:00 Ben Hoyt : >> ../test_none.sh >> x = 1234; x is None -- 2000 loops, best of 5: 19.8 nsec per loop >> x = 1234; x is not None -- 1000 loops, best of 5: 20 nsec per loop >> x = None; x is None -- 1000 loops, best of 5: 20.7 nsec per loop >> x = None; x is not None -- 1000 loops, best of 5: 20.8 nsec per loop >> avg 20.3 nsec per loop > > Hum, please use perf timeit instead of timeit, it's more reliable. See also: > > "How to get reproductible benchmark results" > http://perf.readthedocs.io/en/latest/run_benchmark.html#how-to-get-reproductible-benchmark-results > >> [2] Benchmarks comparing master and is_none_bytecode patch (each >> compiled with --enable-optimizations) using python/performance: >> >> +-++--+ >> | Benchmark | master_opt | is_none_bytecode_opt | >> +=++==+ >> | 2to3| 617 ms | 541 ms: 1.14x faster (-12%) | >> +-++--+ >> | chameleon | 19.9 ms| 18.6 ms: 1.07x faster (-7%) | >> +-++--+ >> | crypto_pyaes| 208 ms | 201 ms: 1.04x faster (-3%) | >> +-++--+ >> | deltablue | 13.8 ms| 12.9 ms: 1.07x faster (-7%) | > > FYI you can add the -G option to perf compare_to to sort results by > speed (group faster & slower). It gives a more readable table. I also > like using --min-speed=5 to ignore changes smaller than 5%, it reduces > the noise and makes the stable more readable. > > Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
> I don't want to discourage you, but this is unlikely to produce > significant speedups on real-world code. The simple reason is that > comparing to None is not a bottleneck for any real application -- > comparing to None is probably damn fast compared to everything else > the real application does. That's true. However, I don't think that makes small optimizations worthless. For example, the core devs speed up memory allocation or method calls or whatever by a few percent, and consider it worthwhile, because it all adds up. > That said, it would be nice if you could get stable benchmark results > to validate that theory (or not!) ;-) Yep, I'm more than willing to accept that outcome if that's what the results show! :-) -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com