Yury Selivanov added the comment:
BTW, there are a couple of unit-tests that fail. Both can be easily fixed.
To really move this thing forward, we need to profile the memory usage. First,
it would be interesting to see how much additional memory is consumed if we
optimize every code object
Yury Selivanov added the comment:
Yes, this patch depends on PEP 509.
Yury Selivanov added the comment:
For those interested in reviewing this patch at some point: please wait until I
upload a new version. The current patch is somewhat outdated.
Changes by Yury Selivanov :
--
hgrepos: +333
Yury Selivanov added the comment:
Attaching a new version of the patch. Issues #26058 and #26110 need to be
merged before we can start reviewing it.
--
Added file: http://bugs.python.org/file41776/opcache2.patch
Yury Selivanov added the comment:
`async` and `await` are only keywords in the context of an 'async def'
function. It will be that way until Python 3.7. That said, I'm not sure what
to do about the keywords module.
Nick, Victor, thoughts?
--
nosy: +haypo, ncogh
Yury Selivanov added the comment:
> On Feb 2, 2016, at 7:00 AM, STINNER Victor wrote:
>
> So, I ran ssh://h...@hg.python.org/benchmarks with my patch. It looks like
> some benchmarks are up to 4% faster:
Please use the -r flag for perf.py.
--
nosy: +Yu
Yury Selivanov added the comment:
> Aside from the performance, my colleague also told me that heavy pressure on
> the memory allocator can slowly create fragmentation of the heap memory,
> which is true.
Right, but there are so many other things which contribute to the memory
frag
Yury Selivanov added the comment:
I'm assigning this patch to myself to commit it in 3.6 later.
--
assignee: -> yselivanov
components: +Interpreter Core
stage: -> patch review
versions: +Python 3.6 -Python 3.5
Yury Selivanov added the comment:
Hi Jesse, could you please update your patch with a detailed comment
summarizing your research? Also it would be great if you can provide a
separate patch for 2.7.
--
nosy: +yselivanov
Yury Selivanov added the comment:
Victor,
Thanks for the initial review. I'll work on the patch sometime later next
week.
As for test_vicious_descriptor_nonsense -- yeah, I saw that too. I know what's
going on there and I know how to fix that. FWIW that test tests a ver
Yury Selivanov added the comment:
> I don't know what "softly deprecate" means. Hopefully someone involved with
> the PEP can answer that question.
I actually thought about emitting DeprecationWarnings in 3.6 for async/await
NAME tokens. I think it's a reasonable
Yury Selivanov added the comment:
unpack_sequence contains 400 lines of this: "a, b, c, d, e, f, g, h, i, j =
to_unpack". This code doesn't even touch BINARY_SUBSCR or BINARY_ADD.
Zach, could you please run your benchmarks in rigorous mode (perf.py -r)? I'd
also sugges
New submission from Yury Selivanov:
See also issue #21955
--
components: Interpreter Core
messages: 259492
nosy: haypo, yselivanov, zbyrne
priority: normal
severity: normal
stage: needs patch
status: open
title: ceval: Optimize [] operation similarly to CPython 2.7
type: performance
Yury Selivanov added the comment:
Attaching a new patch -- rewritten to optimize -, *, +, -=, *= and +=. I also
removed the optimization of [] operator -- that should be done in a separate
patch and in a separate issue.
Some nano-benchmarks (best of 3):
python -m timeit "sum([x + x +
Changes by Yury Selivanov :
--
nosy: +yselivanov
Yury Selivanov added the comment:
> Yury suggested running perf.py twice with the binaries swapped
Yeah, I had some experience with perf.py when its results were skewed depending
on what you test first. Hopefully Victor's new patch will fix that
Yury Selivanov added the comment:
Antoine, yeah, it's probably turbo boost related. There is no easy way to turn
it off on mac os x, though. I hope Victor's patch to perf.py will help to
mitigate this.
Victor, Marc-Andre,
Updated results of nano-bench (best of 10):
-m timeit
Yury Selivanov added the comment:
Here's a very interesting table from Zach Byrne:
http://bugs.python.org/issue21955#msg259490
It shows that some benchmarks are indeed very unstable. This also correlates
with my own experience.
These ones are very unstable: pickle_dict, nbody, reg
Yury Selivanov added the comment:
> Fast path is already implemented in long_mul(). Maybe we should just use
> this function if both arguments are exact ints, and apply the switch
> optimization inside.
Agree.
BTW, what do you think about using __int128 when available? That w
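For context, here is a minimal illustration (not CPython's actual code; the type
names only mimic its sdigit/stwodigits) of the pattern under discussion: a
single-digit fast path multiplies the two 30-bit digits in a double-width
integer type, and a hypothetical __int128 variant would extend the same idea to
wider operands.

#include <stdint.h>

typedef int32_t sdigit_t;      /* one sign-extended 30-bit digit */
typedef int64_t stwodigits_t;  /* wide enough for any digit*digit product */

static stwodigits_t
single_digit_mul(sdigit_t a, sdigit_t b)
{
    /* |a|, |b| < 2**30, so the product always fits in 64 bits:
       no multi-precision loop and no overflow check are needed. */
    return (stwodigits_t)a * (stwodigits_t)b;
}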
Yury Selivanov added the comment:
> I don't think. I run benchmarks (for __int128) :-)
Never mind... Seems that __int128 is still an experimental feature and some
versions of clang even had bugs with it.
Yury Selivanov added the comment:
Zach, first I was going to collect some stats on this (as Serhiy also
suggests). It would be interesting to collect some stats on how many times
BINARY_SUBSCR receives lists, tuples, dicts, and other types. I'd instrument
the code to collect those stat
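A hypothetical sketch of what that instrumentation could look like in ceval.c's
BINARY_SUBSCR case (the counter names are made up; the actual research patch
attached to issue #26219 may differ):

/* Research-only counters for BINARY_SUBSCR operand types. */
static unsigned long subscr_list_hits, subscr_tuple_hits,
                     subscr_dict_hits, subscr_other_hits;

#define RECORD_SUBSCR_TYPE(container)                                  \
    do {                                                               \
        if (PyList_CheckExact(container))        subscr_list_hits++;   \
        else if (PyTuple_CheckExact(container))  subscr_tuple_hits++;  \
        else if (PyDict_CheckExact(container))   subscr_dict_hits++;   \
        else                                     subscr_other_hits++;  \
    } while (0)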
Yury Selivanov added the comment:
Zach, BTW, you can see how I instrumented ceval for stats collection in the
patch for issue #26219. That's only for research purposes; we won't commit
that... or maybe we will, but in a sepa
Yury Selivanov added the comment:
> From the two checks on Python/compile.c:
Stuff done in compile.c is cached in *.pyc files.
The for-loop you saw shouldn't take a lot of time - it iterates through
function parameters, which an average function doesn't have
Yury Selivanov added the comment:
Attaching a second version of the patch. (BTW, Serhiy, I tried your idea of
using a switch statement to optimize branches
(https://github.com/1st1/cpython/blob/fastint2/Python/ceval.c#L5390) -- no
detectable speed improvement).
I decided to add fast path
Yury Selivanov added the comment:
> Let's apply this.
Merged. Martin, Berker, and Ned, thanks for testing this patch out.
--
resolution: -> fixed
stage: commit review -> resolved
status: open -> closed
Yury Selivanov added the comment:
> I agree with Marc-Andre, people doing FP-heavy math in Python use Numpy
> (possibly with Numba, Cython or any other additional library).
> Micro-optimizing floating-point operations in the eval loop makes little
> sense IMO.
I disagree.
30% f
Yury Selivanov added the comment:
> But the next question is then the overhead on the "slow" path, which requires
> a benchmark too! For example, use a subtype of int.
telco is such a benchmark (although it's very unstable). It uses decimals
extensively. I've tes
Yury Selivanov added the comment:
> But it is faster. That's visible on many benchmarks. Even simple
> timeit oneliners can show that. Probably it's because such
> benchmarks usually combine floats and ints, i.e. "2 * smth" instead of
> "2.0 *
Yury Selivanov added the comment:
>
> Stefan Krah added the comment:
>
> It's instructive to run ./python Modules/_decimal/tests/bench.py (Hit Ctrl-C
> after the first cdecimal result, 5 repetitions or so).
>
> fastint2.patch speeds up floats enormously a
Yury Selivanov added the comment:
Berker, I'll fix the Windows issue in a few hours. Sorry for breaking things.
--
nosy: +Yury.Selivanov
Yury Selivanov added the comment:
Everything should be OK now (both broken tests and using rlcompleter on
Windows). Please confirm.
Yury Selivanov added the comment:
> People should stop getting hung up about benchmarks numbers and instead
> should first think about what they are trying to *achieve*. FP performance in
> pure Python does not seem like an important goal in itself.
I'm not sure how to respond t
Yury Selivanov added the comment:
tl;dr I'm attaching a new patch - fastint4 -- the fastest of them all. It
incorporates Serhiy's suggestion to export long/float functions and use them.
I think it's reasonably complete -- please review it, and let's get it
commi
Yury Selivanov added the comment:
Antoine, FWIW I agree on most of your points :) And yes, numpy, scipy, numba,
etc rock.
Please take a look at my fastint4.patch. All tests pass, no performance
regressions, no crazy inlining of floating point exceptions etc. And yet we
have a nice
New submission from Yury Selivanov:
The attached patch drastically speeds up PyLong_AsDouble for single digit longs:
-m timeit -s "x=2" "x*2.2 + 2 + x*2.5 + 1.0 - x / 2.0 + (x+0.1)/(x-0.1)*2 +
(x+10)*(x-30)"
with patch: 0.414
without: 0.612
spectral_norm: 1.05x faster
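Roughly, the fast path being described can look like the sketch below (assuming
CPython's internal int layout; the committed patch may differ in details). A
one-digit int holds at most 30 bits, so it converts to a double exactly, with
no need for the general multi-digit loop.

#include <Python.h>
#include <longintrepr.h>    /* PyLongObject layout (ob_digit) */

/* Hypothetical helper, not the patch itself. */
static double
single_digit_as_double(PyLongObject *v)
{
    Py_ssize_t size = Py_SIZE(v);            /* sign-encoded digit count */
    if (size == 0)
        return 0.0;
    if (size == 1)
        return (double)v->ob_digit[0];
    if (size == -1)
        return -(double)v->ob_digit[0];
    return PyLong_AsDouble((PyObject *)v);   /* fall back to the general path */
}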
New submission from Yury Selivanov:
The attached patch optimizes floor division for ints.
### spectral_norm ###
Min: 0.319087 -> 0.289172: 1.10x faster
Avg: 0.322564 -> 0.294319: 1.10x faster
Significant (t=21.71)
Stddev: 0.00249 -> 0.01277: 5.1180x larger
-m timeit -s "x=22331
Yury Selivanov added the comment:
Attaching another approach -- fastint5.patch.
Similar to what fastint4.patch does, but doesn't export any new APIs. Instead,
similarly to abstract.c, it uses type slots directly.
--
Added file: http://bugs.python.org/file41815/fastint5.
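A minimal sketch of the "use the type slots directly" idea (illustrative, not
the patch itself): in the eval loop's BINARY_ADD case, skip the generic
PyNumber_Add() dispatch when both operands are exact ints or exact floats and
call the concrete nb_add slot instead.

#include <Python.h>

static PyObject *
fast_binary_add(PyObject *left, PyObject *right)
{
    if (PyLong_CheckExact(left) && PyLong_CheckExact(right)) {
        /* goes straight to long_add; no subclass or coercion checks */
        return PyLong_Type.tp_as_number->nb_add(left, right);
    }
    if (PyFloat_CheckExact(left) && PyFloat_CheckExact(right)) {
        return PyFloat_Type.tp_as_number->nb_add(left, right);
    }
    return PyNumber_Add(left, right);    /* generic slow path */
}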
Yury Selivanov added the comment:
Looks like we want to specialize it for lists, tuples, and dicts, as expected.
Not so sure about [-1], but I suggest benchmarking it anyway ;)
Yury Selivanov added the comment:
Unless there are any objections, I'll commit fastint5.patch in a day or two.
Yury Selivanov added the comment:
>> Unless there are any objections, I'll commit fastint5.patch in a day or two.
> Please don't. I would like to have time to benchmark all these patches (there
> are now 9 patches attached to the issue :-)) and I would like to hear
>
Yury Selivanov added the comment:
> Regardless of the performance, the fastint5.patch looks like the
> least invasive approach to me. It also doesn't incur as much
> maintenance overhead as the others do.
Thanks. It's a result of an enlightenment that can only come
after running bench
Yury Selivanov added the comment:
> Between fastintfloat_alt.patch and fastint5.patch, I prefer
> fastintfloat_alt.patch which is much easier to read, so probably much easier
> to debug. I hate huge macro when I have to debug code in gdb :-( I also like
> very much the idea
Yury Selivanov added the comment:
Anyways, if it's about macro vs non-macro, I can inline the macro by hand
(which I think is an inferior approach here). But I'd like the final code to
use my approach of using slots directly, instead of modifying
longobject/floatobject to expo
Yury Selivanov added the comment:
As to whether we want this patch committed or not, here's a
mini-macro-something benchmark:
$ ./python.exe -m timeit -s "x=2" "x + 10 + x * 20 + x* 10 + 20 -x"
1000 loops, best of 3: 0.115 usec per loop
$ python3.5 -m timeit
Yury Selivanov added the comment:
Thanks, Serhiy,
> But I don't quite understand why it adds any gain.
Perhaps, and this is just a guess -- the fast path does only a couple of
equality tests and one call for the actual op. If it's long+long then long_add
will be called directly.
Yury Selivanov added the comment:
> ast.Constant is *not* emitted by the compiler, to not break backward
> compatibility. I *know* that there is no stable API on AST, but I
> noticed some issues when working on my AST project. For example, pip
> doesn't work because an internal library uses AST
Yury Selivanov added the comment:
Attached is the new version of fastint5 patch. I fixed most of the review
comments. I also optimized %, << and >> operators. I didn't optimize other
operators because they are less common. I guess we have to draw a line
somewhere...
Vic
Yury Selivanov added the comment:
Thanks a lot for the review, Serhiy!
--
resolution: -> fixed
stage: -> resolved
status: open -> closed
type: -> performance
Changes by Yury Selivanov :
Added file: http://bugs.python.org/file41830/fastint5_3.patch
Changes by Yury Selivanov :
Added file: http://bugs.python.org/file41831/fastint5_4.patch
Yury Selivanov added the comment:
> Ok. Now I'm lost. We have so many patches :-) Which one do you prefer?
To no one's surprise I prefer fastint5, because it optimizes almost all binary
operators on both ints and floats.
inline-2.patch only optimizes + and -, and only for ints.
Yury Selivanov added the comment:
I think this is a duplicate of http://bugs.python.org/issue26280...
Yury Selivanov added the comment:
You're also running a very small subset of all benchmarks available. Please try
the '-b all' option. I'll also run benchmarks on my machines.
Yury Selivanov added the comment:
> ### regex_v8 ###
> Min: 0.041323 -> 0.048099: 1.16x slower
> Avg: 0.041624 -> 0.049318: 1.18x slower
I think this is a random fluctuation; that benchmark (and the re lib) doesn't use
the operators too much. It can't be THAT slower jus
Yury Selivanov added the comment:
> Actually, please fix the comment. We don't want someone wondering what those
> "macro-benchmarks" are.
If spectral-norm and nbody aren't good benchmarks then let's remove them from
our benchmarks suite.
I'll remove that
Yury Selivanov added the comment:
> Sorry, I was a bit brief: The current comment says "decimal" instead of
> "double". It should be changed to "double".
Oh, got it now, sorry. I rephrased the comment a bit, hopefully it
Yury Selivanov added the comment:
Alright, I ran a few benchmarks myself. In rigorous mode regex_v8 has the same
performance on my 2013 Macbook Pro and an 8-years old i7 CPU (Linux).
Here're results of "perf.py -b raytrace,spectral_norm,meteor_contest,nbody
../cpython/python.exe
Yury Selivanov added the comment:
From what I can see there is no negative impact of the patch on stable macro
benchmarks.
There is quite a detectable positive impact on most of integer and float
operations from my patch. 13-16% on nbody and spectral_norm benchmarks is
still impr
Yury Selivanov added the comment:
> Please don't commit it right now. Yes, due to using macros the patch looks
> simple, but macros expanded to complex code. We need more statistics.
But what will you use to gather statistics data? The test suite isn't
representative, and we a
Yury Selivanov added the comment:
Attaching another patch - fastint6.patch that only optimizes longs (no FP fast
path).
> #26288 brought a great speedup for floats. With fastint5_4.patch *on top of
> #26288* I see no improvement for floats and a big slowdown for _decimal.
What benchma
Yury Selivanov added the comment:
> I ran the mpmath test suite with Python 3.6 and with the fastint6 patch. The
> overall increase when using Python long type was about 1%. When using gmpy2's
> mpz type, there was a slowdown of about 2%.
> I will run more tests tonight.
P
Yury Selivanov added the comment:
I'm not sure why this issue is open... Closing it.
--
status: open -> closed
Yury Selivanov added the comment:
Assigning the issue to myself to make sure it won't be forgotten before it's
too late. Anish or Marco, feel free to propose a patch.
--
assignee: -> yselivanov
stage: -> needs patch
versio
Yury Selivanov added the comment:
Serhiy, Victor, thanks for the review. Attaching an updated version of the
patch.
--
Added file: http://bugs.python.org/file41860/floor_div_2.patch
Yury Selivanov added the comment:
There is no drastic difference in where you implement the fast path. I'd
implement all specializations/optimizations in longobject.c and optimize ceval
to call slots directly. That way, the impact on ceval performance would be
mi
New submission from Yury Selivanov:
The attached patch implements fast path for modulo division of single digit
longs.
Some timeit micro-benchmarks:
-m timeit -s "x=22331" "x%2;x%3;x%4;x%5;x%6;x%7;x%8;x%99;x%100;"
with patch: 0.213 usec
without patch: 0.602 usec
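A hedged sketch of the kind of fast path being proposed (assuming both operands
are non-zero one-digit ints; the sign fix-up expression is the one discussed
later in this thread, and the real patch may differ):

#include <Python.h>
#include <longintrepr.h>

static PyObject *
single_digit_mod(PyLongObject *a, PyLongObject *b)
{
    sdigit left = (sdigit)a->ob_digit[0];
    sdigit right = (sdigit)b->ob_digit[0];
    sdigit mod;

    if (Py_SIZE(a) == Py_SIZE(b)) {
        /* same sign: C's '%' already matches Python's floor-modulo */
        mod = left % right;
    }
    else {
        /* opposite signs: adjust so the result lies in [0, right) */
        mod = right - 1 - (left - 1) % right;
    }
    /* give the result the sign of b, per Python semantics */
    return PyLong_FromLong(mod * (sdigit)Py_SIZE(b));
}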
Changes by Yury Selivanov :
Added file: http://bugs.python.org/file41862/floor_div_3.patch
Changes by Yury Selivanov :
--
assignee: -> yselivanov
Yury Selivanov added the comment:
Also, every other operation for longs (except %, for which I created issue
#26315) is optimized for single digit longs. This optimization is also
important for users of operator.floordiv etc. Even if we decide to provide a
fast path in ceval, it's goi
Yury Selivanov added the comment:
Attaching a new patch -- big thanks to Mark and Serhiy.
> div = ~((left - 1) / right)
The updated code works slightly faster - ~0.285 usec vs ~0.3 usec.
--
Added file: http://bugs.python.org/file41870/floor_div_4.pa
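For reference, why ~((left - 1) / right) is the floored quotient when the result
is negative (left and right being the positive digit values): ~x equals -x - 1
on two's-complement machines, and for left >= 1 the integer expression
(left - 1) / right + 1 equals ceil(left/right), so the whole thing is
-ceil(left/right), which is exactly floor(-left/right). A small standalone
check of the identity:

#include <assert.h>
#include <stdio.h>

int main(void)
{
    for (long left = 1; left <= 1000; left++) {
        for (long right = 1; right <= 64; right++) {
            long ceil_q = (left + right - 1) / right;    /* ceil(left/right) */
            assert(~((left - 1) / right) == -ceil_q);    /* the identity above */
        }
    }
    printf("identity holds\n");
    return 0;
}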
Yury Selivanov added the comment:
> Is it worth to move the optimization inside l_divmod? Will this speed up or
> slow down other operations that use l_divmod?
Attaching a new patch -- fast_divmod.patch
It combines patches for this issue and issue #26315.
Individual timeit benchmarks w
Yury Selivanov added the comment:
> Maybe we should just close the issue?
I'll take a closer look at gmpy later. Please don't close.
Yury Selivanov added the comment:
> mod = left % right if size_a == size_b else right - 1 - (left - 1) % right
This works, Mark! Again, the difference in performance is very subtle, but the
code is more compact.
-m timeit -s "x=22331" "x//2;x//-3;x//4;x//5;x//-6;x//7;
Changes by Yury Selivanov :
Added file: http://bugs.python.org/file41876/fast_divmod_3.patch
Yury Selivanov added the comment:
Attaching an updated patch.
> 1. I think you're missing the final multiplication by Py_SIZE(b) in the
> fast_mod code. Which is odd, because your tests should catch that. So either
> you didn't run the tests, or that code path isn
Changes by Yury Selivanov :
Added file: http://bugs.python.org/file41887/fast_divmod_6.patch
Yury Selivanov added the comment:
> I prefer simpler and more strict rule:
> * Underscores are allowed only between digits in numeric literals.
+1. But in any case we need a PEP for this change.
--
nosy: +yselivanov
Changes by Yury Selivanov :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
Yury Selivanov added the comment:
Committed. Thank you Serhiy, Mark and Victor for helping with the patch!
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
New submission from Yury Selivanov:
The attached patch implements a free-list for single-digit longs. We already
have free lists for many fundamental types, such as floats & unicode.
The patch improves performance in micro-benchmarks by 10-20%. It'll also
lessen memory fragmentati
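A rough sketch of how such a free list typically looks (the cap and names here
are made up, and the actual patch hooks into long allocation/deallocation
rather than standalone helpers):

#include <Python.h>
#include <longintrepr.h>
#include <stddef.h>

#define SINGLE_DIGIT_FREELIST_SIZE 100      /* arbitrary illustrative cap */

static PyLongObject *freelist[SINGLE_DIGIT_FREELIST_SIZE];
static int numfree = 0;

/* Used in place of the allocator when a one-digit result is needed;
   the caller still initializes the object (refcount, type, size). */
static PyLongObject *
single_digit_alloc(void)
{
    if (numfree > 0)
        return freelist[--numfree];          /* reuse a cached object */
    return (PyLongObject *)PyObject_Malloc(
        offsetof(PyLongObject, ob_digit) + sizeof(digit));
}

/* Called from the deallocator for one-digit ints. */
static void
single_digit_free(PyLongObject *op)
{
    if (numfree < SINGLE_DIGIT_FREELIST_SIZE)
        freelist[numfree++] = op;            /* keep it around for reuse */
    else
        PyObject_Free(op);
}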
New submission from Yury Selivanov:
This patch implements a fast path for &, |, and ^ bit operations for
single-digit positive longs. We already have fast paths for ~, and pretty much
every other long op.
-m timeit -s "x=21827623" "x&2;x&2;x&2;x&333;x&
Yury Selivanov added the comment:
You're right Serhiy, closing this one.
--
resolution: -> duplicate
superseder: -> Free list for single-digits ints
Yury Selivanov added the comment:
I think that we only need to add free-list for 1-digit longs. Please see my
patch & explanation in issue #26341.
--
nosy: +yselivanov
Yury Selivanov added the comment:
> Did you test on platform with 30-bit digits?
Yes.
> Could you repeat my microbenchmarks from msg242919?
Sure. With your patches or with mine from issue #26341?
Changes by Yury Selivanov :
--
stage: patch review -> resolved
status: open -> closed
Yury Selivanov added the comment:
Best of 5s:
-m timeit -s "r = range(10**4)" -- "for i in r: pass"
orig: 239 usec
my patch: 148
int_free_list_2: 151
int_free_list_multi: 156
-m timeit -s "r = range(10**5)" -- "for i in r: pass"
orig: 2.4 m
Yury Selivanov added the comment:
I also ran benchmarks. For me, django was 1% faster, telco 5% slower, and the
rest were the same. telco is a decimal benchmark (ints aren't used there),
and django/chameleon are unicode concatenation benchmarks.
I can see improvements in micro bench
Yury Selivanov added the comment:
> Did that comment come from a benchmark suite run? (i.e. actual applications
> and not micro benchmarks?) And, does it show a difference between the single-
> and multi-digit cases?
Yes, more details here: http://bugs.python.org/issue26341#
Yury Selivanov added the comment:
Hi Frederick, the patch looks good. Thanks for reporting this! Could you
please sign the contributor agreement so that I can commit your patch?
--
assignee: -> yselivanov
nosy: +yselivanov
Yury Selivanov added the comment:
> You should add tests, especially for edge cases, for negative values for
> example.
There are many binop tests in test_long.py
Yury Selivanov added the comment:
> Does this patch have effect with results over 8 bits?
-m timeit -s "x=2**40" "x&2;x&2;x&2;x&333;x&3;x&3;x&;x&4"
with patch: 0.404 usec, without patch: 0.41 usec
> Does it have effect after applyi
Yury Selivanov added the comment:
> with patch: 0.404usec without patch: 0.41
Sorry, I made a typo: these results should be flipped -- 0.41-0.404 is the
overhead of the fastpath's 'if' check. I'd say it's a pretty small overhead --
we already optimize all o
Yury Selivanov added the comment:
> I haven't looked at the patch, but the intent to make the 2nd
> await raise a RuntimeError seems strange for several reasons:
> - it's inconsistent with the Future/Task interface;
Well, coroutines are much lower level than Future/Task
Yury Selivanov added the comment:
> After thinking about this some more, I think my problem with asyncio.wait()
> is a bit bigger than the simple fact that coroutine objects cannot be awaited
> multiple times. It seems to me like asyncio.wait() is completely broken for
> coroutin
Changes by Yury Selivanov :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
Yury Selivanov added the comment:
TBH I never needed to do membership tests on the (done, failed) result of
asyncio.wait. If you need to do such tests, just wrap your coroutines into
tasks manually. I honestly don't understand what the problem is and why we need
to change a
Yury Selivanov added the comment:
The patch looks good. It would also be cool if you can add a short code
snippet somewhere:
sock = socket.socket()
with sock:
    sock.bind(('127.0.0.1', 8080))
    sock.listen()
    ...
--
nosy: +
Yury Selivanov added the comment:
We now have speed.python.org up, so I'd keep spectral_norm to make sure we
don't accidentally harm the performance of int/floats operations. It also
helped me to discover that PyLong_AsDouble was unnecessary
Yury Selivanov added the comment:
Vladimir, thanks for the patch!
--
resolution: -> fixed
stage: -> resolved
status: open -> closed
type: -> behavior