[issue42115] Caching infrastructure for the evaluation loop: specialised opcodes

2021-10-20 Thread Ken Jin
Ken Jin added the comment: For future reference, the following opcodes were specialized via the PEP 659 specializing adaptive interpreter:
- LOAD_GLOBAL: Issue44338
- LOAD_ATTR: Issue44337
- STORE_ATTR: Issue44826 (2% faster pyperformance)
- BINARY_SUBSCR: Issue26280 (2% faster pyperformance)
- LOAD_M[…]

2021-10-20 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: Closing this, as we have been implementing this idea already with the adaptive interpreter.

2021-05-09 Thread Guido van Rossum
Guido van Rossum added the comment: Moving the needle on the pyperformance benchmarks is really hard!

2021-05-08 Thread Ken Jin
Ken Jin added the comment:

> IMO you need to implement LOAD_METHOD support for all kinds of calls, including the ones that use kwargs, to see any improvement.

Recently I played around with that idea and extended LOAD/CALL_METHOD to keyword arguments (so CALL_FUNCTION_KW is replaced). I the[…]

2021-01-29 Thread Guido van Rossum
Guido van Rossum added the comment: Thanks! The loop overhead is presumably dramatic in that example. I think I measured a somewhat bigger speedup using timeit, which IIRC subtracts the cost of an empty loop. But you're right that 5% on a micro-benchmark is not very exciting. I wonder if th[…]

2021-01-29 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I am attaching to this issue a patch with PR 23503 restricted to only classes without a dict. I can measure a speedup, but it is super small (5%), in a function that keeps calling a method on a builtin:

    def foo(obj):
        for _ in range(1):
            res = […]

2021-01-29 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:

> What am I missing? Why is the hash of the name needed?

To speed up the call to get the method from the dictionary using _PyDict_GetItem_KnownHash. The reason I was not caching the method is that, as you mention, there could still be an overriding valu[…]

2021-01-29 Thread Guido van Rossum
Guido van Rossum added the comment: I had a simpler idea for an inline cache for LOAD_METHOD than GH-23503. The essential part goes like this (sorry for the hybrid C/Python):

    if […]:
        if type == lm->type and type->tp_version_tag == lm->tp_version_tag:
            meth = lm->meth
            SET_TOP(m[…]
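
The idea of a per-call-site cache keyed on the type and its version tag can be illustrated with a small, self-contained Python model. This is a sketch only: `LoadMethodCache`, `load_method`, and the simulated `version_of` counter are illustrative names, not CPython's actual C implementation (which uses `tp_version_tag` internally).

```python
class LoadMethodCache:
    """Per-call-site cache: one (type, version, method) triple."""
    __slots__ = ("tp", "version", "meth")
    def __init__(self):
        self.tp = self.version = self.meth = None

def load_method(cache, obj, name, version_of):
    tp = type(obj)
    if cache.tp is tp and cache.version == version_of(tp):
        return cache.meth               # hit: no dict lookups at all
    meth = getattr(tp, name)            # miss: full MRO lookup
    cache.tp, cache.version, cache.meth = tp, version_of(tp), meth
    return meth

# Simulated version tags: a real interpreter bumps the tag whenever
# the class (or a base class) dict is mutated, invalidating caches.
versions = {}
def version_of(tp):
    return versions.setdefault(tp, 0)

cache = LoadMethodCache()
m1 = load_method(cache, [1, 2], "append", version_of)  # miss, fills cache
m2 = load_method(cache, [3], "append", version_of)     # hit: type and tag match
assert m1 is m2 is list.append
```

The key property Guido's guard relies on is that a matching type plus an unchanged version tag guarantees the earlier lookup result is still valid, so the fast path can skip the dictionary lookups entirely.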

2021-01-27 Thread Pablo Galindo Salgado
Change by Pablo Galindo Salgado: Removed message: https://bugs.python.org/msg385782

2021-01-09 Thread Johan Dahlin
Change by Johan Dahlin: nosy: +Johan Dahlin

2021-01-06 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: Agreed. In that regard, the most standard thing that we have is the pyperformance suite, which is almost all macro benchmarks. It is also what is displayed on speed.python.org.

2021-01-06 Thread Guido van Rossum
Guido van Rossum added the comment: Undoubtedly, but impact should be measured on what typical users would see, not on narrow benchmarks.

2021-01-06 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I subscribe to that, but it is also a matter of which optimizations have the highest impact/(cost+complexity) ratio. Caching the function pointers for binary operations or (better, IMHO) comparisons is likely a good candidate.

2021-01-05 Thread Yury Selivanov
Yury Selivanov added the comment:

> Do we have good intuition or data about which operations need speeding up most? Everybody always assumes it's BINARY_ADD, but much Python code isn't actually numeric, and binary operations aren't all that common.

IMO, we shouldn't focus too much on opt[…]

2021-01-05 Thread Guido van Rossum
Guido van Rossum added the comment: Do we have good intuition or data about which operations need speeding up most? Everybody always assumes it's BINARY_ADD, but much Python code isn't actually numeric, and binary operations aren't all that common. (For example, IIRC at my previous employer[…]

2021-01-05 Thread Yury Selivanov
Yury Selivanov added the comment:

> The gist seems to be to have extra opcodes that only work for certain situations (e.g. INT_BINARY_ADD). In a hot function we can rewrite opcodes with their specialized counterpart. The new opcode contains a guard that rewrites itself back if the guar[…]

2021-01-05 Thread Guido van Rossum
Guido van Rossum added the comment: I've skimmed several papers by Stefan Brunthaler about something called Quickening that I first found via the Pyston blog, which linked to an ancient bpo issue with a patch created by Stefan (https://bugs.python.org/issue14757). The gist seems to be to hav[…]
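
As a rough model of the quickening idea being discussed, here is a toy stack interpreter in Python, not CPython's real bytecode. The opcode names and instruction shape are illustrative: a generic BINARY_ADD site rewrites itself to a specialized INT_BINARY_ADD, whose guard rewrites itself back when the specialization stops applying.

```python
def run(code, args):
    """Toy interpreter demonstrating quickening: code is a mutable list of
    (opcode, operand) pairs, specialized and de-specialized in place."""
    stack = list(args)
    for i, (op, arg) in enumerate(code):
        if op == "BINARY_ADD":
            r, l = stack.pop(), stack.pop()
            stack.append(l + r)
            if type(l) is int and type(r) is int:
                code[i] = ("INT_BINARY_ADD", arg)   # quicken this site
        elif op == "INT_BINARY_ADD":
            r, l = stack.pop(), stack.pop()
            if type(l) is int and type(r) is int:
                stack.append(l + r)                 # int-only fast path
            else:
                code[i] = ("BINARY_ADD", arg)       # guard failed: deoptimize
                stack.append(l + r)                 # fall back to generic add
    return stack.pop()

code = [("BINARY_ADD", None)]
assert run(code, [1, 2]) == 3
assert code[0][0] == "INT_BINARY_ADD"      # site was quickened
assert run(code, ["a", "b"]) == "ab"       # guard fails, result still correct
assert code[0][0] == "BINARY_ADD"          # and the site deoptimized itself
```

In a real interpreter the payoff is that the specialized opcode skips the generic dispatch through `PyNumber_Add` and operator slots; the toy version only shows the self-rewriting control flow.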

2021-01-04 Thread Guido van Rossum
Change by Guido van Rossum: nosy: +Guido.van.Rossum

2020-12-17 Thread Yury Selivanov
Yury Selivanov added the comment:

> I will try to do some prototyping around that to see how much we can gain in that route. In any case, adding LOAD_METHOD support for all kinds of calls should be an improvement by itself even without caching, no?

Exactly. As one argument for generaliz[…]

2020-12-17 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:

> IMO you need to implement LOAD_METHOD support for all kinds of calls, including the ones that use kwargs, to see any improvement.

I will try to do some prototyping around that to see how much we can gain in that route. In any case, adding LOAD_MET[…]

2020-12-17 Thread Yury Selivanov
Yury Selivanov added the comment:

> I think I am closing the PR as it seems that the gains are not good enough (and there is quite a lot of noise by comparing the benchmarks together).

IMO you need to implement LOAD_METHOD support for all kinds of calls, including the ones that use kwargs,[…]

2020-12-16 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I think I am closing the PR, as it seems that the gains are not good enough (and there is quite a lot of noise when comparing the benchmarks together).

2020-12-16 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: OK, I have repeated the benchmarks after rebasing and this is what I get:

    venv ❯ python -m pyperf compare_to json/2020-12-16_11-20-master-8203c73f3bb1.json.gz json/2020-12-16_11-22-load_method-21b1566125b3.json.gz -G --min-speed=1
    Slower (13):
    - regex_[…]

2020-12-16 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: Oh, I am quite confused about what's going on with pidigits and regex_v8. I will try to run the benchmarks again. Did you compare the current master against the PR? If that's the case, we should rebase the PR first to make sure we are comparing it c[…]

2020-12-15 Thread Inada Naoki
Inada Naoki added the comment:

    $ ./python -m pyperf compare_to master.json load_method.json -G --min-speed=1
    Slower (15):
    - unpack_sequence: 63.2 ns +- 0.8 ns -> 68.5 ns +- 14.8 ns: 1.08x slower (+8%)
    - pathlib: 23.1 ms +- 0.3 ms -> 24.4 ms +- 0.4 ms: 1.05x slower (+5%)
    - scimark_fft: 418 ms +-[…]

2020-12-15 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:

> pidigits and regex_v8 are LOAD_ATTR heavy benchmarks?

The PR is for LOAD_METHOD infrastructure, not for LOAD_ATTR (there was an incorrect title in the PR that I corrected, but the contents are correct).

> I will run benchmarks in my machine to confir[…]

2020-12-14 Thread Inada Naoki
Inada Naoki added the comment: pidigits and regex_v8 are LOAD_ATTR heavy benchmarks? I will run benchmarks in my machine to confirm the results.

2020-11-24 Thread STINNER Victor
Change by STINNER Victor: nosy: -vstinner

2020-11-24 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I may not know what the average airspeed velocity of a laden swallow is, but I know the average speed of adding a LOAD_METHOD opcode cache as in PR 23503 (measured with PGO + LTO + CPU isolation):

    [benchmark table truncated]

2020-11-24 Thread Pablo Galindo Salgado
Change by Pablo Galindo Salgado: pull_requests: +22392; pull_request: https://github.com/python/cpython/pull/23503

2020-11-15 Thread Batuhan Taskaya
Change by Batuhan Taskaya: nosy: +BTaskaya

2020-11-01 Thread STINNER Victor
STINNER Victor added the comment:

> because you'd cache a pointer to the specific `+` operator implementation

You should have a look at "INCA: Inline Caching meets Quickening in Python 3.3": https://bugs.python.org/issue14757 Stefan Brunthaler wrote a paper on his work: "Inline Caching Meets[…]

2020-10-22 Thread Pablo Galindo Salgado
Change by Pablo Galindo Salgado: keywords: +patch; stage: -> patch review; pull_requests: +21838; pull_request: https://github.com/python/cpython/pull/22907

2020-10-22 Thread Yury Selivanov
Change by Yury Selivanov: Removed message: https://bugs.python.org/msg379330

2020-10-22 Thread Yury Selivanov
Yury Selivanov added the comment:

> This idea can be implemented without opcode cache. I will try it.

I'd actually encourage trying to use the opcode cache, because this way the optimization will be more generic. E.g. `decimal + decimal` would also be specialized via the cache, because you'd[…]

2020-10-22 Thread Barry A. Warsaw
Change by Barry A. Warsaw: nosy: +barry

2020-10-22 Thread Dong-hee Na
Change by Dong-hee Na: nosy: +corona10

2020-10-22 Thread Inada Naoki
Inada Naoki added the comment: One more idea: BINARY_ADD_INT, whose operand is an int immediate. This idea can be implemented without an opcode cache. I will try it.
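
The idea of an int-immediate operand can be sketched as a compile-time peephole pass over a toy instruction list: a small int constant is folded out of the constants table and into the instruction's operand byte. The instruction names and the (opcode, operand) tuple shape are illustrative, not CPython's actual format.

```python
def fold_add_int(instrs, consts):
    """Rewrite LOAD_CONST i; BINARY_ADD -> BINARY_ADD_INT <imm> when the
    constant is a small int that fits in a one-byte immediate operand."""
    out, i = [], 0
    while i < len(instrs):
        op, arg = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (op == "LOAD_CONST" and nxt == ("BINARY_ADD", None)
                and type(consts[arg]) is int and 0 <= consts[arg] < 256):
            out.append(("BINARY_ADD_INT", consts[arg]))  # immediate operand
            i += 2                                       # consumed two instrs
        else:
            out.append((op, arg))
            i += 1
    return out

consts = [1]
before = [("LOAD_FAST", 0), ("LOAD_CONST", 0), ("BINARY_ADD", None)]
after = fold_add_int(before, consts)
assert after == [("LOAD_FAST", 0), ("BINARY_ADD_INT", 1)]
```

The win this models is avoiding both the constants-table fetch and a stack push/pop for patterns like `x + 1`; the specialized opcode would still need a runtime guard that the other operand is an int.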

2020-10-22 Thread Inada Naoki
Inada Naoki added the comment: FWIW, php7 is about 5x faster than Python on the spectral norm benchmark. https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/php-python3.html There are two major reasons:
* PHP uses scalar types for float and int
* PHP uses type-specialized bytecode (PHP[…]

2020-10-21 Thread Yury Selivanov
Yury Selivanov added the comment:

> Imagine that we have a secondary copy of the bytecode in the cache inside the code object and we mutate that instead. The key difference with the current cache infrastructure is that we don't accumulate all the optimizations on the same opcode, which[…]

2020-10-21 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:

> - Rewriting code objects in place is wrong, IMO: you always need to have a way to deoptimize the entire thing, so you need to keep the original one. It might be that you have well defined and static types for the first 1 invocations and so[…]
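
The "keep the original one" requirement can be modeled as a code object that holds an immutable reference copy alongside the mutable quickened copy the eval loop actually runs. This is a sketch of the invariant being argued for, with illustrative names; CPython's eventual PEP 659 design differs in detail.

```python
class QuickenedCode:
    """Holds the original instructions plus a mutable specialized copy,
    so full deoptimization is always possible."""
    def __init__(self, instructions):
        self._original = tuple(instructions)   # never mutated
        self.active = list(instructions)       # what the eval loop dispatches on

    def specialize(self, index, opcode):
        # Rewrite a single instruction site with its specialized counterpart.
        self.active[index] = opcode

    def deoptimize(self):
        # Types stopped being stable: fall back to the generic opcodes.
        self.active = list(self._original)

code = QuickenedCode(["BINARY_ADD", "RETURN_VALUE"])
code.specialize(0, "INT_BINARY_ADD")
assert code.active == ["INT_BINARY_ADD", "RETURN_VALUE"]
code.deoptimize()
assert code.active == ["BINARY_ADD", "RETURN_VALUE"]
```

Because `_original` is never touched, no sequence of specializations can paint the interpreter into a corner: the generic bytecode is always one copy away.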

2020-10-21 Thread Yury Selivanov
Yury Selivanov added the comment: A few thoughts in no particular order:
- I'd suggest implementing the cache for 2-3 more opcodes on top of the existing infrastructure to get more experience, and then refactoring it to make it more generic.
- Generalizing LOAD_METHOD to work for methods with[…]

2020-10-21 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: Also, many of these ideas are not new; many of them are inspired by or taken from Yury's email (https://mail.python.org/pipermail/python-dev/2016-January/142945.html), but I wanted to add that I think that with some coordination between us we can ach[…]

2020-10-21 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: To clarify what I mean with:

> - We could also do the same for operations like "some_container[]" if the container is some builtin. We can substitute/specialize the opcode for someone that directly uses built-in operations instead of the generic[…]
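
A BINARY_SUBSCR specialization of this kind can be sketched in Python (function and variable names are illustrative; in C the fast path would be a bounds check plus `PyList_GET_ITEM` instead of the generic `PyObject_GetItem`):

```python
def make_specialized_subscr():
    """BINARY_SUBSCR specialized for exactly list[int]; counts guard misses
    so an interpreter could decide when to de-specialize the site."""
    misses = 0
    def subscr_list_int(container, index):
        nonlocal misses
        # Exact-type guards: a list subclass may override __getitem__,
        # so we need `type(x) is list`, not isinstance().
        if type(container) is list and type(index) is int:
            return container[index]      # direct built-in indexing
        misses += 1
        return container[index]          # generic fallback path
    return subscr_list_int, lambda: misses

subscr, miss_count = make_specialized_subscr()
assert subscr([10, 20, 30], 1) == 20 and miss_count() == 0  # fast path
assert subscr({"k": "v"}, "k") == "v" and miss_count() == 1  # guard miss
```

The miss counter stands in for the de-specialization policy: once a site sees enough non-list containers, rewriting it back to the generic opcode avoids paying for a guard that keeps failing.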

2020-10-21 Thread Pablo Galindo Salgado
New submission from Pablo Galindo Salgado: After https://bugs.python.org/issue42093 and https://bugs.python.org/issue26219, it is becoming clear that we can leverage some cache for different information in the evaluation loop to speed up CPython. This observation is also based on the fact that alth[…]