[Python-Dev] A memory map based data persistence and startup speedup approach
Hi folks, as illustrated in faster-cpython#150 [1], we have implemented a mechanism that supports data persistence of a subset of Python data types with mmap, and can therefore reduce package import time by caching code objects. This can be seen as a more eager .pyc format: both serve the same purpose, but our approach avoids [de]serialization. As a result, we see an overall Python startup speedup of ~15%. We have packaged it as a third-party library and have been working on open-sourcing it.

Our implementation (whose unofficial name is "pycds") mainly contains two parts:

1. importlib hooks: the mechanism to dump code objects to an archive, and a `Finder` that supports loading code objects from mapped memory.
2. Dumping and loading a subset of Python types with mmap. In this part, we deal with a) ASLR, by patching `ob_type` fields; b) hash seed randomization, by supporting only basic types that don't have a hash-based layout (i.e. dict is not supported); c) interned strings, by re-interning strings while loading the mmap archive; and so on.

After pycds has been installed, the complete workflow of our approach includes three steps:

1. Record the names of imported packages to heap.lst:
   `PYCDSMODE=TRACE PYCDSLIST=heap.lst python run.py`
2. Dump a memory archive of the code objects of the imported packages (this step does not involve the Python script):
   `PYCDSMODE=DUMP PYCDSLIST=heap.lst PYCDSARCHIVE=heap.img python`
3. Run other Python processes with the created archive:
   `PYCDSMODE=SHARE PYCDSARCHIVE=heap.img python run.py`

We could even make use of immortal objects if PEP 683 [2] is accepted, which could give CDS further performance improvements. Currently, every archived object is effectively immortal: we increment the refcount of each object copied into the archive to keep it from being deallocated. However, without changes to CPython, the refcount fields of archived objects will still be updated, causing extra memory footprint due to CoW. More background and implementation details can be found at [1].
We think this could be an effective way to improve Python's startup performance, and it could even do more, like sharing large data between Python instances. As suggested on python-ideas [3], we are posting this here, looking for questions/suggestions on the overall design and workflow; we also welcome code reviews once our lawyers are happy and we can publish the code.

Best,
Yichen Yan
Alibaba Compiler Group

[1] "Faster startup -- Share code objects from memory-mapped file", https://github.com/faster-cpython/ideas/discussions/150
[2] PEP 683: "Immortal Objects, Using a Fixed Refcount" (draft), https://mail.python.org/archives/list/python-dev@python.org/message/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/
[3] [Python-ideas] "A memory map based data persistence and startup speedup approach", https://mail.python.org/archives/list/python-id...@python.org/thread/UKEBNHXYC3NPX36NS76LQZZYLRA4RVEJ/

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/B77BQQFDSTPY4KA4HMHYXJEV3MOU7W3X/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Move the pythoncapi_compat project under the GitHub Python or PSF organization?
Results of the poll (which was open for 10 days):

* Move pythoncapi_compat: 19 votes (90%)
* Don't move pythoncapi_compat: 2 votes (10%)

Victor

On Fri, Feb 11, 2022 at 12:16 AM Victor Stinner wrote:
> I created a poll on Discourse:
> https://discuss.python.org/t/move-the-pythoncapi-compat-project-under-the-github-python-or-psf-organization/13651
>
> It will be closed automatically in 10 days.
>
> Victor

-- Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/B4T3FRH2F4MV7LXOTIUZHR2CLYMJSHHQ/
[Python-Dev] Re: Steering Council reply to PEP 670 -- Convert macros to functions in the Python C API
Well, maybe it's a bad example. I just wanted to say that converting macros to static inline functions provides more accurate profiler data, and debuggers can more easily show static inline function names when they are inlined and set breakpoints on them. But you're right, it's not a silver bullet ;-)

Victor

On Mon, Feb 14, 2022 at 11:29 AM Antoine Pitrou wrote:
> On Wed, 9 Feb 2022 17:49:19 +0100 Victor Stinner wrote:
> > On Wed, Feb 9, 2022 at 1:04 PM Petr Viktorin wrote:
> > > > Right now, a large number of macros cast their argument to a type. A few examples:
> > > >
> > > > #define PyObject_TypeCheck(ob, type) PyObject_TypeCheck(_PyObject_CAST(ob), type)
> > > > #define PyTuple_GET_ITEM(op, i) (_PyTuple_CAST(op)->ob_item[i])
> > > > #define PyDict_GET_SIZE(mp) (assert(PyDict_Check(mp)),((PyDictObject *)mp)->ma_used)
> > >
> > > When I look at the Rationale points, and for the first three of these macros I didn't find any that sound very convincing:
> > > - Functions don't have macro pitfalls, but these simple macros don't fall into the pits.
> > > - Fully defining the argument types means getting rid of the cast, breaking some code that uses the macro
> > > - Debugger support really isn't that useful for these simple macros
> > > - There are no new variables
> >
> > Using a static inline function, profilers like Linux perf can count the CPU time spent in static inline functions (on each CPU instruction when using annotated assembly code). For example, you can see how much time (accumulated time) is spent in Py_INCREF(), to have an idea of the cost of reference counting in Python.
>
> The "time spent in Py_INCREF" doesn't tell you the cost of reference counting. Modern CPUs execute code out-of-order and rely on many internal structures (such as branch predictors, reorder buffers...).
> The *visible* time elapsed between the instruction pointer entering and leaving a function doesn't tell you whether Py_INCREF had adverse effects on the utilization of such internal structures (making reference counting more costly than it appears to be), or on the contrary whether the instructions in Py_INCREF were successfully overlapped with other computations (making reference counting practically free).
>
> The only reliable way to evaluate the cost of reference counting is to compare it against alternatives in realistic scenarios.
>
> Regards
>
> Antoine.

> > It's not possible when using macros.
> >
> > For debuggers, you're right that the Py_INCREF() and PyTuple_GET_ITEM() macros are very simple and it's not so hard to guess that the debugger is executing their code in the C code or the assembly code. But the purpose of PEP 670 is to convert way more complex macros. I wrote a PR to convert the unicodeobject.h macros; IMO they are some of the worst macros in the Python C API:
> > https://github.com/python/cpython/pull/31221
> >
> > I always wanted to convert them, but some core devs were afraid of performance regressions. So I wrote a PEP to prove that there is no impact on performance.
> >
> > IMO the new unicodeobject.h code is way more readable. I added two "kind" variables which have a well defined scope. In the macros, no such variable is currently used, to avoid a macro pitfall: a name conflict if there is already a "kind" variable where the macro is used.
> >
> > The conversion to static inline functions also detected a bug with "const PyObject*": using PyUnicode_READY() on a const Unicode string is wrong, since this function modifies the object if it's not ready (WCHAR kind). It also detected bugs on "const void *data" used to *write* into string characters (PyUnicode_WRITE).
> >
> > > - Macro tricks (parentheses and comma-separated expressions) are needed, but they're present and tested.
> > The PEP rationale starts with:
> > "The use of macros may have unintended adverse effects that are hard to avoid, even for experienced C developers. Some issues have been known for years, while others have been discovered recently in Python. Working around macro pitfalls makes the macro code harder to read and to maintain."
> >
> > Are you saying that all core devs are well aware of all macro pitfalls and always avoid them? I'm well aware of these pitfalls, and I have fallen into their traps multiple times.
> >
> > The bpo-30459 issue about PyList_SET_ITEM() is a concrete example of old bugs that nobody noticed before.
> >
> > > The "massive change to working code" part is important. Such efforts tend to have unexpected issues, which have an unfortunate tendency to surface before release and contribute to release manager burnout.
> >
> > Aren't you exaggerating a bit? Would you mind elaborating? Do you have examples of issues caused by converting macros to static inline functions?
> >
> > I'm not talking about incompatible C API changes made on purpose
[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount"
On 2/21/22 21:44, Larry Hastings wrote:
> While I don't think it's fine to play devil's advocate,

Oh! Please ignore the word "don't" in the above sentence. I /do/ think it's fine to play devil's advocate. Sheesh,

//arry/

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TABGFU4OFTUDPGF72LY5QMSDTKDUUHHY/
[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount"
On 2/21/22 22:06, Chris Angelico wrote:
> On Mon, 21 Feb 2022 at 16:47, Larry Hastings wrote:
> > While I don't think it's fine to play devil's advocate, given the choice between "this will help a common production use-case" (pre-fork servers) and "this could hurt a hypothetical production use case" (long-running applications that reload modules enough times that this could waste a significant amount of memory), I think the former is more important.
>
> Can the cost be mitigated by reusing immortal objects? So, for instance, a module-level constant of 60*60*24*365 might be made immortal, meaning it doesn't get disposed of with the module, but if the module gets reloaded, no *additional* object would be created.
>
> I'm assuming here that any/all objects unmarshalled with the module can indeed be shared in this way. If that isn't always true, then that would reduce the savings here.

It could, but we don't have any general-purpose mechanism for that. We have "interned strings" and "small ints", but we don't have e.g. "interned tuples" or "frequently-used large ints and floats".

That said, in this hypothetical scenario wherein someone is constantly reloading modules but we also have immortal objects, maybe someone could write a smart reloader that lets them somehow propagate existing immortal objects to the new module. It wouldn't even have to be that sophisticated, just some sort of hook into the marshal step combined with a per-module persistent cache of unmarshalled constants.

//arry/

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UN3BIEHDK2CCL563MSIJ4DXDWOWHNKHR/
[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)
On 19. 02. 22 8:46, Eric Snow wrote:
> Thanks to all those that provided feedback. I've worked to substantially update the PEP in response. The text is included below. Further feedback is appreciated.

Thank you! This version is much clearer. I like the PEP more and more! I've sent a PR with some typo fixes: https://github.com/python/peps/pull/2348 and I have a few comments:

[...]
> Public Refcount Details
[...]
> As part of this proposal, we must make sure that users can clearly understand on which parts of the refcount behavior they can rely and which are considered implementation details. Specifically, they should use the existing public refcount-related API and the only refcount value with any meaning is 0. All other values are considered "not 0".

Should we care about hacks/optimizations that rely on having the only reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal. AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code.

[...]
> _Py_IMMORTAL_REFCNT
> ---
> We will add two internal constants::
>
>     #define _Py_IMMORTAL_BIT (1LL << (8 * sizeof(Py_ssize_t) - 4))
>     #define _Py_IMMORTAL_REFCNT (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT / 2))

As a nitpick: could you say this in prose?

* ``_Py_IMMORTAL_BIT`` has the third top-most bit set.
* ``_Py_IMMORTAL_REFCNT`` has the third and fourth top-most bits set.

[...]
> Immortal Global Objects
> ---
> All objects that we expect to be shared globally (between interpreters) will be made immortal. That includes the following:
>
> * singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
> * all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
> * all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers, small ints)
>
> All such objects will be immutable. In the case of the static types, they will be effectively immutable.
> ``PyTypeObject`` has some mutable state (``tp_dict`` and ``tp_subclasses``), but we can work around this by storing that state on ``PyInterpreterState`` instead of on the respective static type object. Then the ``__dict__``, etc. getter will do a lookup on the current interpreter, if appropriate, instead of using ``tp_dict``.

But tp_dict is also public C-API. How will that be handled? Perhaps naively, I thought static types' dicts could be treated as (deeply) immutable, and shared?

Perhaps it would be best to leave it out here and say "The details of sharing ``PyTypeObject`` across interpreters are left to another PEP"? Even so, I'd love to know the plan. (And even if these are internals, changes to them should be mentioned in What's New, for the sake of people who need to maintain old extensions.)

> Object Cleanup
> --
> In order to clean up all immortal objects during runtime finalization, we must keep track of them. For GC objects ("containers") we'll leverage the GC's permanent generation by pushing all immortalized containers there. During runtime shutdown, the strategy will be to first let the runtime try to do its best effort of deallocating these instances normally. Most of the module deallocation will now be handled by ``pylifecycle.c:finalize_modules()``, which cleans up the remaining modules as best as we can. It will change which modules are available during __del__, but that's already defined as undefined behavior by the docs. Optionally, we could do some topological ordering to guarantee that user modules will be deallocated first, before the stdlib modules. Finally, anything left over (if any) can be found through the permanent generation gc list, which we can clear after finalize_modules().
>
> For non-container objects, the tracking approach will vary on a case-by-case basis. In nearly every case, each such object is directly accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or ``PyInterpreterState`` field.
> We may need to add a tracking mechanism to the runtime state for a small number of objects.

Out of curiosity: how does this extra work affect performance? Is it part of the 4% slowdown?

And from the other thread:

On 17. 02. 22 18:23, Eric Snow wrote:
> On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin wrote:
> > > > Weren't you planning a PEP on subinterpreter GIL as well? Do you want to submit them together?
> > >
> > > I'd have to think about that. The other PEP I'm writing for per-interpreter GIL doesn't require immortal objects. They just simplify a number of things. That's my motivation for writing this PEP, in fact. :)
> >
> > Please think about it. If you removed the benefits for per-interpreter GIL, the motivation section would be reduced to memory savings for fork/CoW. (And lots of performance improvements that are great in theory but sum up to a 4% loss.)
>
> Sound
[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)
Petr Viktorin wrote:
> Should we care about hacks/optimizations that rely on having the only reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal.
> AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code.

For what it's worth, Cython does this for string concatenation, to concatenate in place if possible (this optimization was copied from CPython). It could be disabled relatively easily if it became a problem (it's already CPython-only and version-checked, so it'd just need another upper-bound version check).

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CDNQK5RMXSLLYFNIXRORL7GTKU6B4BVR/
[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount"
On Tue, 22 Feb 2022 at 03:00, Larry Hastings wrote:
> On 2/21/22 22:06, Chris Angelico wrote:
> > On Mon, 21 Feb 2022 at 16:47, Larry Hastings wrote:
> > > While I don't think it's fine to play devil's advocate, given the choice between "this will help a common production use-case" (pre-fork servers) and "this could hurt a hypothetical production use case" (long-running applications that reload modules enough times this could waste a significant amount of memory), I think the former is more important.
> >
> > Can the cost be mitigated by reusing immortal objects? So, for instance, a module-level constant of 60*60*24*365 might be made immortal, meaning it doesn't get disposed of with the module, but if the module gets reloaded, no *additional* object would be created.
> >
> > I'm assuming here that any/all objects unmarshalled with the module can indeed be shared in this way. If that isn't always true, then that would reduce the savings here.
>
> It could, but we don't have any general-purpose mechanism for that. We have "interned strings" and "small ints", but we don't have e.g. "interned tuples" or "frequently-used large ints and floats".
>
> That said, in this hypothetical scenario wherein someone is constantly reloading modules but we also have immortal objects, maybe someone could write a smart reloader that lets them somehow propagate existing immortal objects to the new module. It wouldn't even have to be that sophisticated, just some sort of hook into the marshal step combined with a per-module persistent cache of unmarshalled constants.

Fair enough. Since only immortal objects would affect this, it may be possible for the smart reloader to simply be told of all new immortals, and it can then intern things itself. IMO that strengthens the argument that prefork servers are a more significant use-case than reloading, without necessarily compromising the rarer case. Thanks for the explanation.
ChrisA

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3D436RCMHODEIBVDIWIJLKZU2TGHBE4J/
[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)
On 2/21/2022 11:11 AM, Petr Viktorin wrote:
> On 19. 02. 22 8:46, Eric Snow wrote:
> > As part of this proposal, we must make sure that users can clearly understand on which parts of the refcount behavior they can rely and which are considered implementation details. Specifically, they should use the existing public refcount-related API and the only refcount value with any meaning is 0. All other values are considered "not 0".
>
> Should we care about hacks/optimizations that rely on having the only reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal. AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code.

We could say that the only refcounts with any meaning are 0, 1, and > 1.

-- Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/C3R4FKO7PZETOSI5DTGMAXWVUTQM26AW/