[Python-Dev] A memory map based data persistence and startup speedup approach

2022-02-21 Thread Yichen Yan via Python-Dev

Hi folks, as illustrated in faster-cpython#150 [1], we have implemented a 
mechanism that supports data persistence of a subset of Python data types with 
mmap, and can therefore reduce package import time by caching code objects. This 
could be seen as a more eager pyc format, as both serve the same purpose, but 
our approach tries to avoid [de]serialization. As a result, we see an overall 
Python startup speedup of ~15%.

Currently, we’ve made it a third-party library and have been working on 
open-sourcing it.

Our implementation (whose unofficial name is “pycds”) mainly contains two 
parts:
1) importlib hooks: a mechanism to dump code objects to an archive, plus a 
`Finder` that supports loading code objects from mapped memory.
2) Dumping and loading a subset of Python types with mmap. In this part, we 
deal with a) ASLR, by patching `ob_type` fields; b) hash seed randomization, by 
supporting only basic types that don't have a hash-based layout (i.e. dict is 
not supported); c) interned strings, by re-interning strings while loading the 
mmap archive; and so on.
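The finder half of this design can be sketched with a minimal in-memory stand-in. (This is purely illustrative: `CachedCodeFinder` and `demo_mod` are hypothetical names, not pycds's actual API, and the real implementation maps code objects from an mmap archive rather than holding them in a dict.)

```python
import importlib.abc
import importlib.util
import sys

class CachedCodeFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve pre-compiled code objects from an in-memory cache.

    Illustrative stand-in for an mmap-backed code-object archive.
    """
    def __init__(self, cache):
        self.cache = cache  # {module_name: code object}

    def find_spec(self, name, path=None, target=None):
        # Only claim modules we have cached code for.
        if name in self.cache:
            return importlib.util.spec_from_loader(name, self)
        return None

    def create_module(self, spec):
        return None  # use the default module creation semantics

    def exec_module(self, module):
        # Execute the cached code object in the fresh module namespace,
        # skipping source lookup, parsing, and compilation entirely.
        exec(self.cache[module.__name__], module.__dict__)

# Usage: cache a code object for a hypothetical module "demo_mod".
code = compile("ANSWER = 42", "<cached>", "exec")
sys.meta_path.insert(0, CachedCodeFinder({"demo_mod": code}))

import demo_mod
print(demo_mod.ANSWER)  # 42
```

The real archive additionally has to survive across processes, which is where the ASLR/hash-seed/interning issues above come in.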

After pycds has been installed, the complete workflow of our approach consists 
of three steps:
1) Record the names of imported packages to heap.lst: `PYCDSMODE=TRACE 
PYCDSLIST=heap.lst python run.py`
2) Dump a memory archive of the code objects of the imported packages (this 
step does not involve the Python script): `PYCDSMODE=DUMP PYCDSLIST=heap.lst 
PYCDSARCHIVE=heap.img python`
3) Run other Python processes with the created archive: `PYCDSMODE=SHARE 
PYCDSARCHIVE=heap.img python run.py`

We could even make use of immortal objects if PEP 683 [2] were accepted, which 
could give CDS further performance improvements. Currently, every archived 
object is virtually immortal: we increment the refcount of each object copied 
into the archive by 1 so that it is never deallocated. However, without changes 
to CPython, the refcount fields of archived objects will still be updated, 
incurring extra memory footprint due to copy-on-write (CoW).

More background and implementation details can be found at [1].
We think this could be an effective way to improve Python's startup 
performance, and it could even do more, such as sharing large data between 
Python instances.
As suggested on python-ideas [3], we are posting this here, looking for 
questions and suggestions on the overall design and workflow. We also welcome 
code reviews once we get our lawyers happy and can publish the code.

Best,
Yichen Yan
Alibaba Compiler Group

[1] “Faster startup -- Share code objects from memory-mapped file”, 
https://github.com/faster-cpython/ideas/discussions/150
[2] PEP 683: "Immortal Objects, Using a Fixed Refcount" (draft), 
https://mail.python.org/archives/list/python-dev@python.org/message/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/
[3] [Python-ideas] "A memory map based data persistence and startup speedup 
approach", 
https://mail.python.org/archives/list/python-id...@python.org/thread/UKEBNHXYC3NPX36NS76LQZZYLRA4RVEJ/

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B77BQQFDSTPY4KA4HMHYXJEV3MOU7W3X/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Move the pythoncapi_compat project under the GitHub Python or PSF organization?

2022-02-21 Thread Victor Stinner
Results of the poll (which was open for 10 days):

* Move pythoncapi_compat: 19 votes (90%)
* Don't move pythoncapi_compat: 2 votes (10%)

Victor

On Fri, Feb 11, 2022 at 12:16 AM Victor Stinner  wrote:
>
> I created a poll on Discourse:
> https://discuss.python.org/t/move-the-pythoncapi-compat-project-under-the-github-python-or-psf-organization/13651
>
> It will be closed automatically in 10 days.
>
> Victor

-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B4T3FRH2F4MV7LXOTIUZHR2CLYMJSHHQ/


[Python-Dev] Re: Steering Council reply to PEP 670 -- Convert macros to functions in the Python C API

2022-02-21 Thread Victor Stinner
Well, maybe it's a bad example. I just wanted to say that converting
macros to static inline functions provides more accurate profiler data,
and debuggers can more easily show static inline functions' names when
they are inlined and put breakpoints on them. But you're right, it's
not a silver bullet ;-)

Victor

On Mon, Feb 14, 2022 at 11:29 AM Antoine Pitrou  wrote:
>
> On Wed, 9 Feb 2022 17:49:19 +0100
> Victor Stinner  wrote:
> > On Wed, Feb 9, 2022 at 1:04 PM Petr Viktorin  wrote:
> > > > Right now, a large number of macros cast their argument to a type. A
> > > > few examples:
> > > >
> > > > #define PyObject_TypeCheck(ob, type) PyObject_TypeCheck(_PyObject_CAST(ob), type)
> > > > #define PyTuple_GET_ITEM(op, i) (_PyTuple_CAST(op)->ob_item[i])
> > > > #define PyDict_GET_SIZE(mp)  (assert(PyDict_Check(mp)),((PyDictObject *)mp)->ma_used)
> > >
> > > When I look at the Rationale points, and for the first three of these
> > > macros I didn't find any that sound very convincing:
> > > - Functions don't have macro pitfalls, but these simple macros don't
> > > fall into the pits.
> > > - Fully defining the argument types means getting rid of the cast,
> > > breaking some code that uses the macro
> > > - Debugger support really isn't that useful for these simple macros
> > > - There are no new variables
> >
> > Using a static inline function, profilers like Linux perf can count
> > the CPU time spend in static inline functions (on each CPU instruction
> > when using annotated assembly code). For example, you can see how much
> > time (accumulated time) is spent in Py_INCREF(), to have an idea of
> > the cost of reference counting in Python.
>
> The "time spent in Py_INCREF" doesn't tell you the cost of reference
> counting. Modern CPUs execute code out-of-order and rely on many
> internal structures (such as branch predictors, reorder buffers...).
> The *visible* time elapsed between the instruction pointer entering and
> leaving a function doesn't tell you whether Py_INCREF had adverse
> effects on the utilization of such internal structures (making
> reference counting more costly than it appears to be), or on the
> contrary whether the instructions in Py_INCREF were successfully
> overlapped with other computations (making reference counting
> practically free).
>
> The only reliable way to evaluate the cost of reference counting is to
> compare it against alternatives in realistic scenarios.
>
> Regards
>
> Antoine.
>
>
>
> > It's not possible when using
> > macros.
> >
> > For debuggers, you're right that Py_INCREF() and PyTuple_GET_ITEM()
> > macros are very simple and it's not so hard to guess that the debugger
> > is executing their code in the C code or the assembly code. But the
> > purpose of PEP 670 is to convert way more complex macros. I wrote a PR
> > to convert unicodeobject.h macros; IMO they are among the worst
> > macros in the Python C API:
> > https://github.com/python/cpython/pull/31221
> >
> > I always wanted to convert them, but some core devs were afraid of
> > performance regressions. So I wrote a PEP to prove that there is no
> > impact on performance.
> >
> > IMO the new unicodeobject.h code is way more readable. I added two
> > "kind" variables which have a well-defined scope. Currently, no such
> > variable is used in the macros, to avoid a macro pitfall: a name
> > conflict if a "kind" variable already exists where the macro is used.
> >
> > The conversion to static inline functions also detected a bug with "const
> > PyObject*": using PyUnicode_READY() on a const Unicode string is
> > wrong, this function modifies the object if it's not ready (WCHAR
> > kind). It also detected bugs on "const void *data" used to *write*
> > into string characters (PyUnicode_WRITE).
> >
> >
> > > - Macro tricks (parentheses and comma-separated expressions) are needed,
> > > but they're present and tested.
> >
> > The PEP rationale starts with:
> > "The use of macros may have unintended adverse effects that are hard
> > to avoid, even for experienced C developers. Some issues have been
> > known for years, while others have been discovered recently in Python.
> > Working around macro pitfalls makes the macro code harder to read and
> > to maintain."
> >
> > Are you saying that all core devs are well aware of all macro pitfalls
> > and always avoid them? I'm well aware of these pitfalls, and I have
> > fallen into their traps multiple times.
> >
> > The bpo-30459 issue about PyList_SET_ITEM() is a concrete example of
> > old bugs that nobody noticed before.
> >
> >
> > > The "massive change to working code" part is important. Such efforts
> > > tend to have unexpected issues, which have an unfortunate tendency to
> > > surface before release and contribute to release manager burnout.
> >
> > Aren't you exaggerating a bit? Would you mind elaborating? Do you
> > have examples of issues caused by converting macros to static inline
> > functions?
> >
> > I'm not talking about incompatible C API changes made on purpose

[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount"

2022-02-21 Thread Larry Hastings


On 2/21/22 21:44, Larry Hastings wrote:


While I don't think it's fine to play devil's advocate,"



Oh!  Please ignore the word "don't" in the above sentence.  I /do/ think 
it's fine to play devil's advocate.


Sheesh,


//arry/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TABGFU4OFTUDPGF72LY5QMSDTKDUUHHY/


[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount"

2022-02-21 Thread Larry Hastings


On 2/21/22 22:06, Chris Angelico wrote:

On Mon, 21 Feb 2022 at 16:47, Larry Hastings  wrote:

While I don't think it's fine to play devil's advocate, given the choice between "this will 
help a common production use-case" (pre-fork servers) and "this could hurt a hypothetical 
production use case" (long-running applications that reload modules enough times that this 
could waste a significant amount of memory), I think the former is more important.


Can the cost be mitigated by reusing immortal objects? So, for
instance, a module-level constant of 60*60*24*365 might be made
immortal, meaning it doesn't get disposed of with the module, but if
the module gets reloaded, no *additional* object would be created.

I'm assuming here that any/all objects unmarshalled with the module
can indeed be shared in this way. If that isn't always true, then that
would reduce the savings here.



It could, but we don't have any general-purpose mechanism for that.  We 
have "interned strings" and "small ints", but we don't have e.g. 
"interned tuples" or "frequently-used large ints and floats".


That said, in this hypothetical scenario wherein someone is constantly 
reloading modules but we also have immortal objects, maybe someone could 
write a smart reloader that lets them somehow propagate existing 
immortal objects to the new module. It wouldn't even have to be that 
sophisticated, just some sort of hook into the marshal step combined 
with a per-module persistent cache of unmarshalled constants.
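The "smart reloader" constant cache described above could be sketched along these lines. (Purely illustrative: `intern_const` is a hypothetical helper, not an existing CPython API, and a real reloader would hook the marshal step rather than call a function explicitly.)

```python
# Hypothetical per-module cache that lets a reloader reuse previously
# unmarshalled constants instead of creating fresh (immortal) copies
# on every reload.
_const_cache = {}

def intern_const(value):
    """Return the cached object equal to `value`, caching it on first use."""
    # setdefault stores the value on first call and returns the stored
    # object on every later call with an equal key.
    return _const_cache.setdefault(value, value)

# On first load the constant is cached; on reload the same object is
# reused, so no additional (immortal) object accumulates.
first = intern_const(60 * 60 * 24 * 365)
second = intern_const(60 * 60 * 24 * 365)
print(first is second)  # True
```

CPython already does something similar per code object (equal constants in `co_consts` are merged at compile time); the point here is extending that sharing across reloads of the same module.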



//arry/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UN3BIEHDK2CCL563MSIJ4DXDWOWHNKHR/


[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)

2022-02-21 Thread Petr Viktorin

On 19. 02. 22 8:46, Eric Snow wrote:

Thanks to all those that provided feedback.  I've worked to
substantially update the PEP in response.  The text is included below.
Further feedback is appreciated.


Thank you! This version is much clearer. I like the PEP more and more!

I've sent a PR with some typo fixes: 
https://github.com/python/peps/pull/2348

and I have a few comments:


[...]

Public Refcount Details

[...]

As part of this proposal, we must make sure that users can clearly
understand on which parts of the refcount behavior they can rely and
which are considered implementation details.  Specifically, they should
use the existing public refcount-related API and the only refcount value
with any meaning is 0.  All other values are considered "not 0".


Should we care about hacks/optimizations that rely on having the only 
reference (or all references), e.g. mutating a tuple if it has refcount 
1? Immortal objects shouldn't break them (the special case simply won't 
apply), but this wording would make them illegal.
AFAIK CPython uses this internally, but I don't know how 
prevalent/useful it is in third-party code.



[...]


_Py_IMMORTAL_REFCNT
---

We will add two internal constants::

 #define _Py_IMMORTAL_BIT (1LL << (8 * sizeof(Py_ssize_t) - 4))
 #define _Py_IMMORTAL_REFCNT (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT / 2))


As a nitpick: could you say this in prose?

* ``_Py_IMMORTAL_BIT`` has the third top-most bit set.
* ``_Py_IMMORTAL_REFCNT`` has the third and fourth top-most bits set.
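Concretely, for a 64-bit build (where `sizeof(Py_ssize_t)` is 8) the proposed values work out as follows; this is a quick Python mirror of the C macros quoted above, assuming that build width:

```python
# Mirror of the proposed C constants, assuming sizeof(Py_ssize_t) == 8.
SIZEOF_PY_SSIZE_T = 8

IMMORTAL_BIT = 1 << (8 * SIZEOF_PY_SSIZE_T - 4)      # 1 << 60
IMMORTAL_REFCNT = IMMORTAL_BIT + IMMORTAL_BIT // 2   # bits 60 and 59 set

print(hex(IMMORTAL_BIT))     # 0x1000000000000000
print(hex(IMMORTAL_REFCNT))  # 0x1800000000000000
```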


[...]


Immortal Global Objects
---

All objects that we expect to be shared globally (between interpreters)
will be made immortal.  That includes the following:

* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers,
   small ints)

All such objects will be immutable.  In the case of the static types,
they will be effectively immutable.  ``PyTypeObject`` has some mutable
state (``tp_dict`` and ``tp_subclasses``), but we can work around this
by storing that state on ``PyInterpreterState`` instead of on the
respective static type object.  Then the ``__dict__``, etc. getter
will do a lookup on the current interpreter, if appropriate, instead
of using ``tp_dict``.


But tp_dict is also public C-API. How will that be handled?
Perhaps naively, I thought static types' dicts could be treated as 
(deeply) immutable, and shared?


Perhaps it would be best to leave it out here and say "The details 
of sharing ``PyTypeObject`` across interpreters are left to another PEP"?
Even so, I'd love to know the plan. (And even if these are internals, 
changes to them should be mentioned in What's New, for the sake of 
people who need to maintain old extensions.)





Object Cleanup
--

In order to clean up all immortal objects during runtime finalization,
we must keep track of them.

For GC objects ("containers") we'll leverage the GC's permanent
generation by pushing all immortalized containers there.  During
runtime shutdown, the strategy will be to first let the runtime try
to do its best effort of deallocating these instances normally.  Most
of the module deallocation will now be handled by
``pylifecycle.c:finalize_modules()`` which cleans up the remaining
modules as best as we can.  It will change which modules are available
during __del__ but that's already defined as undefined behavior by the
docs.  Optionally, we could do some topological ordering to guarantee
that user modules will be deallocated first, before the stdlib modules.
Finally, anything leftover (if any) can be found through the permanent
generation gc list which we can clear after finalize_modules().

For non-container objects, the tracking approach will vary on a
case-by-case basis.  In nearly every case, each such object is directly
accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or
``PyInterpreterState`` field.  We may need to add a tracking mechanism
to the runtime state for a small number of objects.


Out of curiosity: how does this extra work affect performance? Is 
it part of the 4% slowdown?




And from the other thread:

On 17. 02. 22 18:23, Eric Snow wrote:
> On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin  wrote:
 Weren't you planning a PEP on subinterpreter GIL as well? Do you want to
 submit them together?
>>>
>>> I'd have to think about that.  The other PEP I'm writing for
>>> per-interpreter GIL doesn't require immortal objects.  They just
>>> simplify a number of things.  That's my motivation for writing this
>>> PEP, in fact. :)
>>
>> Please think about it.
>> If you removed the benefits for per-interpreter GIL, the motivation
>> section would be reduced to memory savings for fork/CoW. (And lots of
>> performance improvements that are great in theory but sum up to a 4% loss.)

>
> Sound

[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)

2022-02-21 Thread dw-git
Petr Viktorin wrote:
> Should we care about hacks/optimizations that rely on having the only 
> reference (or all references), e.g. mutating a tuple if it has refcount 
> 1? Immortal objects shouldn't break them (the special case simply won't 
> apply), but this wording would make them illegal.
> AFAIK CPython uses this internally, but I don't know how 
> prevalent/useful it is in third-party code.

For what it's worth, Cython does this for string concatenation, concatenating 
in place when possible (this optimization was copied from CPython). It could be 
disabled relatively easily if it became a problem (it's already CPython-only 
and version-checked, so it'd just need another upper-bound version check).
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CDNQK5RMXSLLYFNIXRORL7GTKU6B4BVR/


[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount"

2022-02-21 Thread Chris Angelico
On Tue, 22 Feb 2022 at 03:00, Larry Hastings  wrote:
>
>
> On 2/21/22 22:06, Chris Angelico wrote:
>
> On Mon, 21 Feb 2022 at 16:47, Larry Hastings  wrote:
>
> While I don't think it's fine to play devil's advocate, given the choice 
> between "this will help a common production use-case" (pre-fork servers) and 
> "this could hurt a hypothetical production use case" (long-running 
> applications that reload modules enough times that this could waste a 
> significant amount of memory), I think the former is more important.
>
> Can the cost be mitigated by reusing immortal objects? So, for
> instance, a module-level constant of 60*60*24*365 might be made
> immortal, meaning it doesn't get disposed of with the module, but if
> the module gets reloaded, no *additional* object would be created.
>
> I'm assuming here that any/all objects unmarshalled with the module
> can indeed be shared in this way. If that isn't always true, then that
> would reduce the savings here.
>
>
> It could, but we don't have any general-purpose mechanism for that.  We have 
> "interned strings" and "small ints", but we don't have e.g. "interned tuples" 
> or "frequently-used large ints and floats".
>
> That said, in this hypothetical scenario wherein someone is constantly 
> reloading modules but we also have immortal objects, maybe someone could 
> write a smart reloader that lets them somehow propagate existing immortal 
> objects to the new module.  It wouldn't even have to be that sophisticated, 
> just some sort of hook into the marshal step combined with a per-module 
> persistent cache of unmarshalled constants.
>

Fair enough. Since only immortal objects would affect this, it may be
possible for the smart reloader to simply be told of all new
immortals, and it can then intern things itself.

IMO that strengthens the argument that prefork servers are a more
significant use-case than reloading, without necessarily compromising
the rarer case.

Thanks for the explanation.

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3D436RCMHODEIBVDIWIJLKZU2TGHBE4J/


[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)

2022-02-21 Thread Terry Reedy

On 2/21/2022 11:11 AM, Petr Viktorin wrote:

On 19. 02. 22 8:46, Eric Snow wrote:



As part of this proposal, we must make sure that users can clearly
understand on which parts of the refcount behavior they can rely and
which are considered implementation details.  Specifically, they should
use the existing public refcount-related API and the only refcount value
with any meaning is 0.  All other values are considered "not 0".


Should we care about hacks/optimizations that rely on having the only 
reference (or all references), e.g. mutating a tuple if it has refcount 
1? Immortal objects shouldn't break them (the special case simply won't 
apply), but this wording would make them illegal.
AFAIK CPython uses this internally, but I don't know how 
prevalent/useful it is in third-party code.


We could say that the only refcounts with any meaning are 0, 1, and > 1.


--
Terry Jan Reedy
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/C3R4FKO7PZETOSI5DTGMAXWVUTQM26AW/