[Numpy-discussion] Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-22 Thread Eric Snow
Hi all,

CPython has supported multiple interpreters (in the same process) for
a long time, but only through the C-API.  I'm working on exposing that
functionality to Python code (see PEP 554), aiming for 3.12.  I expect
that users will find the feature useful (particularly with a
per-interpreter GIL--see PEP 684) and that it will be used a lot more
over the coming years.  This has the potential to impact extension
module projects, especially large ones like numpy, which is why I'm
reaching out to you.

Use of multiple interpreters depends on isolation between them.  When
an extension module is imported in multiple interpreters, it is loaded
separately into a new module object in each.  Extensions often store
module data/state in C globals, which means the multiple instances
end up sharing data.  This causes problems, more so once we have one
GIL per interpreter.
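
To make the shared-state problem concrete, here is a rough pure-Python
analogy.  It is only a sketch and not how CPython or numpy actually load
extensions: a plain dict stands in for a C global, and a fresh
types.ModuleType stands in for the separate module object each
interpreter gets.  The names load_instance, SOURCE, and
_PROCESS_GLOBAL_CACHE are made up for illustration.

```python
import types

# Stand-in for a C global: exactly one object for the whole process.
_PROCESS_GLOBAL_CACHE = {}

# Stand-in for the extension's code; it reads `cache` from its own module.
SOURCE = """
def remember(key, value):
    cache[key] = value

def recall(key):
    return cache.get(key)
"""


def load_instance(name, cache):
    """Build a fresh module object, as each interpreter would on import."""
    module = types.ModuleType(name)
    module.cache = cache  # where this instance keeps its state
    exec(SOURCE, module.__dict__)
    return module


# Global-style state: both instances point at the same process-wide dict.
a = load_instance("ext_in_interp_a", _PROCESS_GLOBAL_CACHE)
b = load_instance("ext_in_interp_b", _PROCESS_GLOBAL_CACHE)
a.remember("dtype", "float64")
leaked = b.recall("dtype")  # b sees what a stored

# Per-module state: each instance owns its own dict.
c = load_instance("ext_in_interp_c", {})
d = load_instance("ext_in_interp_d", {})
c.remember("dtype", "float64")
isolated = d.recall("dtype")  # d does not see what c stored
```

In the first pair the two module objects are distinct but still share
data, which is the situation described above.  In the second pair the
state lives with the module object, which is roughly the direction that
PEP 630 recommends.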

Over the years we have added machinery to help extensions get the
necessary isolation, moving away from global variables.  This includes
PEPs 384, 3121, and 489.  This has culminated in the guide you can
find in PEP 630.

Note that nothing should change when only a single interpreter is in
use (basically the status quo).  With PEP 684, importing an
incompatible extension outside the main (initial) interpreter will now
raise an ImportError.  (Currently the behavior is undefined and too often
results in hard-to-debug failures and crashes.)

Thus extension module maintainers do have the option to *not* support
multiple interpreters.  Unfortunately, that doesn't mean their users
won't pester them about adding support.  We all recognize how that
dynamic can be draining on a project.  The potential burden on
maintainers is a serious factor for these upcoming changes.  numpy is
likely to be affected more than any other project.  That's why I'm
starting this thread.

PEP 684 discusses all of the above.  What I'm after with this thread is:

* to make sure the numpy maintainers are clear on what interpreter
  isolation requires of the project
* a clear picture of what changes numpy would need (and how much work
  that would be)
* feedback on what the CPython team can do to minimize that work
  (incl. adding new C APIs)

I'm fine with having the discussion here, but I will probably create a
new category on discuss.python.org for a variety of similar threads
related to multiple interpreters and supporting them.  Having our
discussion there may lead to more participation from more CPython core
devs than just me.  Do you have any preference for or against any
particular venue?

Thanks!

-eric
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-24 Thread Eric Snow
On Tue, Aug 23, 2022 at 3:47 AM Sebastian Berg wrote:
> What is the status of immortality?  None of these seem forbidding at
> first sight, as long as we can get the state everywhere.
> Having immortal objects seems convenient, but probably not particularly
> necessary.

The current proposal for immortal objects (PEP 683) will be going to
the steering council soon.  However, it only applies to the CPython
runtime (internally).  We don't have plans right now for a public API
to make an object immortal.  (That would be a separate proposal.)  If
isolating the extension, a la PEP 630, isn't feasible in the short
term, we would certainly be open to discussing alternatives (incl.
immortal objects).

> One other thing I am not quite sure about right now is GIL grabbing.
> Will `PyGILState_Ensure()` continue to work reliably?
> This used to be one of my main worries.  It is also something we can
> fix-up (pass through additional information), but where a fallback
> seems needed.

Compatibility of the GIL state API with subinterpreters has been a
long-standing bug. [1]  That will be fixed.  Otherwise,
PyGILState_Ensure() should work correctly.

-eric


[1] https://github.com/python/cpython/issues/59956


[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-24 Thread Eric Snow
On Tue, Aug 23, 2022 at 6:01 AM Petr Viktorin wrote:
> And if the CPython API is lacking, it would be best to solve that in
> CPython.

+1

In some ways, new CPython APIs would be the most important artifacts of
this discussion.  We want to minimize the effort it takes to support
multiple interpreters.  So we definitely want to know what we could
provide that would help.

> Per-interpreter GIL is an *additional* step. I believe it will need its
> own opt-in mechanism. But subinterpreter support is a prerequisite for it.

Yeah, that is an evolving point of discussion in PEP 684.

-eric


[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-24 Thread Eric Snow
On Wed, Aug 24, 2022 at 4:42 AM Petr Viktorin wrote:
> On 23. 08. 22 16:19, Sebastian Berg wrote:
> > Our public C-API is currently exported as a single static struct into
> > the library loading NumPy.  If types depend on the interpreter, it
> > would seem we need to redo the whole mechanism?
>
> Right, sounds like it needs to be a dynamically allocated struct.
> In the interim, one instance of the struct is static: that's the one
> used for anything that doesn't support multiple interpreters yet, and
> also as the module state in one “main” module object. (That would be the
> first module to be loaded, and until everything switches over, it'd get
> an unpaired incref to become “immortal” and leak at exit.)
>
> > Further, many of the functions would need to be adapted.  We might be
> able to hack it so that the API looks the same [1].  However, it cannot be
> ABI compatible, so we would need a whole new API table/export mechanism
> > and some sort of shim to allow compiling against older NumPy versions
> > but using it with all versions (otherwise we need 2+ years of
> > patience).
>
> Having one static “main” module state in the interim would also help here.
>
> > Of course there might be a point in saying that most C-API use is
> > initially not subinterpreter ready, but it does seem like a pretty huge
> > limitation...
>
> A huge limitation, but it might be a good way to break up the work to
> make it more manageable :)

FWIW, in CPython there's a similar issue.  We currently expose static
pointers to all the builtin exceptions in the C-API.  Even worse, we
expose the object *values* for all the static types and the several
singletons.  On top of that, these are all exposed in the limited API
(stable ABI).

As a result, moving to one of each per interpreter is messy.  PEP 684
talks about the possible solutions.  The simplest for us is to make
all those objects immortal.  However, in some cases we also have to do
an interpreter-specific lookup internally.  I expect you would have to
do something similar wherever compatibility remains essential.
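
As a mental picture of the interim scheme discussed in this message (one
static "main" state plus dynamically allocated per-interpreter state,
reached through an interpreter-specific lookup), here is a rough Python
model.  Everything in it is hypothetical: ApiTable, MAIN_TABLE,
register_interpreter, get_api_table, and the integer interpreter ids are
invented for illustration and are not numpy or CPython APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ApiTable:
    """Hypothetical stand-in for an exported C-API struct."""
    owner: str
    type_objects: Dict[str, object] = field(default_factory=dict)


# The one "static" instance: used by callers that are not yet
# interpreter-aware, and as the state of the first ("main") module.
MAIN_TABLE = ApiTable(owner="main")

# Dynamically allocated tables, keyed by a made-up interpreter id.
_PER_INTERPRETER: Dict[int, ApiTable] = {}


def register_interpreter(interp_id: int) -> ApiTable:
    """Allocate a fresh table when the module loads in a new interpreter."""
    table = ApiTable(owner=f"interp-{interp_id}")
    _PER_INTERPRETER[interp_id] = table
    return table


def get_api_table(interp_id: Optional[int] = None) -> ApiTable:
    """Interpreter-specific lookup; legacy callers pass no id."""
    if interp_id is None:
        return MAIN_TABLE
    return _PER_INTERPRETER[interp_id]


legacy = get_api_table()          # old-style caller gets the main table
first = register_interpreter(1)
second = register_interpreter(2)
first.type_objects["float64"] = object()  # visible only in interpreter 1
```

Old callers keep working against the single main table, while code that
has been ported passes its interpreter along and gets isolated state.
That is one way the work could be split up over time.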

-eric