[Python-Dev] PEP 489: Redesigning extension module loading
Hello,

On import-sig, I've agreed to continue Nick Coghlan's work on making extension modules act more like Python ones, work well with PEP 451 (ModuleSpec), and encourage proper subinterpreter and reloading support. Here is the resulting PEP. I don't have a patch yet, but I'm working on it.

There's a remaining open issue: providing a tool that can be run in test suites to check whether a module behaves well with subinterpreters/reloading. I believe it's out of scope for this PEP, but speak up if you disagree.

Please discuss on import-sig.

===

PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin , Stefan Behnel , Nick Coghlan
Discussions-To: import-...@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015
Resolution:

Abstract
========

This PEP proposes a redesign of the way in which extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121, but did not solve all problems at the time. The goal is to solve them by bringing extension modules closer to the way Python modules behave; specifically, to hook into the ModuleSpec-based loading mechanism introduced in PEP 451.

Extensions that do not require a custom memory layout for their module objects may be executed in arbitrary pre-defined namespaces, paving the way for extension modules being runnable with Python's ``-m`` switch. Other extensions can use custom types for their module implementation; module types are no longer restricted to ``types.ModuleType``.

This proposal makes it easy to support properties at the module level and to safely store arbitrary global state in the module that is covered by normal garbage collection and supports reloading and sub-interpreters. Extension authors are encouraged to take these issues into account when using the new API.
Motivation
==========

Python modules and extension modules are not set up in the same way. For Python modules, the module object is created and set up first, then the module code is executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and is passed to the relevant hooks.

For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and the initialisation. The initialisation function is not passed ModuleSpec information about the loaded module, such as the ``__file__`` or fully-qualified name. This hinders relative imports and resource loading. This is specifically a problem for Cython-generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of ``__file__`` and ``__name__`` information hinders the compilation of ``__init__.py`` modules, i.e. packages, especially when relative imports are being used at module init time.

The other disadvantage of the discrepancy is that existing Python programmers learning C cannot effectively map concepts between the two domains. As long as extension modules are fundamentally different from pure Python ones in the way they're initialised, they are harder for people to pick up without relying on something like cffi, SWIG or Cython to handle the actual extension module creation.

Currently, extension modules are also not added to sys.modules until they are fully initialized, which means that a (potentially transitive) re-import of the module will really try to reimport it and thus run into an infinite loop when it executes the module init function again. Without the fully qualified module name, it is not trivial to correctly add the module to sys.modules either.
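[Editor's aside: the create-then-execute sequence described above for Python modules can be sketched in pure Python. This is a rough model of what PEP 451 loaders do, not the actual import machinery; the function name and the in-memory source are made up for illustration.]

```python
import types

# Toy module source, executed in step 2 below.
SOURCE = "GREETING = 'hello from ' + __name__"

def load_from_source(name, source, filename="<memory>"):
    # Step 1: create the module object and set up its import context
    # *before* any module code runs (this is what extension modules
    # currently cannot do, since PyInit_* does both steps at once).
    module = types.ModuleType(name)
    module.__file__ = filename
    # Step 2: execute the module code in the already-prepared namespace.
    code = compile(source, filename, "exec")
    exec(code, module.__dict__)
    return module

mod = load_from_source("demo", SOURCE)
print(mod.GREETING)  # -> hello from demo
```

Because the module object exists before its code runs, `__name__` and `__file__` are already available at "module init time", which is exactly the property the PEP wants extension modules to gain.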
Furthermore, the majority of currently existing extension modules have problems with sub-interpreter support and/or reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient. Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps the backwards-compatible behavior, which should reduce pressure and give extension authors adequate time to consider these issues when porting.

The current process
===================

Currently, extension modules export an initialisation function named "PyInit_modulename", named after the file name of the shared library. This function is executed by the import machinery and must return either NULL in the case of an exception, or a fully initialised module object. The function receives no arguments, so it has no way of knowing about its import context.

During its execution, the module init function creates a module object based on a PyModuleDef struct. It then continues to initialise it by adding attributes to the module.
Re: [Python-Dev] PEP 489: Redesigning extension module loading
On 16 March 2015 Petr Viktorin wrote:

> If PyModuleCreate is not defined, PyModuleExec is expected to operate
> on any Python object for which attributes can be added by PyObject_GetAttr*
> and retrieved by PyObject_SetAttr*.

I assume it is the other way around (add with Set and retrieve with Get), rather than a description of the required form of magic.

> PyObject *PyModule_AddCapsule(
>     PyObject *module,
>     const char *module_name,
>     const char *attribute_name,
>     void *pointer,
>     PyCapsule_Destructor destructor)

What happens if module_name doesn't match the module's __name__? Does it become a hidden attribute? A dotted attribute? Is the result undefined?

Later, there is

> void *PyModule_GetCapsulePointer(
>     PyObject *module,
>     const char *module_name,
>     const char *attribute_name)

with the same apparently redundant arguments, but not a PyModule_SetCapsulePointer. Are capsule pointers read-only, or can they be replaced with another call to PyModule_AddCapsule, or by a simple PyObject_SetAttr?

> Subinterpreters and Interpreter Reloading
...
> No user-defined functions, methods, or instances may leak to different
> interpreters.

By "user-defined" do you mean "defined in Python, as opposed to in the extension itself"?

If so, what is the recommendation for modules that do want to support, say, callbacks? A dual-layer mapping that uses the interpreter as the first key? Naming it _module and only using it indirectly through module.py, which is not shared across interpreters? Not using this API at all?

> To achieve this, all module-level state should be kept in either the module
> dict, or in the module object.

I don't see how that is related to leakage.

> A simple rule of thumb is: Do not define any static data, except
> built-in types with no mutable or user-settable class attributes.

What about singleton instances? Should they be per-interpreter? What about constants, such as PI? Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be kept?
What happens if this no-leakage rule is violated? Does the module not load, or does it just maybe lead to a crash down the road?

-jJ

--
If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 489: Redesigning extension module loading
On Mon, Mar 16, 2015 at 4:42 PM, Jim J. Jewett wrote:

> On 16 March 2015 Petr Viktorin wrote:
>
>> If PyModuleCreate is not defined, PyModuleExec is expected to operate
>> on any Python object for which attributes can be added by PyObject_GetAttr*
>> and retrieved by PyObject_SetAttr*.
>
> I assume it is the other way around (add with Set and retrieve with Get),
> rather than a description of the required form of magic.

Right you are, I mixed that up.

>> PyObject *PyModule_AddCapsule(
>>     PyObject *module,
>>     const char *module_name,
>>     const char *attribute_name,
>>     void *pointer,
>>     PyCapsule_Destructor destructor)
>
> What happens if module_name doesn't match the module's __name__?
> Does it become a hidden attribute? A dotted attribute? Is the
> result undefined?

The module_name is used to name the capsule, following the convention from PyCapsule_Import. The module's __name__ is not used or checked. The function would do this:

    capsule_name = module_name + '.' + attribute_name
    capsule = PyCapsule_New(pointer, capsule_name, destructor)
    PyModule_AddObject(module, attribute_name, capsule)

just with error handling, and suitable C code for the "+". I will add the pseudocode to the PEP.

> Later, there is
>
>> void *PyModule_GetCapsulePointer(
>>     PyObject *module,
>>     const char *module_name,
>>     const char *attribute_name)
>
> with the same apparently redundant arguments,

Here the behavior would be:

    capsule_name = module_name + '.' + attribute_name
    capsule = PyObject_GetAttr(module, attribute_name)
    return PyCapsule_GetPointer(capsule, capsule_name)

> but not a
> PyModule_SetCapsulePointer. Are capsule pointers read-only, or can
> they be replaced with another call to PyModule_AddCapsule, or by a
> simple PyObject_SetAttr?

You can replace the capsule using either of those two, or set the pointer using PyCapsule_SetPointer, or (most likely) change the data the pointer points to.
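[Editor's aside: capsules are a C-API concept, but the naming convention in the pseudocode above can be modeled in pure Python. The `FakeCapsule` class and helper names below are hypothetical stand-ins, used only to show that the `module_name` argument merely forms the capsule's name, in the `"modulename.attrname"` style of `PyCapsule_Import`, and that the module's actual `__name__` is never consulted.]

```python
import types

class FakeCapsule:
    """Stand-in for a PyCapsule: an opaque pointer plus a name."""
    def __init__(self, pointer, name):
        self.pointer = pointer
        self.name = name

def add_capsule(module, module_name, attribute_name, pointer):
    # Mirrors the PyModule_AddCapsule pseudocode above.
    capsule_name = module_name + "." + attribute_name
    setattr(module, attribute_name, FakeCapsule(pointer, capsule_name))

def get_capsule_pointer(module, module_name, attribute_name):
    # Mirrors PyModule_GetCapsulePointer: the name is rebuilt and checked,
    # just as PyCapsule_GetPointer verifies the capsule's name.
    capsule_name = module_name + "." + attribute_name
    capsule = getattr(module, attribute_name)
    if capsule.name != capsule_name:
        raise ValueError("called with incorrect name")
    return capsule.pointer

m = types.ModuleType("whatever")  # the module's __name__ plays no role below
add_capsule(m, "spam", "_C_API", 0xDEAD)
print(get_capsule_pointer(m, "spam", "_C_API"))  # -> 57005
```

Retrieving the pointer with a mismatched `module_name` fails the name check, which is how the real capsule API catches wiring mistakes between extensions.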
The added functions are just simple helpers for common operations, meant to encourage keeping per-module state.

>> Subinterpreters and Interpreter Reloading
> ...
>> No user-defined functions, methods, or instances may leak to different
>> interpreters.
>
> By "user-defined" do you mean "defined in python, as opposed to in
> the extension itself"?

Yes.

> If so, what is the recommendation for modules that do want to support,
> say, callbacks? A dual-layer mapping that uses the interpreter as the
> first key? Naming it _module and only using it indirectly through
> module.py, which is not shared across interpreters? Not using this
> API at all?

There is a separate module object, with its own dict, for each subinterpreter (as when creating the module with "PyModuleDef.m_size == 0" today). Callbacks should be stored on the appropriate module instance. Does that answer your question? I'm not sure how you meant "callbacks".

>> To achieve this, all module-level state should be kept in either the module
>> dict, or in the module object.
>
> I don't see how that is related to leakage.
>
>> A simple rule of thumb is: Do not define any static data, except
>> built-in types with no mutable or user-settable class attributes.
>
> What about singleton instances? Should they be per-interpreter?

Yes, definitely.

> What about constants, such as PI?

In PyModuleExec, create the constant using PyFloat_FromDouble, and add it using PyModule_AddObject. That will do the right thing. (Float constants can be shared, since they cannot refer to user-defined code. But this PEP shields you from needing to know this for every type.)

> Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be
> kept?

On the module object.

> What happens if this no-leakage rule is violated? Does the module
> not load, or does it just maybe lead to a crash down the road?

It may, as today, lead to unexpected behavior down the road.
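[Editor's aside: the per-interpreter isolation described above can be modeled in Python. Two module objects stand in for the same extension loaded in two subinterpreters; a module-level list in C (`static` data) behaves like the shared global below. A rough sketch, with all names invented for illustration:]

```python
import types

# Stands in for C static data: one copy for the whole process,
# shared by every (sub)interpreter -- the pattern the PEP warns against.
SHARED_STATIC = []

def exec_module(module):
    # Per-module state, as the PEP recommends: each module instance
    # (hence each subinterpreter) gets its own callback list.
    module.callbacks = []

interp_a = types.ModuleType("ext")  # "ext" as loaded in interpreter A
interp_b = types.ModuleType("ext")  # "ext" as loaded in interpreter B
exec_module(interp_a)
exec_module(interp_b)

# Interpreter A registers a callback:
interp_a.callbacks.append("cb_from_A")
SHARED_STATIC.append("cb_from_A")

print(interp_b.callbacks)  # -> [] : module state stays isolated
print(SHARED_STATIC)       # -> ['cb_from_A'] : static state leaks across
```

Anything reachable from `SHARED_STATIC` would be visible to interpreter B, which is exactly the "leakage" of user-defined objects the PEP's rule of thumb is meant to prevent.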
This is explained here:
https://docs.python.org/3/c-api/init.html#sub-interpreter-support

Unfortunately, there's no good way to detect such leakage. This PEP adds the tools, documentation, and guidelines to make it easy to do the right thing, but won't prevent you from shooting yourself in the foot in C code.

Thank you for sharing your concerns! I will keep them in mind when writing the docs for this.
Re: [Python-Dev] PEP 448 review
Hi everyone,

I was wondering what is left with the PEP 448 (http://bugs.python.org/issue2292) code review? Big thanks to Benjamin, Ethan, and Serhiy for reviewing some (all?) of the code. What is the next step of this process?

Thanks,

Neil

On Sun, Mar 8, 2015 at 4:38 PM, Neil Girdhar wrote:
> Anyone have time to do a code review?
>
> http://bugs.python.org/issue2292

On Mon, Mar 2, 2015 at 4:54 PM, Neil Girdhar wrote:
>> It's from five days ago. I asked Joshua to take a look at something,
>> but I guess he is busy.
>>
>> Best,
>>
>> Neil
>>
>> —
>>
>> The latest file there is from Feb 26, while your message that the patch
>> was ready for review is from today -- so is the patch from five days
>> ago the most recent?
>>
>> --
>> ~Ethan~

On Mon, Mar 2, 2015 at 3:18 PM, Neil Girdhar wrote:
>>> http://bugs.python.org/issue2292

On Mon, Mar 2, 2015 at 3:17 PM, Victor Stinner wrote:
> Where is the patch?
>
> Victor

On Monday, 2 March 2015, Neil Girdhar wrote:
> Hi everyone,
>
> The patch is ready for review now, and I should have time this week to
> make changes and respond to comments.
>
> Best,
>
> Neil

On Wed, Feb 25, 2015 at 2:42 PM, Guido van Rossum wrote:
>> I'm back, I've re-read the PEP, and I've re-read the long thread with
>> "(no subject)".
>>
>> I think Georg Brandl nailed it:
>>
>> """
>> I like the "sequence and dict flattening" part of the PEP, mostly
>> because it is consistent and should be easy to understand, but the
>> comprehension syntax enhancements seem to be bad for readability and
>> "comprehending" what the code does. The call syntax part is a mixed
>> bag: on the one hand it is nice to be consistent with the extended
>> possibilities in literals (flattening), but on the other hand there
>> would be small but annoying inconsistencies anyways (e.g.
>> the duplicate kwarg case above).
>> """
>>
>> Greg Ewing followed up explaining that the inconsistency between dict
>> flattening and call syntax is inherent in the pre-existing different
>> rules for dicts vs. keyword args: {'a':1, 'a':2} results in {'a':2},
>> while f(a=1, a=2) is an error. (This form is a SyntaxError; the dynamic
>> case f(a=1, **{'a': 1}) is a TypeError.)
>>
>> For me, allowing f(*a, *b) and f(**d, **e) and all the other
>> combinations for function calls proposed by the PEP is an easy +1 --
>> it's a straightforward extension of the existing pattern, and anybody
>> who knows what f(x, *a) does will understand f(x, *a, y, *b). Guessing
>> what f(**d, **e) means shouldn't be hard either. Understanding the edge
>> case for duplicate keys with f(**d, **e) is a little harder, but the
>> error messages are pretty clear, and it is not a new edge case.
>>
>> The sequence and dict flattening syntax proposals are also clean and
>> logical -- we already have *-unpacking on the receiving side, so
>> allowing *x in tuple expressions reads pretty naturally (and the
>> similarity with *a in argument lists certainly helps). From here,
>> having [a, *x, b, *y] is also natural, and then the extension to other
>> displays is natural: {a, *x, b, *y} and {a:1, **d, b:2, **e}. This,
>> too, gets a +1 from me.
>>
>> So that leaves comprehensions. IIRC, during the development of the
>> patch we realized that f(*x for x in xs) is sufficiently ambiguous that
>> we decided to disallow it -- note that f(x for x in xs) is already
>> somewhat of a special case because an argument can only be a "bare"
>> generator expression if it is the only argument. The same reasoning
>> doesn't apply (in that form) to list, set and dict comprehensions --
>> while f(x for x in xs) is identical in meaning to f((x for x in xs)),
>> [x for x in xs] is NOT the same as [(x for x in xs)] (that's a list of
>> one element, and the element is a generator expression).
>> The basic premise of this part of the proposal is that if you have a
>> few iterables, the new proposal (without comprehensions) lets you
>> create a list or generator expression that iterates over all of them,
>> essentially flattening them:
>>
>> >>> xs = [1, 2, 3]
>> >>> ys = ['abc', 'def']
>> >>> zs = [99]
>> >>> [*xs, *ys, *zs]
>> [1, 2, 3, 'abc', 'def', 99]
>>
>> But now suppose you have a list of iterables:
>>
>> >>> xss = [[1, 2, 3], ['abc', 'def'], [99]]
>> >>> [*xss[0], *xss[1], *xss[2]]
>> [1, 2, 3, 'abc', 'def', 99]
>>
>> Wouldn't it be nice if you could write the latter using a
>> comprehension?
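[Editor's aside: the call and display forms Guido discusses above all landed in Python 3.5 with PEP 448, so they can be run directly today. The comprehension form (`[*x for x in xss]`) was left out of the accepted PEP, so the sketch below flattens a list of iterables with the standard spellings instead. All names here are invented for illustration.]

```python
from itertools import chain

def f(*args, **kwargs):
    return args, sorted(kwargs.items())

a, b = (1, 2), (3, 4)
d, e = {"x": 1}, {"y": 2}

# Multiple * and ** unpackings in one call (PEP 448, Python 3.5+):
print(f(*a, *b))    # -> ((1, 2, 3, 4), [])
print(f(**d, **e))  # -> ((), [('x', 1), ('y', 2)])

# Duplicate keys across ** unpackings raise TypeError at call time:
try:
    f(**d, **{"x": 99})
except TypeError:
    print("duplicate kwarg rejected")

# Unpacking in displays:
xs, ys, zs = [1, 2, 3], ['abc', 'def'], [99]
print([*xs, *ys, *zs])  # -> [1, 2, 3, 'abc', 'def', 99]

# Flattening a *list* of iterables, without comprehension unpacking:
xss = [xs, ys, zs]
flat = [x for sub in xss for x in sub]          # nested comprehension
print(flat)                                     # -> [1, 2, 3, 'abc', 'def', 99]
print(flat == list(chain.from_iterable(xss)))   # -> True
```

The nested comprehension and `itertools.chain.from_iterable` are the two idioms the thread's question ultimately resolves to, since `[*x for x in xss]` remains a SyntaxError.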