[Python-Dev] PEP 489: Redesigning extension module loading

2015-03-16 Thread Petr Viktorin

Hello,
On import-sig, I've agreed to continue Nick Coghlan's work on making 
extension modules act more like Python ones, work well with PEP 451 
(ModuleSpec), and encourage proper subinterpreter and reloading support. 
Here is the resulting PEP.


I don't have a patch yet, but I'm working on it.

There's a remaining open issue: providing a tool that can be run in test 
suites to check if a module behaves well with subinterpreters/reloading. 
I believe it's out of scope for this PEP but speak out if you disagree.


Please discuss on import-sig.

===

PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin ,
Stefan Behnel ,
Nick Coghlan 
Discussions-To: import-...@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules interact
with the import machinery. This was last revised for Python 3.0 in PEP
3121, but did not solve all problems at the time. The goal is to solve them
by bringing extension modules closer to the way Python modules behave;
specifically to hook into the ModuleSpec-based loading mechanism
introduced in PEP 451.

Extensions that do not require custom memory layout for their module objects
may be executed in arbitrary pre-defined namespaces, paving the way for
extension modules being runnable with Python's ``-m`` switch.
Other extensions can use custom types for their module implementation:
module objects are no longer restricted to types.ModuleType.

This proposal makes it easy to support properties at the module
level and to safely store arbitrary global state in the module that is
covered by normal garbage collection and supports reloading and
sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.



Motivation
==========

Python modules and extension modules are not set up in the same way.
For Python modules, the module object is created and set up first, then the
module code is executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and passed to the relevant hooks.
For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialisation. The initialisation function is not passed ModuleSpec
information about the loaded module, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.
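As an illustration (a minimal sketch; "json" is just an example stdlib
module), the ModuleSpec machinery already exposes this context for
ordinary Python modules:

```python
import importlib.util

# A ModuleSpec (PEP 451) carries the import context -- fully-qualified
# name, origin, loader -- that extension init functions currently never
# receive.
spec = importlib.util.find_spec("json")
print(spec.name)                 # json
print(spec.origin is not None)   # True: the module's filesystem path
```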

This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of __init__.py modules, i.e. packages,
especially when relative imports are being used at module init time.

The other disadvantage of the discrepancy is that existing Python programmers
learning C cannot effectively map concepts between the two domains.
As long as extension modules are fundamentally different from pure Python ones
in the way they're initialised, they are harder for people to pick up without
relying on something like cffi, SWIG or Cython to handle the actual extension
module creation.

Currently, extension modules are also not added to sys.modules until they
are fully initialized, which means that a (potentially transitive)
re-import of the module will really try to reimport it and thus run into an
infinite loop when it executes the module init function again.
Without the fully qualified module name, it is not trivial to correctly add
the module to sys.modules either.
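For comparison, a minimal sketch (the module name "mymod" and its source
are hypothetical) of how pure-Python loading avoids this loop: the module
is registered in sys.modules before its code runs, so a re-import during
execution finds the partially initialised module instead of recursing:

```python
import sys
import types

# Module source that checks, while executing, whether the module is
# already visible in sys.modules.
source = "import sys\nseen_during_exec = 'mymod' in sys.modules\n"

module = types.ModuleType("mymod")
sys.modules["mymod"] = module      # registered *before* execution
exec(source, module.__dict__)

print(module.seen_during_exec)     # True
```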

Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or reloading, and, while it is
possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps the backwards-compatible behavior, which should reduce
pressure and give extension authors adequate time to consider these issues
when porting.



The current process
===================

Currently, extension modules export an initialisation function named
"PyInit_modulename", named after the file name of the shared library. This
function is executed by the import machinery and must return either NULL in
the case of an exception, or a fully initialised module object. The
function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then continues to initialise it by adding
attributes to the modu

Re: [Python-Dev] PEP 489: Redesigning extension module loading

2015-03-16 Thread Jim J. Jewett

On 16 March 2015 Petr Viktorin wrote:

> If PyModuleCreate is not defined, PyModuleExec is expected to operate
> on any Python object for which attributes can be added by PyObject_GetAttr*
> and retrieved by PyObject_SetAttr*.

I assume it is the other way around (add with Set and retrieve with Get),
rather than a description of the required form of magic.


> PyObject *PyModule_AddCapsule(
> PyObject *module,
> const char *module_name,
> const char *attribute_name,
> void *pointer,
> PyCapsule_Destructor destructor)

What happens if module_name doesn't match the module's __name__?
Does it become a hidden attribute?  A dotted attribute?  Is the
result undefined?

Later, there is

> void *PyModule_GetCapsulePointer(
> PyObject *module,
> const char *module_name,
> const char *attribute_name)

with the same apparently redundant arguments, but not a
PyModule_SetCapsulePointer.  Are capsule pointers read-only, or can
they be replaced with another call to PyModule_AddCapsule, or by a
simple PyObject_SetAttr?

> Subinterpreters and Interpreter Reloading
...
> No user-defined functions, methods, or instances may leak to different
> interpreters.

By "user-defined" do you mean "defined in python, as opposed to in
the extension itself"?

If so, what is the recommendation for modules that do want to support,
say, callbacks?  A dual-layer mapping that uses the interpreter as the
first key?  Naming it _module and only using it indirectly through
module.py, which is not shared across interpreters?  Not using this
API at all?

> To achieve this, all module-level state should be kept in either the module
> dict, or in the module object.

I don't see how that is related to leakage.

> A simple rule of thumb is: Do not define any static data, except 
> built-in types
> with no mutable or user-settable class attributes.

What about singleton instances?  Should they be per-interpreter?
What about constants, such as PI?
Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be
kept?


What happens if this no-leakage rule is violated?  Does the module
not load, or does it just maybe lead to a crash down the road?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 489: Redesigning extension module loading

2015-03-16 Thread Petr Viktorin
On Mon, Mar 16, 2015 at 4:42 PM, Jim J. Jewett  wrote:
>
> On 16 March 2015 Petr Viktorin wrote:
>
>> If PyModuleCreate is not defined, PyModuleExec is expected to operate
>> on any Python object for which attributes can be added by PyObject_GetAttr*
>> and retrieved by PyObject_SetAttr*.
>
> I assume it is the other way around (add with Set and retrieve with Get),
> rather than a description of the required form of magic.

Right you are, I mixed that up.

>> PyObject *PyModule_AddCapsule(
>> PyObject *module,
>> const char *module_name,
>> const char *attribute_name,
>> void *pointer,
>> PyCapsule_Destructor destructor)
>
> What happens if module_name doesn't match the module's __name__?
> Does it become a hidden attribute?  A dotted attribute?  Is the
> result undefined?

The module_name is used to name the capsule, following the convention
from PyCapsule_Import. The "module.__name__" is not used or checked.
The function would do this:

    capsule_name = module_name + '.' + attribute_name
    capsule = PyCapsule_New(pointer, capsule_name, destructor)
    PyModule_AddObject(module, attribute_name, capsule)

just with error handling, and suitable C code for the "+".
I will add the pseudocode to the PEP.

> Later, there is
>
>> void *PyModule_GetCapsulePointer(
>> PyObject *module,
>> const char *module_name,
>> const char *attribute_name)
>
> with the same apparently redundant arguments,

Here the behavior would be:

    capsule_name = module_name + '.' + attribute_name
    capsule = PyObject_GetAttr(module, attribute_name)
    return PyCapsule_GetPointer(capsule, capsule_name)

> but not a
> PyModule_SetCapsulePointer.  Are capsule pointers read-only, or can
> they be replaced with another call to PyModule_AddCapsule, or by a
> simple PyObject_SetAttr?

You can replace the capsule using either of those, or set the pointer
using PyCapsule_SetPointer, or (most likely) change the data the
pointer points to.
The added functions are just simple helpers for common operations,
meant to encourage keeping per-module state.

>> Subinterpreters and Interpreter Reloading
> ...
>> No user-defined functions, methods, or instances may leak to different
>> interpreters.
>
> By "user-defined" do you mean "defined in python, as opposed to in
> the extension itself"?

Yes.

> If so, what is the recommendation for modules that do want to support,
> say, callbacks?  A dual-layer mapping that uses the interpreter as the
> first key?  Naming it _module and only using it indirectly through
> module.py, which is not shared across interpreters?  Not using this
> API at all?

There is a separate module object, with its own dict, for each
subinterpreter (as when creating the module with "PyModuleDef.m_size
== 0" today).
Callbacks should be stored on the appropriate module instance.
Does that answer your question? I'm not sure what you meant by "callbacks".
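A hedged pure-Python analogy of the pattern being recommended (an
illustration of the idea, not the C API itself): keep mutable state such
as callback lists on each module object rather than in a C-level static,
so each subinterpreter's copy stays isolated:

```python
import types

def make_module(name):
    # Each (sub)interpreter would get its own module object like this one.
    mod = types.ModuleType(name)
    mod.callbacks = []           # per-module state, not a shared static
    return mod

mod_a = make_module("example")   # as if created in interpreter A
mod_b = make_module("example")   # as if created in interpreter B
mod_a.callbacks.append(lambda: "from A")

print(len(mod_a.callbacks), len(mod_b.callbacks))  # 1 0
```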

>> To achieve this, all module-level state should be kept in either the module
>> dict, or in the module object.
>
> I don't see how that is related to leakage.
>
>> A simple rule of thumb is: Do not define any static data, except
>> built-in types
>> with no mutable or user-settable class attributes.
>
> What about singleton instances?  Should they be per-interpreter?

Yes, definitely.

> What about constants, such as PI?

In PyModuleExec, create the constant using PyFloat_FromDouble, and add
it using PyModule_AddObject. That will do the right thing.
(Float constants can be shared, since they cannot refer to
user-defined code. But this PEP shields you from needing to know this
for every type.)

> Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be
> kept?

On the module object.

> What happens if this no-leakage rule is violated?  Does the module
> not load, or does it just maybe lead to a crash down the road?

It may, as today, lead to unexpected behavior down the road. This is
explained here:
https://docs.python.org/3/c-api/init.html#sub-interpreter-support
Unfortunately, there's no good way to detect such leakage. This PEP
adds the tools, documentation, and guidelines to make it easy to do
the right thing, but won't prevent you from shooting yourself in the
foot in C code.


Thank you for sharing your concerns! I will keep them in mind when
writing the docs for this.


Re: [Python-Dev] PEP 448 review

2015-03-16 Thread Neil Girdhar
Hi everyone,

I was wondering what is left with the PEP 448 (
http://bugs.python.org/issue2292) code review?  Big thanks to Benjamin,
Ethan, and Serhiy for reviewing some (all?) of the code.  What is the next
step of this process?

Thanks,

Neil


On Sun, Mar 8, 2015 at 4:38 PM, Neil Girdhar  wrote:

> Anyone have time to do a code review?
>
> http://bugs.python.org/issue2292
>
>
> On Mon, Mar 2, 2015 at 4:54 PM, Neil Girdhar 
> wrote:
>
>> It's from five days ago.  I asked Joshua to take a look at something, but
>> I guess he is busy.
>>
>> Best,
>>
>> Neil
>>
>> —
>>
>> The latest file there is from Feb 26, while your message that the patch
>> was ready for review is from today -- so is the
>> patch from five days ago the most recent?
>>
>> --
>> ~Ethan~
>>
>> On Mon, Mar 2, 2015 at 3:18 PM, Neil Girdhar 
>> wrote:
>>
>>> http://bugs.python.org/issue2292
>>>
>>> On Mon, Mar 2, 2015 at 3:17 PM, Victor Stinner wrote:
>>>
 Where is the patch?

 Victor

 Le lundi 2 mars 2015, Neil Girdhar  a écrit :

 Hi everyone,
>
> The patch is ready for review now, and I should have time this week to
> make changes and respond to comments.
>
> Best,
>
> Neil
>
> On Wed, Feb 25, 2015 at 2:42 PM, Guido van Rossum 
> wrote:
>
>> I'm back, I've re-read the PEP, and I've re-read the long thread with
>> "(no subject)".
>>
>> I think Georg Brandl nailed it:
>>
>> """
>>
>>
>>
>>
>>
>>
>>
>>
>> *I like the "sequence and dict flattening" part of the PEP, mostly
>> because itis consistent and should be easy to understand, but the
>> comprehension syntaxenhancements seem to be bad for readability and
>> "comprehending" what the codedoes.The call syntax part is a mixed bag on
>> the one hand it is nice to be consistent with the extended possibilities 
>> in
>> literals (flattening), but on the other hand there would be small but
>> annoying inconsistencies anyways (e.g. the duplicate kwarg case above).*
>> """
>>
>> Greg Ewing followed up explaining that the inconsistency between dict
>> flattening and call syntax is inherent in the pre-existing different 
>> rules
>> for dicts vs. keyword args: {'a':1, 'a':2} results in {'a':2}, while 
>> f(a=1,
>> a=2) is an error. (This form is a SyntaxError; the dynamic case f(a=1,
>> **{'a': 1}) is a TypeError.)
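A minimal sketch of that difference (using a throwaway function f):

```python
# Dict displays silently keep the last value for a duplicated key:
d = {'a': 1, 'a': 2}
print(d)                      # {'a': 2}

def f(**kwargs):
    return kwargs

# The dynamic duplicate keyword raises TypeError instead:
try:
    f(a=1, **{'a': 1})
except TypeError:
    print("TypeError")
```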
>>
>> For me, allowing f(*a, *b) and f(**d, **e) and all the other
>> combinations for function calls proposed by the PEP is an easy +1 -- 
>> it's a
>> straightforward extension of the existing pattern, and anybody who knows
>> what f(x, *a) does will understand f(x, *a, y, *b). Guessing what f(**d,
>> **e) means shouldn't be hard either. Understanding the edge case for
>> duplicate keys with f(**d, **e) is a little harder, but the error 
>> messages
>> are pretty clear, and it is not a new edge case.
>>
>> The sequence and dict flattening syntax proposals are also clean and
>> logical -- we already have *-unpacking on the receiving side, so allowing
>> *x in tuple expressions reads pretty naturally (and the similarity with 
>> *a
>> in argument lists certainly helps). From here, having [a, *x, b, *y] is
>> also natural, and then the extension to other displays is natural: {a, 
>> *x,
>> b, *y} and {a:1, **d, b:2, **e}. This, too, gets a +1 from me.
>>
>> So that leaves comprehensions. IIRC, during the development of the
>> patch we realized that f(*x for x in xs) is sufficiently ambiguous that 
>> we
>> decided to disallow it -- note that f(x for x in xs) is already somewhat 
>> of
>> a special case because an argument can only be a "bare" generator
>> expression if it is the only argument. The same reasoning doesn't apply 
>> (in
>> that form) to list, set and dict comprehensions -- while f(x for x in xs)
>> is identical in meaning to f((x for x in xs)), [x for x in xs] is NOT the
>> same as [(x for x in xs)] (that's a list of one element, and the element 
>> is
>> a generator expression).
>>
>> The basic premise of this part of the proposal is that if you have a
>> few iterables, the new proposal (without comprehensions) lets you create 
>> a
>> list or generator expression that iterates over all of them, essentially
>> flattening them:
>>
>> >>> xs = [1, 2, 3]
>> >>> ys = ['abc', 'def']
>> >>> zs = [99]
>> >>> [*xs, *ys, *zs]
>> [1, 2, 3, 'abc', 'def', 99]
>> >>>
>>
>> But now suppose you have a list of iterables:
>>
>> >>> xss = [[1, 2, 3], ['abc', 'def'], [99]]
>> >>> [*xss[0], *xss[1], *xss[2]]
>> [1, 2, 3, 'abc', 'def', 99]
>> >>>
>>
>> Wouldn't it be nice if you could write the latter using a
>> comprehension?
>>
>>>