[Python-Dev] Re: PEP 651 -- Robust Overflow Handling

2021-01-19 Thread Sebastian Berg
On Tue, 2021-01-19 at 13:31 +, Mark Shannon wrote:
> Hi everyone,
> 
> It's time for yet another PEP :)
> 
> Fortunately, this one is a small one that doesn't change much.
> It's aim is to make the VM more robust.
> 
> Abstract
> 
> 
> This PEP proposes that machine stack overflow is treated differently 
> from runaway recursion. This would allow programs to set the maximum 
> recursion depth to fit their needs and provide additional safety
> guarantees.
> 
> The following program will run safely to completion:
> 
>  sys.setrecursionlimit(1_000_000)
> 
>  def f(n):
>  if n:
>  f(n-1)
> 
>  f(500_000)
> 
> The following program will raise a StackOverflow, without causing a
> VM 
> crash:
> 
>  sys.setrecursionlimit(1_000_000)
> 
>  class X:
>  def __add__(self, other):
>  return self + other
> 
>  X() + 1
> 


This is appreciated! I recently spent quite a bit of time trying to
solve a StackOverflow like this in NumPy (and was unable to fully
resolve it).  Of course, the code triggering it was bordering on
malicious, but it would be nice if it were clear how to avoid
segfaulting.

Just some questions/notes:

* We currently mostly use `Py_EnterRecursiveCall()` in situations where
we need to safeguard against "almost Python" recursions, for example
an attribute lookup that returns `self`, or a list containing itself (a
small sketch follows below these notes). In those cases the Python
recursion limit seems a bit nicer (lower and easier to understand).
I am not sure it actually matters much, but my question is: are we sure
we want to replace all (or even many) C recursion checks?

* Assuming we swap the `Py_EnterRecursiveCall()` logic, I am wondering
whether a new `StackOverflow` exception name is useful. It may create
two names for almost identical Python code: unpacking a list containing
itself would raise a different exception than a mapping whose
`__getitem__` is implemented in Python.

* `Py_CheckStackDepthWithHeadRoom()` is usually not necessary, because
`Py_CheckStackDepth()` would leave plenty of headroom for typical
clean-up?
Can we assume that DECREFs (e.g. of a list or tuple) will never check
the depth, so headroom is usually not necessary?  This is all good, but
I am not immediately sure when `Py_CheckStackDepthWithHeadRoom()` would
be necessary (there are probably many cases where it clearly is, but is
it ever for fairly simple code?).
What happens if the maximum stack depth is reached while a
`StackOverflow` exception is already set?  Will the current "watermark"
mechanism remain, or could there be a simple rule that an uncleared
`StackOverflow` exception ensures some additional headroom?
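
To make the first note concrete, here is a minimal sketch (plain
current CPython, nothing from the PEP needed to run it) of such an
"almost Python" recursion: `repr()` of a deeply nested list recurses in
C, and the existing `Py_EnterRecursiveCall()` guard turns it into a
RecursionError rather than a machine stack overflow:

    import sys

    lst = []
    for _ in range(sys.getrecursionlimit() * 10):
        lst = [lst]  # build a deeply nested list: [[[...]]]

    try:
        repr(lst)  # the list repr recurses in C, one level per nesting
    except RecursionError as exc:
        print(exc)  # "maximum recursion depth exceeded ..."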

Cheers,

Sebastian



> ---
> 
> The full PEP can be found here:
> https://www.python.org/dev/peps/pep-0651
> 
> As always, comments are welcome.
> 
> Cheers,
> Mark.




[Python-Dev] Re: PEP 651 -- Robust Overflow Handling

2021-01-19 Thread Sebastian Berg
On Tue, 2021-01-19 at 16:22 +, Mark Shannon wrote:
> 
> 
> On 19/01/2021 3:43 pm, Sebastian Berg wrote:
> > On Tue, 2021-01-19 at 13:31 +, Mark Shannon wrote:
> > > Hi everyone,
> > > 
> > > It's time for yet another PEP :)
> > > 
> > > Fortunately, this one is a small one that doesn't change much.
> > > It's aim is to make the VM more robust.
> > > 
> > > Abstract
> > > 
> > > 
> > > This PEP proposes that machine stack overflow is treated
> > > differently
> > > from runaway recursion. This would allow programs to set the
> > > maximum
> > > recursion depth to fit their needs and provide additional safety
> > > guarantees.
> > > 
> > > The following program will run safely to completion:
> > > 
> > >   sys.setrecursionlimit(1_000_000)
> > > 
> > >   def f(n):
> > >   if n:
> > >   f(n-1)
> > > 
> > >   f(500_000)
> > > 
> > > The following program will raise a StackOverflow, without causing
> > > a
> > > VM
> > > crash:
> > > 
> > >   sys.setrecursionlimit(1_000_000)
> > > 
> > >   class X:
> > >   def __add__(self, other):
> > >   return self + other
> > > 
> > >   X() + 1
> > > 
> > 
> > 
> > This is appreciated! I recently spent quite a bit of time trying to
> > solve a StackOverflow like this in NumPy (and was unable to fully
> > resolve it).  Of course, the code triggering it was bordering on
> > malicious, but it would be nice if it were clear how to avoid
> > segfaulting.
> > 
> > Just some questions/notes:
> > 
> > * We currently mostly use `Py_EnterRecursiveCall()` in situations
> > where we need to safeguard against "almost Python" recursions, for
> > example an attribute lookup that returns `self`, or a list containing
> > itself. In those cases the Python recursion limit seems a bit nicer
> > (lower and easier to understand).
> > I am not sure it actually matters much, but my question is: are we
> > sure we want to replace all (or even many) C recursion checks?
> 
> Would it help if you had the ability to increase and decrease the 
> recursion depth, as `Py_EnterRecursiveCall()` currently does?
> 
> I'm reluctant to expose it, as it might encourage C code authors to
> use 
> it, rather than `Py_CheckStackDepth()` resulting in crashes.
> 
> To be robust, C code must make a call to `Py_CheckStackDepth()`.
> To check the recursion limit as well would be extra overhead.
> 
> > 
> > * Assuming we swap the `Py_EnterRecursiveCall()` logic, I am
> > wondering whether a new `StackOverflow` exception name is useful. It
> > may create two names for almost identical Python code: unpacking a
> > list containing itself would raise a different exception than a
> > mapping whose `__getitem__` is implemented in Python.
> 
> True, but they are different. One is a soft limit that can be
> increased, 
> the other is a hard limit that cannot (at least not easily).


Right. I think my confusion revolves entirely around your proposed
change of `Py_EnterRecursiveCall()`.

A simple example (C logic, written as Python-like pseudocode):

    def depth(obj, current=0):
        Py_EnterRecursiveCall()

        if isinstance(obj, sequence):  # has the sequence slots
            return depth(obj[0], current + 1)
        return current

will never hit the "depth" limit for a self-containing list or even
sequence (as long as `GetItem` can use the C-level slot).

But `obj[0]` could nevertheless return a non-trivial object (one with
`__del__`, or certainly a container full of unrelated objects that need
deleting).

As the author of the function, I have no knowledge of how much stack
space cleaning those up may require. And say someone adds a check for
`Py_CheckStackDepth()` inside a dealloc; would this then have to cause
a fatal error?
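
A sketch of the clean-up side for contrast (current CPython, where
chained DECREFs in C perform no stack check at all and are kept safe
only by the "trashcan" mechanism):

    lst = []
    for _ in range(200_000):
        lst = [lst]  # every level must be deallocated when the chain dies

    # No RecursionError (or any stack check) is raised here: the chained
    # DECREFs happen in C, kept safe only by CPython's trashcan mechanism.
    del lst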

Maybe it should even be a fatal error by default in some cases?

Also, if the code is slow, the previous recursion limit may guard
against hanging (arguably, if that is the case, I should probably add
an interrupt check, I admit).


Long story short, I will trust you on this of course, but I am not yet
convinced that replacing the check will actually do any good (as
opposed to adding and/or providing the additional check) or even be a
service to users (since I assume that the vast majority do not crank up
the recursion limit to huge values).

Cheers,

Sebastian




[Python-Dev] Re: The repr of a sentinel

2021-05-20 Thread Sebastian Berg
On Thu, 2021-05-20 at 19:00 +0100, Paul Moore wrote:
> On Thu, 20 May 2021 at 18:13, Luciano Ramalho wrote:
> > 
> > I'd like to learn about use cases where `...` (a.k.a. `Ellipsis`)
> > is
> > not a good sentinel. It's a pickable singleton testable with `is`,
> > readily available, and extremely unlikely to appear in a data
> > stream.
> > Its repr is "Ellipsis".
> 
> Personally, I'm quite tempted by the idea of using ellipsis. It just
> sort of feels reasonable (and in the context `def f(x,
> optional_arg=...)` it even looks pretty natural).
> 
> But it nevertheless feels like a bit of an abuse - the original point
> of ellipsis was for indexing, and in particular complex slices like
> a[1:20:2, ..., 3:5]. That usage is common in numpy, as I understand
> it, even if it's relatively rare in everyday Python. So while I like
> the idea in principle, I'm mildly worried that it's not "the right
> thing to do".
> 
> I can't put my ambivalence about the idea any more precisely than
> this, unfortunately.


In NumPy we currently use a "missing argument" sentinel, mainly for
things roughly like:


import numpy as np

def mean(arr, *, axis=np._NoValue):
    if not hasattr(arr, "mean"):
        # Not a duck that defines `mean`, coerce to ndarray:
        arr = np.asarray(arr)

    if axis is np._NoValue:
        return arr.mean()
    return arr.mean(axis=axis)


This allows us to add new keyword arguments without breaking backward
compatibility.  I do not remember whether we had particularly important
reasons for not simply using `None` as the default, or whether it was
just erring on the safe side.
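
For reference, a minimal sketch of how such a "missing argument"
sentinel can be set up (hypothetical names; NumPy's actual `_NoValue`
differs in details):

    class _NoValueType:
        """Sentinel for 'argument not passed'; not a public API value."""
        _instance = None

        def __new__(cls):
            # Keep it a singleton so `is` checks stay reliable.
            if cls._instance is None:
                cls._instance = super().__new__(cls)
            return cls._instance

        def __repr__(self):
            return "<no value>"

        def __reduce__(self):
            # Unpickling returns the same singleton.
            return (self.__class__, ())

    _NoValue = _NoValueType()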


In any case, I tend to agree that `Ellipsis` should be considered a
"user-facing" value.  And in the above code, we do not expect anyone to
ever call `np.mean(something, axis=np._NoValue)` – it's not even
accessible – but if the value were `...`, then I would expect users to
be encouraged to write `np.mean(arr, axis=...)` in normal code.

More importantly, I can think of a reasonable "meaning" for `axis=...`!
In NumPy `axis=None` (the default) returns a scalar; `axis=...` could
return a 0-D array.
This would borrow the meaning that `Ellipsis` carries in indexing. [1]
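
A short sketch of the indexing meaning that would be borrowed (this
runs with current NumPy; the `axis=...` reduction itself is only the
hypothetical meaning described here):

    import numpy as np

    arr = np.arange(6.0).reshape(2, 3)

    print(arr[...])             # Ellipsis in indexing: all axes
    print(arr.mean(axis=None))  # today: reduce everything, return a scalar
    # Hypothetical: arr.mean(axis=...) would also reduce over
    # range(arr.ndim), but return a 0-D array instead of a scalar.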

Cheers,

Sebastian



[1] In such a mental model, it would mean the same as
`axis=range(arr.ndim)`.  To be clear, NumPy doesn't do this; it's just
a plausible meaning if it has to continue to juggle scalars and 0-D
arrays and wants to be "clearer" about it.



> 
> Paul


[Python-Dev] Re: PEP 674: Disallow using macros as l-value

2021-12-07 Thread Sebastian Berg
On Tue, 2021-12-07 at 13:54 -0300, Joao S. O. Bueno wrote:
> Sorry for stepping in - but I am seeing too many arguments in favour
> of the rules because "they are the rules", and just Victor arguing
> with
> what is met in the "real world".
> 
> But if this update can be done by a simple search/replace on the C
> source
> of projects,
> I can only perceive two scenarios this will affect: well maintained
> projects,
>  for which it is fixable in minutes, and  stale packages, no longer
> released
> that "happen to work" when someone downloads and builds for new
> Python versions. In these cases, the build will fail. If the person
> trying
> the build can't fix it, but can take the error to a proper, or high
> visibility,
> forum, someone will be able to come to the fix, leading to renewed
> visibility for the otherwise stale package.
> 

The problem is really less-maintained projects that may miss it or
take very long to react.  And that may be very frustrating for their
users (who will not have a workaround beyond patching the project!).

So the question is whether these are so few users that it is OK to
break them (even many bug fixes will break someone, after all).
The other consideration may be that documenting the change for 1-2
years may achieve almost nothing except frustrating Victor ;).


One thing we once did in NumPy (for a runtime problem) was to
intentionally break everyone at pre-release/dev time to point out what
code needed fixing, and then flip the switch back at release time so as
not to break production.
After a long enough time we enabled it for releases as well.

Not saying that it was nice, but the only alternative would have been
to never fix it.

A similar switch could be worthwhile if it helps Victor with
experimenting on the dev branch or reaching a useful number of
projects.  Of course, I am not sure it would do either...

Cheers,

Sebastian



> 
> 
> On Tue, 7 Dec 2021 at 12:40, Antoine Pitrou wrote:
> 
> > On Tue, 7 Dec 2021 15:39:25 +0100
> > Petr Viktorin  wrote:
> > 
> > > On 30. 11. 21 19:52, Victor Stinner wrote:
> > > > On Tue, Nov 30, 2021 at 7:34 PM Guido van Rossum wrote:
> > > > > How about *not* asking for an exception and just following
> > > > > the PEP
> > 387 process? Is that really too burdensome?
> > > > 
> > > > The Backward Compatibility section gives an explanation:
> > > > 
> > > > "This change does not follow the PEP 387 deprecation process.
> > > > There is
> > no
> > > > known way to emit a deprecation warning when a macro is used as
> > > > a
> > > > l-value, but not when it's used differently (ex: r-value)."
> > > > 
> > > > Apart of compiler warnings, one way to implement the PEP 387
> > > > "deprecation process" would be to announce the change in two
> > > > "What's
> > > > New in Python 3.X?" documents. But I expect that it will not be
> > > > efficient. Extract of the Rejected Idea section:
> > > > 
> > > > "(...) only few developers read the documentation, and only a
> > > > minority
> > > > is tracking changes of the Python C API documentation."
> > > > 
> > > > In my experience, even if a DeprecationWarning is emitted at
> > > > runtime,
> > > > developers miss or ignore it. See the recent "[Python-Dev] Do
> > > > we need
> > > > to remove everything that's deprecated?" discussion and
> > > > complains
> > > > about recent removal of deprecated features, like:
> > > > 
> > > > * collections.MutableMapping was deprecated for 7 Python
> > > > versions
> > > > (deprecated in 3.3) -- removed in 3.9 alpha, reverted in 3.9
> > > > beta,
> > > > removed again in 3.11
> > > > * the "U" open() flag was deprecated for 10 Python versions
> > > > (deprecated in 3.0) -- removed in 3.9 alpha, reverted in 3.9
> > > > beta,
> > > > removed again in 3.11
> > > > 
> > > > For this specific PEP changes, I consider that the number of
> > > > impacted
> > > > projects is low enough to skip a deprecation process: only 4
> > > > projects
> > > > are known to be impacted. One year ago (Python 3.10), 16 were
> > > > impacted, and 12 have already been updated in the meanwhile.
> > > > I'm
> > > > talking especially about Py_TYPE() and Py_SIZE() changes which,
> > > > again,
> > > > has been approved by the Steering Council.
> > > 
> > > 
> > > The current version of the PEP looks nice, but I don't think the
> > > rationale is strong enough.
> > > I believe we should:
> > > - Mark the l-value usage as deprecated in the docs,
> > > - And then do nothing until we find an actual case where this
> > > issue
> > > blocks development (or is actively dangerous for users).
> > 
> > Is there a way to emit a compilation warning when those macros are
> > used
> > as l-values? Even if only enabled on some compilers.
> > 
> > Regards
> > 
> > Antoine.

[Python-Dev] Re: Python 3.10 vs 3.8 performance degradation

2021-12-19 Thread Sebastian Berg
On Sun, 2021-12-19 at 18:48 +, Tigran Aivazian wrote:
> To eliminate the possibility of being affected by the different
> versions of numpy I have just now upgraded numpy in Python 3.8
> environment to the latest version, so both 3.8 and 3.10 and using
> numpy 1.21.4 and still the timing is exactly the same.

NumPy is very unlikely to have gotten slower.  Please, please time your
script before jumping to conclusions.  For example, 2/3 of the time of
that pendulum plotter is spent in plotting, and most of that seems to
be spent in text rendering.
(Yeah, there is a little bit of time in NumPy's `arr.take()` also,
but I doubt that has anything to do with this.)

Now, I don't know what does the text rendering, but maybe that got
slower.
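
For example, a minimal sketch of how to see where the time actually
goes (assuming the script exposes a `main()` entry point, which is a
hypothetical name here):

    import cProfile
    import pstats

    # `main()` stands in for the pendulum script's entry point.
    cProfile.run("main()", "profile.out")
    stats = pstats.Stats("profile.out")
    stats.sort_stats("cumulative").print_stats(15)  # top 15 offenders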

Cheers,

Sebastian







[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Sebastian Berg
On Thu, 2022-02-03 at 15:32 +0100, Victor Stinner wrote:
> By the way, Argument Clinic now produces way faster calling
> conventions than hand-written METH_VARARGS with PyArg_ParseTuple().
> It would be good to make this tool available to 3rd party projects.
> 
> Either extract it and put it on PyPI, but then it means that Python
> will need Python and pip to install something to build itself... not
> good?
> 
> Or we can add it to the stdlib. IMO we need to write more tests and
> more documentation. Right now, test_clinic is "light". Another issue
> is that it relies on the currently private _PyArg_Parser structure.
> By the way, this structure is causing crashes if sub-interpreters are
> run in parallel (one GIL per interpreter) because of the
> _PyArg_Parser.kwtuple tuple of keyword parameter names (tuple of
> str).
> 
> If Eric's tool becomes successful inside CPython, we may consider
> making it available to 3rd party projects as well. Such a tool, like
> Argument Clinic, is not so different from Cython, which generates C
> code to ease the development of C extension modules.


It would be great to have a clear API available to store constants like
this. For NumPy these currently are:
* mainly interned strings
* some custom singletons which are currently defined in Python
  (including exceptions, but not so important)
* Python functions that are called from C.

I am pretty sure there are some places that do not have module state
available, but likely many (most?) of them could be refactored (e.g. a
converter function that needs a singleton: it isn't terrible to do the
conversion manually later on).  Many will be in methods.


It would be great to have a "blessed" solution for kwarg parsing.  I
had shunned argument clinic because it felt too internal, probably
CPython specific, and it seemed like you basically need to check in the
generated code for each supported Python version...

The current solution in NumPy [1] is a bit limited and uses those
statics to store the interned strings. And it duplicates Python logic
that would be nice to not duplicate.

Cheers,

Sebastian



[1] For the curious, the API for NumPy looks like this:

/*
 * Declare a function-static struct for string creation/interning and
 * caching other prep (generated on the first call).
 * `npy_parse_arguments` is a macro, but only to insert the cache name.
 */
NPY_PREPARE_ARGPARSER;

if (npy_parse_arguments(funcname, args, len_args, kwnames,
        "", NULL, &pos_only_obj,
        "kw1", &Converter, &kw1_val,
        "|kw2", NULL, &kw2_obj,
        "$kw_only1", NULL, &kw_only2_obj,
        NULL, NULL, NULL) < 0) {
    return NULL;
}

So it is very limited to converters (e.g. no "i" or "O!" convenience),
since we don't use those much anyway.
One annoyance is that "i" and "s" don't have a clear "converter" and
that converters can't raise an error message that includes the function
and parameter name.


> 
> Victor
> 
> On Thu, Feb 3, 2022 at 7:50 AM Inada Naoki wrote:
> > 
> > +1 for overall.
> > 
> > On Thu, Feb 3, 2022 at 7:45 AM Eric Snow <ericsnowcurren...@gmail.com> wrote:
> > > 
> > > 
> > > I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> > > other related API including ~14 (private) C-API functions. 
> > > Dropping
> > > all that helps reduce maintenance costs.  However, at least one
> > > PyPI
> > > project (blender) is using _Py_IDENTIFIER().  So, before we could
> > > get
> > > rid of it, we'd first have to deal with that project (and any
> > > others).
> > > 
> > 
> > It would be nice to provide something similar to _PY_IDENTIFIER,
> > but
> > designed (and documented) for 3rd party modules like this.
> > 
> > ```
> > typedef struct {
> >     Py_IDENTIFIER(foo);
> > ...
> > } Modstate;
> > ...
> >     // in some func
> >     Modstate *state = (Modstate*)PyModule_GetState(module);
> >     PyObject_GetAttr(o, PY_IDENTIFIER_STR(state->foo));
> > ...
> > // m_free()
> > static void mod_free(PyObject *module) {
> >     Modstate *state = (Modstate*)PyModule_GetState(module);
> >     Py_IDENTIFIER_CLEAR(state->foo);
> > }
> > ```
> > 
> > 
> > --
> > Inada Naoki  




[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)

2022-02-23 Thread Sebastian Berg
On Thu, 2022-02-24 at 00:21 +0100, Antonio Cuni wrote:
> On Mon, Feb 21, 2022 at 5:18 PM Petr Viktorin wrote:
> 
> Should we care about hacks/optimizations that rely on having the only
> > reference (or all references), e.g. mutating a tuple if it has
> > refcount
> > 1? Immortal objects shouldn't break them (the special case simply
> > won't
> > apply), but this wording would make them illegal.
> > AFAIK CPython uses this internally, but I don't know how
> > prevalent/useful it is in third-party code.
> > 
> 
> FWIW, a real world example of this is numpy.ndarray.resize(...,
> refcheck=True):
> https://numpy.org/doc/stable/reference/generated/numpy.ndarray.resize.html#numpy.ndarray.resize
> https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/shape.c#L114
> 
> When refcheck=True (the default), numpy raises an error if you try to
> resize an array inplace whose refcnt > 2 (although I don't understand
> why >
> 2 and not > 1, and the docs aren't very clear about this).
> 
> That said, relying on the exact value of the refcnt is very bad for
> alternative implementations and for HPy, and in particular it is
> impossible
> to implement ndarray.resize(refcheck=True) correctly on PyPy. So from
> this
> point of view, a wording which explicitly restricts the "legal" usage
> of
> the refcnt details would be very welcome.

Yeah, NumPy resizing is a bit of an awkward point, I would be on-board
for just replacing resize for non

NumPy does also have a bit of magic akin to the "string concat" trick
for operations like:

a + b + c

where it will try to do magic and use the knowledge that it can
mutate/reuse the temporary array, effectively doing:

tmp = a + b
tmp += c

(which requires some stack-walking magic in addition to the refcount!)
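
The refcount half of that check can be sketched at the Python level
(CPython-specific by construction, which is exactly the portability
problem discussed above; the real code additionally walks the C stack):

    import sys

    def looks_like_temporary(obj):
        # A C-level temporary (e.g. the result of `a + b`) has refcount 1;
        # through getrefcount() it shows as 2, since the call itself
        # holds one extra reference to its argument.
        return sys.getrefcount(obj) <= 2

    a = [1.0, 2.0]
    b = a
    print(looks_like_temporary(a))  # False: `a` and `b` both reference it
    del b
    print(looks_like_temporary(a))  # True: only `a` is left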

Cheers,

Sebastian







[Python-Dev] Re: C API: Move PEP 523 "Adding a frame evaluation API to CPython" private C API to the internal C API

2022-03-30 Thread Sebastian Berg
On Wed, 2022-03-30 at 17:51 +0200, Petr Viktorin wrote:
> On 30. 03. 22 17:42, Guido van Rossum wrote:
> > In the not so distant past I have proposed to introduce a new
> > category, 
> > "Unstable APIs". These are public but are not guaranteed to be
> > backwards 
> > compatible in feature releases (though I feel they should remain so
> > in 
> > bugfix releases).
> > 
> > I'm not sure whether those should have a leading underscore or not.
> > Perhaps (like some other languages do and like maybe we've used in
> > a few 
> > places) the name could just include the word "Unstable"?
> 
> IMO, the underscore should mark an API as internal: it can change at
> any 
> time (though in practice it often doesn't, e.g. to accommodate
> projects 
> that used it before a policy is written down).
> 

That is fair, although there are documented underscored ones:
https://docs.python.org/3/search.html?q=_Py

I suppose that means all bets are off _unless_ it is documented or
later adopted as stable API (e.g. `_PyObject_Vectorcall`).

With that, the only "not obviously OK" use in NumPy that I am aware of
is `_Py_HashDouble` (it seems undocumented).

Maybe "unless documented" is just a clear enough distinction in
practice.
Although, to some degree, I think it would be clearer if symbols that
have a realistic chance of changing in bug-fix releases had an
additional safeguard.

- Sebastian



> This is useful e.g. for macros/static functions that wrap access to 
> something private, where the definition needs to be available but
> marked 
> "keep off".
> 
> 
> > On Wed, Mar 30, 2022 at 8:08 AM Victor Stinner wrote:
> > 
> >     The internal C API can be used on purpose. But there is no
> > backward
> >     compatibility warranty and it can change anytime. In practice,
> > usually
> >     it only changes in 3.x.0 releases. For example, these private C
> > API
> >     changed in Python 3.9 and Python 3.11 (see my first email in
> > the other
> >     PEP 523 thread).
> > 
> >     To use the internal C API, you have to declare the
> > Py_BUILD_CORE macro
> >     and include an internal C API header file. For
> >     _PyInterpreterState_SetEvalFrameFunc(), it should be:
> > 
> >     #ifndef Py_BUILD_CORE_MODULE
> >     #  define Py_BUILD_CORE_MODULE
> >     #endif
> >     #include <Python.h>
> >     #include <internal/pycore_interp.h>  // _PyInterpreterState_SetEvalFrameFunc()
> >     #include <internal/pycore_ceval.h>   // _PyEval_EvalFrameDefault
> > 
> >     Victor
> > 
> >     On Tue, Mar 29, 2022 at 12:26 AM Jason Ansel via Python-Dev
> >     <python-dev@python.org> wrote:
> >  >
> >  > The PyTorch team plans to use PEP 523 as a part of PyTorch 2.0,
> >  > so this proposal may break the next major release of PyTorch.
> >  >
> >  > The related project is TorchDynamo, which can be found here:
> >  > https://github.com/facebookresearch/torchdynamo
> >  >
> >  > We will likely move this into the core of PyTorch closer to release.
> >  >
> >  > If the change happens, would PyTorch still be able to use the
> >  > eval frame API?  Or would it be prevented from being used entirely?

[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-01-23 Thread Sebastian Berg
On Thu, 2020-01-23 at 18:36 -0800, Guido van Rossum wrote:
> Good question!
> 

It is! Below is mostly lamenting, so just to say that my personal gut
feeling would be that it should probably be considered an
"implementation detail" that this is used e.g. by most containers. But
besides the fact that it sometimes leads to unexpected behaviour, I am
not sure I have any actual reasons. (Unless some typing JIT could run
into it?)
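
For concreteness, the user-visible effect of the shortcut (plain
CPython):

    import math

    nan = math.nan
    print(nan == nan)             # False: IEEE 754 equality
    print(nan in [nan])           # True: the identity shortcut kicks in
    print([nan].count(nan))       # 1, for the same reason
    print(float("nan") in [nan])  # False: distinct objects fall back to ==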

> I think this started with a valuable optimization for `x in `.
> I don't know if that was ever carefully documented, but I remember
> that it was discussed a few times (and IIRC Raymond was adamant that
> this should be so optimized -- which is reasonable).
> 
> I'm tempted to declare this implementation-defined behavior --
> *implicit* calls to __eq__ and __ne__ *may* be skipped if both sides
> are the same object depending on the whim of the implementation.
> 
> We should probably also strongly recommend that __eq__ and __ne__ not
> do what math.nan does.


Other objects similar to this are masked values (which e.g. pandas is
looking at). [0]
In their current definition, masked values would have to behave in a
similar way to NumPy arrays, since `bool(NA == NA) -> bool(boolean_NA)`
is an error rather than `True`.
These objects are rare but hard to avoid completely, I guess...

> 
> However we cannot stop rich compare __eq__ implementations that
> return arrays of pairwise comparisons, since numpy does this. (And
> yes, it seems that this means that `x in y` is computed incorrectly
> if x is an array with such an __eq__ implementation and y is a tuple
> of such objects. I'm sure there's a big warning somewhere in the
> numpy docs about this, and I presume if y is a numpy array they make
> sure to do something better.)
> 

I somewhat doubt there is a big warning currently...

In NumPy we actually stopped using `PyObject_RichCompareBool` within
`np.equal` a pretty long time ago [1]. IIRC we perceived it as a
bug.
However, as you said, object arrays which would succeed randomly [2]
were probably the more important motivation (rather than NaN).

I do not think anyone has ever evaluated the performance impact of that
change though...

- Sebastian


[0] 
https://github.com/pandas-dev/pandas/pull/29597/files#diff-239ec95d581257ed256954660663b277R825-R827

[1] 
https://numpy.org/devdocs/release/1.13.0-notes.html#futurewarning-to-changed-behavior

[2] For those not familiar with NumPy: in NumPy, `[1, 2] == [1, 2]`
returns `[True, True]`, but may here return True (if they are the same
object). The comparison should raise an error because `[True, True]`
does not generally have a truthiness defined, but it will succeed
randomly.
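
A sketch of this footnote with actual arrays (requires NumPy), showing
why the boolean assumption behind `PyObject_RichCompareBool` breaks
down:

    import numpy as np

    a = np.array([1, 2])
    print(a == np.array([1, 2]))  # [ True  True]: elementwise, not a bool

    try:
        print(a in [np.array([1, 2])])  # __contains__ needs bool(a == item)
    except ValueError as exc:
        print(exc)  # "The truth value of an array ... is ambiguous"

    print(a in [a])  # True: the identity shortcut never calls __eq__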




> On Thu, Jan 23, 2020 at 5:33 PM Tim Peters wrote:
> > PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut:  if
> > x
> > and y are the same object, then equality comparison returns True
> > and
> > inequality False.  No attempt is made to execute __eq__ or __ne__
> > methods in those cases.
> > 
> > This has visible consequences all over the place, but they don't
> > appear to be documented.  For example,
> > 
> > >>> import math
> > >>> ([math.nan] * 5).count(math.nan)
> > 5
> > 
> > despite that `math.nan == math.nan` is False.
> > 
> > It's usually clear which methods will be called, and when, but not
> > really here.  Any _context_ that calls PyObject_RichCompareBool()
> > under the covers, for an equality or inequality test, may or may
> > not
> > invoke __eq__ or __ne__, depending on whether the comparands are
> > the
> > same object.  Also any context that inlines these special cases to
> > avoid the overhead of calling PyObject_RichCompareBool() at all.
> > 
> > If it's intended that Python-the-language requires this, that needs
> > to
> > be documented.
> > 
> > Or if it's implementation-defined, then _that_ needs to be
> > documented.
> > 
> > Which isn't straightforward in either case, in part because
> > PyObject_RichCompareBool isn't a language-level concept.
> > 
> > This came up recently when someone _noticed_ the list.count(NaN)
> > behavior, and Victor made a PR to document it:
> > 
> > https://github.com/python/cpython/pull/18130
> > 
> > I'm pushing back, because documenting it _only_ for .count() makes
> > .count() seem unique in a way it isn't, and doesn't resolve the
> > fundamental issue:  is this language behavior, or implementation
> > behavior?
> > 
> > Which I don't want to argue about.  But you certainly should ;-)

[Python-Dev] Limited API, MetaClasses, and ExtensionMetaClasses

2020-01-31 Thread Sebastian Berg
Hi all,

I may be thinking in the wrong direction, but right now
`PyType_Type.tp_new` resolves the `metaclass` from the bases and calls:

    type = (PyTypeObject *)metatype->tp_alloc(metatype, nslots);

where `metatype` is actually resolved from the metatype of the bases.

In contrast, `PyType_FromSpecWithBases` immediately calls:

    res = (PyHeapTypeObject*)PyType_GenericAlloc(&PyType_Type, 0);

So I am curious whether `PyType_FromSpecWithBases` should not do the
same thing, or why it does not.
I would also assume that it actually fails to inherit a possible
metaclass completely, but I have not checked that.
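
A Python-level sketch of the metatype resolution that
`PyType_Type.tp_new` performs from the bases (and that
`PyType_FromSpecWithBases`, hardcoding `PyType_Type`, appears to skip):

    class Meta(type):
        pass

    class Base(metaclass=Meta):
        pass

    class Child(Base):  # no metaclass given: resolved from the bases
        pass

    print(type(Child))  # <class '__main__.Meta'>: inherited from Base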

That is the first question. The second, more important one, is whether
ExtensionMetaClasses are a thing at all. Is it reasonable to explore
them, or should I rather give up and use something more like ABCMeta,
which stores information in the `HeapType->tp_dict`, plus tag
information onto the actual instances as needed?
Right now it seems completely fine to me, except that the creation
itself is complicated (and a bit confusing, but that's MetaClasses...).

I am exploring this for NumPy. PySide is using such things and wrangled
them into the limited API [1] (PySide needs to store a Qt pointer
additionally). When I looked at the code, I was fairly sure that this
only happened to work because Python allocates a slightly larger space
(effectively making it `nslots+1`, or 1 above).

As far as I can see, the only thing that happens if I use such an
ExtensionMetaClass is that I have a HeapType with a different
tp_basicsize. And I do not see why that should be any different from
a normal Python object.

Best,

Sebastian


[1] 
https://github.com/pyside/pyside2-setup/blob/5.11/sources/shiboken2/libshiboken/pep384impl_doc.rst#diversification




[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-02-03 Thread Sebastian Berg
Now, probably this has been rejected a hundred times before, and there
are some very good reasons why it is a horrible thought...

But if `PyObject_RichCompareBool(..., Py_EQ)` is such a fundamental
operation (and in a sense it seems to me that it is), is there a point
in explicitly defining it?

That would mean adding `operator.equivalent(a, b) -> bool`, which would
allow float to override the result and let
`operator.equivalent(float("NaN"), float("NaN"))` return True;
luckily very few types would actually override the operation.

That operator would obviously be allowed to use the shortcut.
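
A minimal Python-level sketch of what that could look like
(`equivalent` and the `__equivalent__` dunder are hypothetical names):

    def equivalent(a, b):
        """Container-style equivalence: identity shortcut, then ==."""
        # Hypothetical __equivalent__ dunder: would let float make NaN
        # equivalent to NaN even for two distinct objects.
        override = getattr(type(a), "__equivalent__", None)
        if override is not None:
            return bool(override(a, b))
        return a is b or bool(a == b)

    nan = float("nan")
    print(equivalent(nan, nan))           # True via the identity shortcut
    print(equivalent(nan, float("nan")))  # False today; True if float overrode it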

At that point, container `==` and `in` (and equivalence) are defined
based on element equivalence.
NAs (missing value handling) may be an actual use case where it is more
than a theoretical thought. However, I do not seriously work with NAs
myself.

- Sebastian


On Mon, 2020-02-03 at 16:00 -0600, Tim Peters wrote:
> [Tim]
> > > PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if
> > > x
> > > and y are the same object, then equality comparison returns True
> > > and inequality False. No attempt is made to execute __eq__ or
> > > __ne__ methods in those cases.
> > > ...
> > > If it's intended that Python-the-language requires this, that
> > > needs to
> > > be documented.
> 
> [Raymond]
> > This has been slowly, but perhaps incompletely documented over the
> > years and has become baked in the some of the collections ABCs as
> > well.
> >  For example, Sequence.__contains__() is defined as:
> > 
> > def __contains__(self, value):
> > for v in self:
> > if v is value or v == value:  # note the
> > identity test
> > return True
> > return False
> 
> But it's unclear to me whether that's intended to constrain all
> implementations, or is just mimicking CPython's list.__contains__.
> That's always a problem with operational definitions.  For example,
> does it also constrain all implementations to check in iteration
> order?  The order can be visible, e.g, in the number of times
> v.__eq__
> is called.
> 
> 
> > Various collections need to assume reflexivity, not just for speed,
> > but so that we
> > can reason about them and so that they can maintain internal
> > consistency. For
> > example, MutableSet defines pop() as:
> > 
> > def pop(self):
> > """Return the popped value.  Raise KeyError if empty."""
> > it = iter(self)
> > try:
> > value = next(it)
> > except StopIteration:
> > raise KeyError from None
> > self.discard(value)
> > return value
> 
> As above, except  CPyhon's own set implementation implementation
> doesn't faithfully conform to that:
> 
> >>> x = set(range(0, 10, 2))
> >>> next(iter(x))
> 0
> >>> x.pop() # returns first in iteration order
> 0
> >>> x.add(1)
> >>> next(iter(x))
> 1
> >>> x.pop()  # ditto
> 1
> >>> x.add(1)  # but try it again!
> >>> next(iter(x))
> 1
> >>> x.pop() # oops! didn't pop the first in iteration order
> 2
> 
> Not that I care ;-)  Just emphasizing that it's tricky to say no more
> (or less) than what's intended.
> 
> > That pop() logic implicitly assumes an invariant between membership
> > and iteration:
> > 
> >assert(x in collection for x in collection)
> 
> Missing an "all".
> 
> > We really don't want to pop() a value *x* and then find that *x* is
> > still
> > in the container.   This would happen if iter() found the *x*, but
> > discard()
> > couldn't find the object because the object can't or won't
> > recognize itself:
> 
> Speaking of which, why is "discard()" called instead of "remove()"?
> It's sending a mixed message:  discard() is appropriate when you're
> _not_ sure the object being removed is present.
> 
> 
> >  s = {float('NaN')}
> >  s.pop()
> >  assert not s  # Do we want the language to
> > guarantee that
> >   # s is now empty?  I
> > think we must.
> 
> I can't imagine an actual container implementation that wouldn't. but
> no actual container implements pop() in the odd way MutableSet.pop()
> is written.  CPython's set.pop does nothing of the sort - doesn't
> even
> have a pointer equality test (except against C's NULL and `dummy`,
> used merely to find "the first (starting at the search finger)" slot
> actually in use).
> 
> In a world where we decided that the identity shortcut is _not_
> guaranteed by the language, the real consequence would be that the
> MutableSet.pop() implementation would need to be changed (or made
> NotImplemented, or documented as being specific to CPython).
> 
> > The code for clear() depends on pop() working:
> > 
> > def clear(self):
> > """This is slow (creates N new iterators!) but
> > effective."""
> > try:
> > while True:
> > self.pop()
> > except KeyError:
> > pass
> > 
> > It would unfortunate if clear() could n

[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-02-03 Thread Sebastian Berg
On Mon, 2020-02-03 at 16:43 -0800, Larry Hastings wrote:
> On 2/3/20 3:07 PM, Sebastian Berg wrote:
> > That would mean adding `operator.equivalent(a, b) -> bool` which
> > would
> > allow float to override the result and let
> > `operator.equivalent_value(float("NaN"), float("NaN))` return True;
> float('NaN') != float('NaN'), and in fact there's special code to
> ensure that behavior.  Why?  Because it's mandated by the IEEE 754
> floating-point standard.
> 
> > https://en.wikipedia.org/wiki/NaN#Comparison_with_NaN
> > 
> 
> This bizarre behavior is often exploited by people exploring the
> murkier corners of Python's behavior.  Changing it is (sadly) not
> viable.
> 

Of course it is not; I am not saying that it should be changed. What I
mainly meant is that in this discussion there was always talk about two
distinct, slightly different operations:

1. `==` has of course the logic `NaN == NaN -> False`.
2. `PyObject_RichCompareBool(a, b, Py_EQ)` was argued to have a useful
   logic of `a is b or a == b`. And I argued that you could define:

   def identical(a, b):  # i.e. `operator.identical`
       res = a is b or a == b
       assert type(res) is bool  # arrays have unclear logic
       return res

   to "bless" it as its own desired logic when dealing with containers
   (mainly).

And making that distinction on the language level would be a (possibly
ugly) resolution of the problem.
Only `identical` is actually always allowed to use the `is` shortcut.
Now, for all practical purposes "identical" is maybe already correctly
defined by `a is b or bool(a == b)` (NaN being the largest
inconsistency, since NaN is not a singleton).
Along that line, I could argue that `PyObject_RichCompareBool` is
actually incorrectly implemented and it should be replaced with a new
`PyObject_Identical` in most places where it is used.

Once you get to the point where you accept the existence of `identical`
as a distinct operation, allowing `identical(NaN, NaN)` to be always
true *can* make sense, and resolves current inconsistencies w.r.t.
containers and NaNs.

- Sebastian

> 
> /arry
> 


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-02-03 Thread Sebastian Berg
On Tue, 2020-02-04 at 13:44 +1100, Steven D'Aprano wrote:
> On Mon, Feb 03, 2020 at 05:26:38PM -0800, Sebastian Berg wrote:
> 


> If you want to argue that "identical or equal" is such a fundamental
> and 
> important operation in Python code that we ought to offer it ready-
> made 
> in the operator module, I'm listening. But my gut feeling here is to
> say 
> "not every one line expression needs to be in the stdlib".
> 

Probably, yes. I am only semi-seriously suggesting it. I am happy to
arrive at the conclusion: NumPy is weird and NaNs are a corner case
that you just have to understand at some point.

Anyway, yes, I hinted at a dunder; I am not sure that is remotely
reasonable. And yes, I thought that if this is an important enough
"concept", it may make sense to bless it with a Python-side function.


> PyObject_RichCompareBool is a different story. "Identical or equal"
> is 
> not so simple to implement correctly in C code, and it is a common


Of course, it would be just as simple in C if `PyObject_RichCompareBool`
simply did not include the identity check, in which case it would be
identical to `bool(a == b)` in Python. (Which of course would be
annoying to have to type out.)

>  
> operation used in lists, tuples, dicts and possibly others, so it
> makes 
> sense for there to be a C API for it.
> 
> 
> > and resolves current inconsistencies w.r.t. containers and NaNs.
> 
> How does it resolve these (alleged) inconsistencies?
> 

The alleged inconsistencies (which may just be me) are along these
lines (plus those with NumPy):

import math
print({math.inf - math.inf for i in range(100)})
print({math.nan for i in range(10)})

Maybe I am alone in perceiving that as an inconsistency. I _was_ saying
that if you had a dunder for this, you could enforce that:

 * `a is b` implies `congruent(a, b)`
 * `a == b` implies `congruent(a, b)`
 * `hash(a) == hash(b)` implies `congruent(a, b)`.

So the "inconsistencies" are that of course `hash(NaN)` and `NaN is
NaN` fail to imply `NaN == NaN`, while `congruent` could be enforced to
do it "right".

Chris said it much better anyway, and is probably right to disregard
the dunder part:

1. Name the current operation (congruent?) so we can reason about it.
2. Bless it with its own function (maybe it helps with documenting it).
3. Consider whether it's worth resolving the above inconsistencies by
   making it an operator with a dunder.

I am happy to stop at 0 :). I am sure similar discussions about the
hash of NaN come up once a year.

- Sebastian


> The current status quo is that containers perform operations such as 
> equality by testing for identity or equality, which they are
> permitted 
> to do and is documented. Changing them to use your "identical or
> equal" 
> API will (as far as I can see) change nothing about the semantics, 
> behaviour or even implementation (since the C-level containers like
> list 
> will surely still call PyObject_RichCompareBool rather than a 
> Python-level wrapper).
> 
> So whatever inconsistencies exist, they will still exist.
> 
> If I have missed something, please tell me.
> 
> 
> 




[Python-Dev] Re: PEP 554 for 3.9 or 3.10?

2020-04-21 Thread Sebastian Berg
On Tue, 2020-04-21 at 16:21 +0200, Victor Stinner wrote:
> On Tue, 21 Apr 2020 at 00:50, Nathaniel Smith wrote:

> 
> 
> > tl;dr: accepting PEP 554 is effectively a C API break, and will
> > force
> > many thousands of people worldwide to spend many hours wrangling
> > with
> > subinterpreter support.
> 
> I fail to follow your logic. When the asyncio PEP was approved, I
> don't recall that suddenly the whole Python community started to
> rewrite all projects to use coroutines everywhere. I tried hard to
> replace eventlet with asyncio in OpenStack and I failed because such
> migration was a very large project with dubious benefits (people
> impacted by eventlet issues were the minority).

Sure, but this is very different. You can still use NumPy in a project
using asyncio. You are _not_ able to use NumPy in a project using
subinterpreters.
Right now, as soon as the first bug report asking for this is opened
and tells me "but see PEP 554, you should support it!", I would be
tempted to put on the NumPy roadmap/vision that no current core dev
will put serious effort into subinterpreters. Someone is bound to be
mad.
Basically, if someone wants it in NumPy, I personally may expect them
to be prepared to invest a year's worth of good dev time [1]. Maybe
that is pessimistic, but your guess is as good as mine. At the normal
dev pace it will be at least a few years of incremental changes before
NumPy might be ready (how long did it take Python?).

The PEP links to NumPy bugs; I am not sure that we ever fixed a single
one. Even if we did, the remaining ones are much larger and deeper. As
of now, the NumPy public API has to be changed to even start supporting
subinterpreters, as far as I am aware [2]. This is because right now we
sometimes need to grab the GIL (to raise errors) in functions that are
not passed GIL state.


All of this is not to say that the PEP itself seems harmful. But the
_expectation_ that subinterpreters should be first-class citizens will
be a real and severe transition burden. And if that does not come to
pass, the current text of the PEP gives me, as someone naive about
subinterpreters, very few reasons why I should put in that effort, or
reasons to make me believe that it actually is not as bad a transition
as it seems.

Right now, I would simply refuse to spend time on it. But as Nathaniel
said, it may be worse if I did not refuse and in the end only a handful
of users got anything out of my work: the time would be much better
spent elsewhere. And you, i.e. CPython, will spend your "please fix
your C extension" chips on subinterpreters. Maybe that is the only
thing on the agenda, but if it is not, it could push other things away.

Reading the PEP, it is fuzzy on the promises (the most concrete one I
remember is that it may be good for security-relevant reasons), which
is fine, because the goal is "experimentation" more than use?

So if it is more about "experimentation", then I have to ask whether:

1. The PEP can state that more obviously: it wants to be
provisionally/experimentally accepted? So maybe it should even say that
extension modules are not (really) encouraged to transition unless they
feel a significant portion of their users will gain.

2. The point about developing it outside of the Python standard library
should be considered more seriously. I do not know if that can be done,
but C API additions/changes/tweaks seem a bit orthogonal to the Python
exposure? So maybe it actually is possible?

As far as I can tell, nobody can or _should_ expect subinterpreters to
actually run most general Python code for many years. Yes, it's a
chicken-and-egg problem: unless users start to use subinterpreters
successfully, C extensions should probably not even worry about
transitioning.
This PEP wants to break the chicken-and-egg problem to have a start,
but as of now, as far as I can tell, it *must not* promise that it will
ever work out.

So, I cannot judge the sentiment on subinterpreters. But it may be good
to make it *painfully* clear what you expect from a project like NumPy
in the next few years. Alternatively, make it painfully clear that you
possibly even discourage us from spending time on it now, if it's not
straightforward. Those using this module are on their own for many
years, probably even after success is proven.

Best,

Sebastian


[1] As of now, the way I see it is that I could not even make NumPy
(and probably many C extensions) work, because I doubt that the limited
API has been exercised enough [2] and I am pretty sure it has holes.
Also, the PEP about passing module state around to store globals
efficiently seems necessary, and is not in yet? (Again, trust: I have
to trust you that e.g. what you do to make argument parsing not have
overhead in Argument Clinic will be something that I can use for
similar purposes within NumPy.)

[2] I hope that we will do (many of) these changes for other reasons
within a year or so, but they go deep into code barely touched in a
decade. Realistically, even after the straigh

[Python-Dev] Re: PEP 554 for 3.9 or 3.10?

2020-04-21 Thread Sebastian Berg
On Tue, 2020-04-21 at 11:21 -0500, Sebastian Berg wrote:
> On Tue, 2020-04-21 at 16:21 +0200, Victor Stinner wrote:
> > On Tue, 21 Apr 2020 at 00:50, Nathaniel Smith wrote:
> 

> As far as I can tell, nobody can or _should_ expect subinterpreters
> to
> actually run most general python code for many years. Yes, its a
> chicken-and-egg problem, unless users start to use subinterpreters
> successfully, C-extensions should probably not even worry to
> transition.
> This PEP wants to break the chicken-and-egg problem to have a start,
> but as of now, as far as I can tell, it *must not* promise that it
> will
> ever work out.
> 
> So, I cannot judge the sentiment or subinterpreters. But it may be
> good
> to make it *painfully* clear what you expect from a project like
> NumPy

Maybe one of the frustrating points about this criticism is that it
does not belong in this PEP. And that is actually true! I
wholeheartedly agree that it doesn't really belong in this PEP itself.

*But* the existence of a document detailing the "state and vision for
subinterpreters" that includes these points is probably a prerequisite
for this PEP. And this document must be linked prominently from the
PEP.

So the suggestion should maybe not be to discuss it in the PEP, but to
write it either in the documentation on subinterpreters or as an
informational PEP. Maybe such a document already exists, but then it is
probably not linked prominently enough.

- Sebastian


> in the next few years. Alternatively, make it painfully clear that
> you possibly even discourage us from spending time on it now, if it's
> not straightforward. Those using this module are on their own for many
> years, probably even after success is proven.
> 
> Best,
> 
> Sebastian
> 
> 
> [1] As of now, the way I see it is that I could not even make NumPy
> (and probably many C extensions) work, because I doubt that the
> limited
> API has been exercised enough [2] and I am pretty sure it has holes.
> Also the PEP about passing module state around to store globals
> efficiently seems necessary, and is not in yet? (Again, trust: I have
> to trust you that e.g. what you do to make argument parsing not have
> overhead in argument clinic will be something that I can use for
> similar purposes within NumPy)
> 
> [2]  I hope that we will do (many of) these changes for other reasons
> within a year or so, but they go deep into code barely touched in a
> decade. Realistically, even after the straight forward changes (such
> as
> using the new PEPs for module initialization), these may take up an
> additional few months of dev time (sure, get someone very good or
> does
> nothing else, they can do it much quicker maybe).
> So yes, from the perspective of a complex C-extension, this is
> probably
> very comparable to the 2to3 change (it happened largely before my
> time
> though).
> 
> [3] E.g. I think I want an ExtensionMetaClass, a bit similar to an
> ABC, but I would prefer to store the data in a true C-slot fashion.
> The limited API cannot do MetaClasses correctly as far as I could
> tell, and IIRC it is likely even a bit buggy.
> Are ExtensionMetaClasses crazy? Maybe, but PySide does it too (and as
> far as I can tell, they basically get away with it by a bit of hacking
> and relying on Python implementation details).
> 
> 
> 
> > When asyncio landed in Python 3.4, a few people started to
> > experiment with it. Some had a bad experience. Some others were
> > excited and put a few applications in production.
> > 
> > Even today, asyncio hasn't replaced threads, multiprocessing,
> > concurrent.futures, etc. There are even competitor projects like
> > Twisted, trio and curio! (Also eventlet and gevent, based on
> > greenlet, which is a different approach.) Only very recently did I
> > start to see projects like httpx, which support both blocking and
> > asynchronous APIs.
> > 
> > I see a slow adoption of asyncio because asyncio solves very
> > specific
> > use cases. And that's fine!
> > 
> > I don't expect that everyone will suddenly spend months of work to
> > rewrite their C code and Python code to be more efficient or fix
> > issues with subinterpreters, until a critical mass of users has
> > proven that subinterpreters are amazing and way more efficient!
> > 
> > Victor
> > -- 
> > Night gathers, and now my watch begins. It shall not end until my
> > death.

[Python-Dev] Re: PEP 554 for 3.9 or 3.10?

2020-04-29 Thread Sebastian Berg
On Tue, 2020-04-28 at 19:20 -0600, Eric Snow wrote:
> On Tue, Apr 21, 2020 at 11:17 AM Sebastian Berg
>  wrote:
> > Maybe one of the frustrating points about this criticism is that it
> > does not belong in this PEP. And that is actually true! I
> > wholeheartedly agree that it doesn't really belong in this PEP
> > itself.
> > 
> > *But* the existence of a document detailing the "state and vision
> > for
> > subinterpreters" that includes these points is probably a
> > prerequisite
> > for this PEP. And this document must be linked prominently from the
> > PEP.
> > 
> > So the suggestion should maybe not be to discuss it in the PEP,
> > but to write it up either in the documentation on subinterpreters
> > or as an informational PEP. Maybe such a document already exists,
> > but then it is probably not linked prominently enough.
> 
> That is an excellent point.  It would definitely help to have more
> clarity about the feature (subinterpreters).  I'll look into what
> makes the most sense.  I'm sure Victor has already effectively
> written something like this. :)
> 

I will note one more time that I want to back up almost all that
Nathaniel said (I simply cannot judge the technical side, though).

While I still think it is probably not part of PEP 554 as such, I
guess it needs a full-blown PEP of its own, saying that Python should
implement subinterpreters. (I am saying "implement" because I believe
you must consider subinterpreters basically a non-feature at this
time: they have neither users nor reasonable ecosystem support.)

In many ways I assume that a lot of the ground work for subinterpreters
was useful on its own. But please do not underestimate how much effort
it will take to make subinterpreters a first-class citizen in the
language!

Take PyPy, for example: it took years to get PyPy support into NumPy
(and the PyPy people did pretty much _all_ the work).
And PyPy provides a compatibility layer that makes that support orders
of magnitude simpler than supporting subinterpreters will be.

And yet, I am sure there are many, many C-extensions out there that
will fail on PyPy.  So unless the potential subinterpreter userbase is
orders of magnitude larger than PyPy's, the situation will be much
worse.  With the added frustration that PyPy users probably expect
incompatibilities, while Python users may get angry if they think
subinterpreters are a language feature.


There have been points made about e.g. just erroring out on import for
modules which do not choose to support subinterpreters. And maybe the
sum of saying:

* We warn that most C-extensions won't work
   -> If you error out, at least it won't crash silently (some
      mitigation; see the sketch below)

* Nobody must expect any C-extension to work until subinterpreters
  have proven useful *and* a large userbase!

* What time frames are we talking about here?
  - Maybe 3 years until proven useful and a potentially large userbase?
  - Some uncertain amount longer until the userbase actually grows
  - Maybe 5 years after that until fairly widespread support in some
    central libraries? (I am sure Python 2 to 3 took that long)

* Prototyping in the first few years (such as an external package, or
  even a fork!) is not really a good option, because... ?
  Or, alternatively, the warnings will be so intrusive that
  prototyping within cpython is acceptable. Maybe you have to use:

 python 
--subinterpreters-i-know-this-is-only-a-potential-feature-which-may-be-removed-in-future-python-versions
 myscript.py

  This is not the same situation as with most "experimental" APIs,
  which are almost settled but where you want to be careful. This one
  has a real chance of getting ripped out entirely!?

is good enough, maybe it is not. Once it is written down, I am
confident the Python devs and steering council will make the right
call.
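
(To make the "error out on import" option concrete: a minimal,
made-up sketch of what a C-extension that knows it relies on static
globals could do in its init function. PyInterpreterState_Get() is
only public API in CPython 3.9+, so treat this as an assumption, not
settled practice.)

    #include <Python.h>

    static struct PyModuleDef spammodule = {
        PyModuleDef_HEAD_INIT,
        .m_name = "spam",
        .m_doc = "Made-up module that keeps its state in C statics.",
        .m_size = -1,   /* -1: single-init module, not per-interpreter */
    };

    PyMODINIT_FUNC
    PyInit_spam(void)
    {
        /* Refuse to load anywhere but the main interpreter, so a
           subinterpreter import fails loudly instead of corrupting
           shared static state later. */
        if (PyInterpreterState_Get() != PyInterpreterState_Main()) {
            PyErr_SetString(PyExc_ImportError,
                            "spam does not support subinterpreters");
            return NULL;
        }
        return PyModule_Create(&spammodule);
    }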

Again, I do not want to hinder the effort. It takes courage and a good
champion to go down such a long and winding road.
But Nathaniel is right that putting in the effort puts you into the
trap of thinking that, now that we are 90% there from a Python
perspective, we should go to 100%.
100% is nice, but you may have to reach 1000++% (i.e. C-extension
module/ecosystem support) to actually have a fully functional feature.

You should get the chance to prove subinterpreters useful; there seems
to be enough positivity around the idea.  But the question is within
what framework that can reasonably happen.

Simply pushing PEP 554 in with a small warning in the documentation is
not the right framework.
I hope you can find the right framework to push this on. But
unfortunately that is the difficult job of the feature's champion. And
Nathaniel etc. are actually trying to help you with it (but in the end
they are not champions for it, so you cannot expect too much). E.g. by
asking if it cannot be developed outside of cpython and po

[Python-Dev] Re: Detect memory leaks in unit tests

2020-05-13 Thread Sebastian Berg
On Wed, 2020-05-13 at 13:14 +0100, Pablo Galindo Salgado wrote:
> > But again this is for PyObjects only.
> 
> Not really, we also check memory blocks:
> 
> https://github.com/python/cpython/blob/master/Lib/test/libregrtest/refleak.py#L72
> 
> as long as you don't directly call malloc and instead use one of the
> Python-specific APIs like PyMem_Malloc, the refleak code should
> catch that.

Maybe worth briefly mentioning in this discussion that pytest-leaks
exists:

https://github.com/abalkin/pytest-leaks

which serves much the same purpose as refleak.py for pytest users. (I
run it semi-regularly on NumPy and it helped me get the tests de facto
free of leaks.) I have used valgrind for a similar purpose, but
refleak.py/pytest-leaks are better at finding leaks.
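
(For illustration, the kind of bug all of these tools hunt, in a
completely made-up function: a PyMem_Malloc whose early-exit path
forgets the matching PyMem_Free. A pure refcount check never sees it,
but the allocated-blocks check and valgrind do.)

    #include <Python.h>

    /* Made-up example: sums a sequence as C doubles, but leaks the
       scratch buffer on one error path. */
    static PyObject *
    sum_as_doubles(PyObject *Py_UNUSED(module), PyObject *seq)
    {
        Py_ssize_t n = PySequence_Length(seq);
        if (n < 0) {
            return NULL;
        }
        double *buf = PyMem_Malloc((size_t)n * sizeof(double));
        if (buf == NULL) {
            return PyErr_NoMemory();
        }
        double total = 0.0;
        for (Py_ssize_t i = 0; i < n; i++) {
            PyObject *item = PySequence_GetItem(seq, i);
            if (item == NULL) {
                return NULL;  /* BUG: forgets PyMem_Free(buf); every
                                 failing call grows the block count. */
            }
            buf[i] = PyFloat_AsDouble(item);
            Py_DECREF(item);
            if (buf[i] == -1.0 && PyErr_Occurred()) {
                PyMem_Free(buf);  /* this error path cleans up */
                return NULL;
            }
            total += buf[i];
        }
        PyMem_Free(buf);
        return PyFloat_FromDouble(total);
    }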

Cheers,

Sebastian


> 
> On Wed, 13 May 2020 at 12:29, Giampaolo Rodola' 
> wrote:
> 
> > On Wed, May 13, 2020 at 9:17 AM Ammar Askar 
> > wrote:
> > 
> > >  > Py_DECREF calls in the C code
> > > 
> > > I think this part specifically is already covered by the refleak
> > > checks:
> > > https://github.com/python/cpython/blob/master/Lib/test/libregrtest/refleak.py
> > > 
> > > Since they involve repeating tests many times, these checks
> > > aren't run on the CI; they do get run on the refleak buildbots,
> > > though, so issues get caught eventually:
> > > https://buildbot.python.org/all/#/builders?tags=%2Brefleak
> > > 
> > > But again, this is for PyObjects only. Were you able to find any
> > > memory leaks with your proof-of-concept? I don't think there is
> > > much chance of someone missing a PyMem_Free call, and there is
> > > not a lot of other manual memory management, but I could be
> > > wrong. Anything found there could help motivate adding this a
> > > bit more.
> > 
> > Yeah, I agree it depends on how many PyMem_* occurrences there
> > are, and it probably makes more sense to cover only those. Under
> > Modules/* I found:
> > 
> > - 24 occurrences for PyMem_RawMalloc
> > - 2 for PyMem_RawCalloc
> > - 106 for PyMem_Malloc
> > - 12 for PyMem_Calloc
> > - 39 for PyMem_New
> > - 5 for " = malloc("
> > 
> > I spent an hour covering around 20 of those and didn't find any
> > leak. It's boring work. I will try to work on it over the next few
> > weeks.

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TOPMBKEH7VVKMDZM7QE6XQ6GUU64WKCW/
Code of Conduct: http://python.org/psf/codeofconduct/