[Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

2020-05-12 Thread Paul Moore
On Tue, 12 May 2020 at 07:53, Brandt Bucher  wrote:

> > > However, zip_longest is really another beast entirely
> > No, it isn't.
>
> It has a completely independent implementation, a different interface, lives 
> in a separate namespace, and doesn't even reference zip in its documentation. 
> So it seems to me that it is indeed another beast entirely.
>
> > > so it makes sense that it would live in itertools while zip grows 
> > > in-place.
> > No, it doesn't
>
> See above for why I think it does.

... so it's another beast because (among other reasons) it lives in a
separate namespace, and it should live in a separate namespace because
it's another beast? That's circular logic.

If we were to put zip_strict into itertools, you could use *precisely*
this logic to argue that it was the right thing to do.


> > > The goal here is not just to provide a way to catch bugs, but to also 
> > > make it easy (even tempting) for a user to enable the check whenever 
> > > using zip at a call site with this property.
> > Importing necessary functions is not an anti-pattern.
>
> Um, agreed?

So importing zip_strict from itertools is an entirely reasonable way
for users to enable the check, then.
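
A minimal sketch of what the import-based spelling could look like. `zip_strict` here is a hypothetical pure-Python helper defined locally for illustration (itertools has no such function), and the error message is an assumption:

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel; cannot collide with real data


def zip_strict(*iterables):
    """Hypothetical strict zip: raise ValueError if lengths differ."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        # A sentinel in the tuple means at least one input ran out early.
        if any(item is _MISSING for item in combo):
            raise ValueError("zip_strict() arguments have different lengths")
        yield combo


pairs = list(zip_strict("ab", [1, 2]))  # equal lengths: behaves like zip
```

A call site would opt in by swapping `zip(...)` for `zip_strict(...)`; mismatched inputs then raise instead of being silently truncated.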

> > > Another proposed idiom, per-module shadowing of the built-in zip with 
> > > some subtly different variant from itertools, is an anti-pattern that 
> > > shouldn't be encouraged.
> > Source?
>
> Point taken. I probably went a bit far labeling this a straight-up 
> "anti-pattern", but it is certainly annoying to find that someone has added 
> `from pprint import pprint as print` at the top of a module, for example 
> (which has actually happened to me before).  Very hard to figure out what's 
> happening.

Also irrelevant. It's very easy to suggest bad ways of using a
feature. That doesn't make the feature bad. You seem to be arguing
that zip_strict is bad because people can misuse it. We could probably
remove 99% of the Python language by that argument...

Paul
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QYQHK6BWILSORA2XSGV7AUEZJTBUOSIL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

2020-05-12 Thread Chris Angelico
On Tue, May 12, 2020 at 5:20 PM Paul Moore  wrote:
>
> On Tue, 12 May 2020 at 07:53, Brandt Bucher  wrote:
> > > > Another proposed idiom, per-module shadowing of the built-in zip with 
> > > > some subtly different variant from itertools, is an anti-pattern that 
> > > > shouldn't be encouraged.
> > > Source?
> >
> > Point taken. I probably went a bit far labeling this a straight-up 
> > "anti-pattern", but it is certainly annoying to find that someone has added 
> > `from pprint import pprint as print` at the top of a module, for example 
> > (which has actually happened to me before).  Very hard to figure out what's 
> > happening.
>
> Also irrelevant. It's very easy to suggest bad ways of using a
> feature. That doesn't make the feature bad. You seem to be arguing
> that zip_strict is bad because people can misuse it. We could probably
> remove 99% of the Python language by that argument...
>

And considering that "from __future__ import print_function" is an
officially-sanctioned way to cause a semantic change to print, I don't
think it's really that strong an argument. Python is *deliberately*
designed so that you can shadow things. I am most in favour of the
separate-functions option *because* it makes shadowing easy. Not an
anti-pattern at all.

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EWNW7SQGN55NIME6LD3NVVJUWIKXZO4I/


[Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

2020-05-12 Thread Antoine Pitrou
On Tue, 12 May 2020 06:48:32 -
"Brandt Bucher"  wrote:
> 
> > > A good rule of thumb is that "mode-switches" which change return types or 
> > > significantly alter functionality are indeed an anti-pattern,  
> > Source?  
> 
> This was based on a chat with someone who has chosen not to become involved 
> in the larger discussion, and it was lifted almost verbatim from my notes 
> into the draft.

It's also something on which there is more or less a consensus among core
developers, IMHO.

Regards

Antoine.

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ABJPSG65M5BPEZYMZ62NCA2KPDLOR6G7/


[Python-Dev] inspect.getdoc and (Not) returning type/superclass docstrings in 3.9

2020-05-12 Thread Matthias Bussonnier
Hi All, 

# Too long didn't read:

In 3.9 the behavior of inspect.getdoc(instance) was changed: it no longer 
returns the documentation of type(instance) or its superclass(es). I think 
this is a problematic change for some projects, and for interactive use when 
getting info on objects that are rarely constructed directly by users, for 
example a pandas dataframe obtained via `pandas.read_csv(filepath)`.

I'd like to ask for reconsideration: such a change of behavior is better 
suited to a new function, potentially deprecating the old one.

# Longer version

In https://bugs.python.org/issue40257 attempts are made to improve the output 
of `pydoc`; in particular, it is difficult to have fine-grained logic depending 
on where the documentation comes from (instance, class, superclass, etc.), 
which can sometimes lead to nonsensical help.

The following are given as examples:

> inspect.getdoc(1) returns the same as inspect.getdoc(int)

or 

>>> import wave
>>> help(wave.Error)
Help on class Error in module wave:

class Error(builtins.Exception)
 |  Common base class for all non-exit exceptions.
 |  
 |  Method resolution order:
...


In 3.9 the behavior of `inspect.getdoc()` has been changed to be far more 
restrictive in what it returns, often returning None where it used to return 
docstrings. 

I agree with the end goal of having a more controllable way of finding where 
the documentation/docstrings come from and avoiding incorrect docs in pydoc 
and help, though I find that changing the behavior of `getdoc()` might not be 
the right approach.

I'm quite worried that many projects rely on the current behavior of 
`getdoc()`; at least IPython/Jupyter does, to provide users with 
help/superhelp accessible via obj? and obj??.

I would also argue that inaccurate help is often better than no help.
With the current state of Python 3.9, a few things like asking for help on a 
pandas dataframe instance will lose information:
>>> import pandas as pd
>>> from inspect import getdoc
>>> df = pd.read_csv('mydata.csv')
>>> print(getdoc(df))
None

I'm taking the example of pandas as this is typically the kind of object you 
don't construct directly, but get via, for example, `read_csv()`, or that 
another API/package returns to you. 
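
For code that depends on the inherited-docstring behavior, one possible workaround is a small wrapper. This is only a sketch: `getdoc_with_fallback` is a hypothetical helper name, and whether the fallback branch is ever reached depends on how `inspect.getdoc()` behaves in the running Python version:

```python
import inspect


def getdoc_with_fallback(obj):
    """Return inspect.getdoc(obj), else the first docstring found on the MRO."""
    doc = inspect.getdoc(obj)
    if doc is not None:
        return doc
    # Fall back to the class (or the object itself, if it is a class)
    # and walk its method resolution order.
    cls = obj if inspect.isclass(obj) else type(obj)
    for base in cls.__mro__:
        base_doc = base.__dict__.get("__doc__")
        if isinstance(base_doc, str):
            return inspect.cleandoc(base_doc)
    return None


class Base:
    """Base docs."""


class Child(Base):
    pass


child_doc = getdoc_with_fallback(Child())
```

The wrapper only adds behavior when `inspect.getdoc()` returns None, so it should be harmless on versions that already return the inherited docstring.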

I haven't been able to confirm yet exactly how this affects Sphinx rendering 
of docs, how other IDEs provide help (Spyder, PyCharm...), or other projects 
that use `getdoc()`. 

I've found mentions of `getdoc()` in numpy, scipy, jedi, matplotlib ... as 
well (Sphinx extensions and various dynamic docs), and I am working on 
building them on 3.9 to check the effect.

In general, though, my feeling is that the output of `getdoc()` rarely seems 
to be tested, as it is directly user-facing. I was lucky to catch this in 
IPython/Jupyter, as the failing test was unrelated and indirectly relied on 
the exact output of a subprocess. 

From the IPython/Jupyter perspective I would prefer to keep the current 
behavior of `inspect.getdoc()`, potentially deprecating it if you wish to, and 
provide an alternative that has a behavior of your choosing. Dealing with 
functions whose behavior changes slightly across Python versions is not the 
best experience, and this would give the ecosystem some chance to adapt. 
Updated projects are rarely released in synchrony with new Python versions.

Your thoughts on this issue are welcome, thanks for all your work on core 
Python, and I'll support any decision that gets made. 
-- 
Matthias
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6QO2XI5B7RVZDW3YZV24LYD75VGRITFU/


[Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

2020-05-12 Thread Ethan Furman

On 05/11/2020 11:48 PM, Brandt Bucher wrote:
> On 05/10/2020 14:39 PM, Ethan Furman wrote:
> > On 05/10/2020 09:04 AM, Brandt Bucher wrote:
> > > However, zip_longest is really another beast entirely
> >
> > No, it isn't.
>
> It has a completely independent implementation, a different interface, lives
> in a separate namespace,

- both take an unknown number of iterables
- both return tuples
- both names start with `zip`
- both stop at exhaustion
  - one as soon as possible
  - the other as late as possible
- one has one extra parameter

Those seem like very similar beasts to me.

> and doesn't even reference zip in its documentation.

So update the docs.
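
The similarities and the one behavioral difference listed above can be seen side by side in a short example, using only the built-in `zip` and `itertools.zip_longest`:

```python
from itertools import zip_longest

letters = "abc"
numbers = [1, 2]

# zip stops as soon as the shortest iterable is exhausted.
shortest = list(zip(letters, numbers))
# [('a', 1), ('b', 2)]

# zip_longest stops only when the longest iterable is exhausted, padding
# with its one extra parameter, fillvalue (None by default).
longest = list(zip_longest(letters, numbers, fillvalue=None))
# [('a', 1), ('b', 2), ('c', None)]
```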

--
~Ethan~
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LJ3MDIPO3AZCTXIFNQ3AM4F2O72GUTZ3/


[Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

2020-05-12 Thread Ethan Furman

On 05/11/2020 11:48 PM, Brandt Bucher wrote:
> On 05/10/2020 14:39, Ethan Furman wrote:
> > On 05/10/2020 09:04 PM, Brandt Bucher wrote:
> > > the author has counted dozens of other call sites in Python's standard
> > > library and tooling where it would be appropriate to enable this new
> > > feature immediately.
> >
> > References, please.
>
> Here are two dozen:
>
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3394
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3418
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3435

These are all after a function that ensures the iterables are the same length 
-- hardly seems a good idea to slow them down with an extra check for each 
digit.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L94-L95

This one already has a check.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1184
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1275

Reasonable.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1363

Already has a check.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1391

Reasonable.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/copy.py#L217
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/csv.py#L142

Mismatch cannot happen.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/dis.py#L462

Unsure if mismatch can happen.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/filecmp.py#L142
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/filecmp.py#L143

Mismatch cannot happen.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/inspect.py#L1440
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/inspect.py#L2095

Reasonable.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/os.py#L510

Wow -- I don't even know how to parse that!

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/plistlib.py#L577

Maybe.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1317
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1323

Definitely.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1339

Mismatch cannot happen.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3015
> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3071

Reasonable.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3901

Mismatch cannot happen.

> - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/setup.py#L455

Mismatch cannot happen.
So half of your examples are actually counter-examples.  Did you vet them, or 
just pick matches against `zip(` ?

Also, if a flag is used, won't that slow down every call to zip even when the 
flag is False?  I know in many cases it probably won't matter, but I can see 
where it could in _pydecimal.
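
To illustrate the cost concern: when the inputs are sized sequences, lengths can be compared once before the loop rather than on every item. A minimal sketch, where `zip_presized` is a hypothetical helper and not something taken from the PEP, _pydecimal, or the standard library:

```python
def zip_presized(*seqs):
    """Check lengths once up front, then hand off to the plain built-in zip."""
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        raise ValueError(f"length mismatch: {sorted(lengths)}")
    # No per-item overhead from here on: this is the ordinary zip iterator.
    return zip(*seqs)


digits = list(zip_presized([1, 2, 3], [4, 5, 6]))
```

This only works for objects that support `len()`; arbitrary iterators would still need a check at exhaustion time.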

--
~Ethan~
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LZUNHS7RBAAJKYADI7P6WTRR25CQFD2Q/


[Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

2020-05-12 Thread Chris Angelico
On Wed, May 13, 2020 at 4:38 AM Ethan Furman  wrote:
>
> > - 
> > https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/os.py#L510
>
> Wow -- I don't even know how to parse that!
>

Wow, that's quite an example. Of something, I'm not sure what, but
definitely an example. Based on two booleans, entries is either None
or a list. If it's None, this loops over just the directory names; if
it's a list, then it's been populated in perfect parallel to dirs (see
the preceding loop), thus guaranteeing that the two lists are
perfectly parallel. But in that case, "name" actually gets a tuple of
(name,entry), and then inside the loop, it does a three-way branch
that is guaranteed (and asserted) to split out the name and entry ONLY
when there actually will be one.

Definitely an odd piece of code. But it can never zip over things of
different lengths.
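
The shape described above can be sketched in isolation. This is a simplified illustration and not the actual os.py code; `iter_dirs` and its arguments are hypothetical:

```python
def iter_dirs(dirs, entries=None):
    """Yield (name, entry) pairs; entry is None when no entries were kept."""
    # `entries` is either None or a list built in lockstep with `dirs`,
    # so the zip below can never see inputs of different lengths.
    for item in (dirs if entries is None else zip(dirs, entries)):
        if entries is None:
            name, entry = item, None   # item is just the directory name
        else:
            name, entry = item         # item is a (name, entry) tuple
        yield name, entry


names_only = list(iter_dirs(["a", "b"]))
with_entries = list(iter_dirs(["a", "b"], ["entry_a", "entry_b"]))
```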

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/P6K3JGADX5A34CY2VOKVCJDYTDX3WQEY/


[Python-Dev] Detect memory leaks in unit tests

2020-05-12 Thread Giampaolo Rodola'
Hello there,
I would like to discuss a proposal regarding one aspect which AFAIK is
currently missing from CPython's test suite: the ability to detect memory
leaks in functions implemented in C extension modules.
In psutil I use a test class/framework which calls a function many times,
and fails if the process memory increased after doing so. I do this in
order to quickly detect missing free() or Py_DECREF calls in the C code,
but I suppose there may be other use cases. Here's the class:
https://github.com/giampaolo/psutil/blob/913d4b1d6dcce88dea6ef9382b93883a04a66cd7/psutil/tests/__init__.py#L901

Detecting a memory leak is no easy task, because the process
memory fluctuates. Sometimes it may increase (or even decrease!) even if
there's no leak, I suppose because of how the OS handles memory,
Python's garbage collector, the fact that RSS is an approximation, and who
knows what else. In order to compensate for fluctuations I did the following:
in case of failure (mem > 0 after calling fun() N times) I retry the test
up to 5 times, increasing N (repetitions) each time, and I consider the
test a failure only if the memory keeps increasing across all runs. So for
instance, here's a legitimate failure:


psutil.tests.test_memory_leaks.TestModuleFunctionsLeaks.test_disk_partitions
...
Run #1: extra-mem=696.0K, per-call=3.5K, calls=200
Run #2: extra-mem=1.4M, per-call=3.5K, calls=400
Run #3: extra-mem=2.1M, per-call=3.5K, calls=600
Run #4: extra-mem=2.7M, per-call=3.5K, calls=800
Run #5: extra-mem=3.4M, per-call=3.5K, calls=1000
FAIL

If, on the other hand, the memory increased on one run (say 200 calls) but
decreased on the next run (say 400 calls), then it is clearly a
false positive: memory consumption may be > 0 on the second run,
but if it's lower than on the previous run with fewer repetitions, then it
cannot possibly represent a leak (just a fluctuation):


psutil.tests.test_memory_leaks.TestModuleFunctionsLeaks.test_net_connections
...
Run #1: extra-mem=568.0K, per-call=2.8K, calls=200
Run #2: extra-mem=24.0K, per-call=61.4B, calls=400
OK
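
The retry logic described above can be sketched roughly as follows. This is not the psutil test class itself: `keeps_growing` and `get_mem` are hypothetical names, and the memory reading is passed in as a callable so that the sketch does not depend on psutil:

```python
def keeps_growing(fun, get_mem, times=200, retries=5):
    """Return True only if memory grows on every run as calls increase."""
    prev_diff = 0
    for run in range(1, retries + 1):
        calls = times * run            # 200, 400, 600, ... repetitions
        before = get_mem()
        for _ in range(calls):
            fun()
        diff = get_mem() - before
        if diff <= prev_diff:
            # No growth, or less growth than a run with fewer calls:
            # treat it as a fluctuation rather than a leak.
            return False
        prev_diff = diff
    return True


# Fake "process memory" so the example runs anywhere.
state = {"mem": 0, "calls": 0}


def leaky():
    state["mem"] += 10                 # grows on every call


def noisy():
    state["calls"] += 1
    if state["calls"] <= 100:          # grows briefly, then stays flat
        state["mem"] += 10


leaky_result = keeps_growing(leaky, lambda: state["mem"])
noisy_result = keeps_growing(noisy, lambda: state["mem"])
```

In a real test, `get_mem` would read the process memory (for example RSS), which is where the fluctuations mentioned above come from.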

This is the best I could come up with as a simple leak detection mechanism
to integrate with CI services, and keep more advanced tools like Valgrind
out of the picture (I just wanted to know if there's a leak, not to debug
the leak itself). In addition, since psutil is able to get the number of
fds (UNIX) and handles (Windows) opened by a process, I also run a separate
set of tests to make sure I didn't forget to call close(2) or CloseHandle()
in C.

Would something like this make sense to have in CPython? Here's a quick PoC
I put together just to show what this would look like in practice:
https://github.com/giampaolo/cpython/pull/2/files
Proper API coverage would be quite a lot of work (testing
all C modules), and ideally should also include cases where functions raise
an exception when fed improper input. The biggest blocker
here is, of course, psutil, since it's a third-party dependency, but before
getting to that I wanted to see how this idea is perceived in general.

Cheers,

-- 
Giampaolo - http://grodola.blogspot.com
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NFHW4TP3ALY3CVRBVKRI4SRG7BOLZLJH/