[issue37905] Remove NormalDist.overlap() or improve documentation?
New submission from Christoph Deil:

I saw that Python 3.8 will add a NormalDist class:
https://docs.python.org/3.8/library/statistics.html#normaldist-objects

Personally I don't see the value of adding this to the Python standard lib. The natural progression would be to extend and extend, but in the end only duplicate what already exists in scientific Python packages. But Ok, I guess this is not up for debate any more?

I'd like to make a specific comment on NormalDist.overlap. The rest of NormalDist is very standard, but that method is an oddball. My suggestion is to remove it or to improve the documentation.

Current docstring:
https://github.com/python/cpython/blob/44f2c096804e8e3adc09400a59ef9c9ae843f339/Lib/statistics.py#L959-L991

And this docs example:
https://github.com/python/cpython/commit/318d537daabf2bd5f781255c7e25bfce260cf227#diff-d436928bc44b5d7c40a8047840f55d35R620-R629

> What percentage of men and women will have the same height in `two normally distributed populations with known means and standard deviations <http://www.usablestats.com/lessons/normal>`_? 50.3%

This statement doesn't make sense to me. No two people have the exact same height, I think the answer to this question should be 0%.

Using

    n = 100_000; sum(m > w for m, w in zip(men.samples(n), women.samples(n))) / n

I see that for 82% of random (men, women) matches the man will be larger. That's another measure, but still, stating that 50% of men and women have the same height is confusing.

Note that there is a multitude of PDF overlap measures different from this min(pdf1, pdf2) that I think are much more common in statistics and the physical sciences:

- https://en.wikipedia.org/wiki/Hellinger_distance
- https://arxiv.org/pdf/1407.7172.pdf

And note that the references that are given currently are weird (basic statistics textbooks would be appropriate references IMO, or open references like Wikipedia):

- slides: http://www.iceaaonline.com/ready/wp-content/uploads/2014/06/MM-9-Presentation-Meet-the-Overlapping-Coefficient-A-Measure-for-Elevator-Speeches.pdf
- implementation code comment points to http://dx.doi.org/10.1080/03610928908830127 which is behind a paywall

Why add this one overlap measure and expose it under the "overlap" method name?

My suggestion would be to be conservative and to remove that method again, before releasing it in 3.8. A reference in the docs could be added to other existing third-party codes (e.g. scipy or the uncertainties package) with further functionality, such as being able to handle correlations or multi-dimensional distributions. For this change I'd be happy to send a PR any time.

Raymond and others interested in this topic - thoughts?

(note: I wrote a MultiNorm class prototype last year at https://github.com/cdeil/multinorm/blob/master/multinorm.py and now wanted to rewrite it and try to find a good API and thus was interested in this NormalDist class and what functionality it offers)

--
components: Library (Lib)
messages: 350076
nosy: Christoph.Deil, rhettinger
priority: normal
severity: normal
status: open
title: Remove NormalDist.overlap() or improve documentation?
type: enhancement
versions: Python 3.8

___
Python tracker <https://bugs.python.org/issue37905>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
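To make the distinction in the message above concrete, here is a minimal sketch comparing the two measures it discusses: the overlapping coefficient (the area under min(pdf1, pdf2)) and the probability that a randomly paired man is taller than a randomly paired woman. The height parameters are assumed for illustration and are not quoted from the message; `overlap_numeric` is a hypothetical helper written for this sketch, and the two populations are assumed to be independent.

```python
from statistics import NormalDist

# Assumed parameters, chosen only for illustration (heights in inches).
men = NormalDist(70, 4)
women = NormalDist(65, 3.5)

# 1. Overlapping coefficient as returned by NormalDist.overlap().
ovl = men.overlap(women)


def overlap_numeric(a, b, lo=30.0, hi=110.0, steps=20_000):
    """Hypothetical helper: midpoint-rule integral of min(pdf_a, pdf_b)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += min(a.pdf(x), b.pdf(x))
    return total * width


# 2. The same area computed by brute force, as a cross-check.
ovl_check = overlap_numeric(men, women)

# 3. A different measure: P(man taller than woman) for an independent pair.
#    Subtracting two NormalDist objects gives the distribution of the
#    difference, assuming independence.
p_taller_exact = 1.0 - (men - women).cdf(0)

# 4. The Monte Carlo estimate of that probability, with fixed seeds.
n = 100_000
pairs = zip(men.samples(n, seed=1), women.samples(n, seed=2))
p_taller_mc = sum(m > w for m, w in pairs) / n

print(f"overlap():        {ovl:.3f}")
print(f"numeric overlap:  {ovl_check:.3f}")
print(f"P(m > w), exact:  {p_taller_exact:.3f}")
print(f"P(m > w), MC:     {p_taller_mc:.3f}")
```

The two quantities answer different questions: the first is a shared area under two density curves, the second is a probability about random pairs, so there is no reason for them to agree.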
[issue37905] Remove NormalDist.overlap() or improve documentation?
Christoph Deil added the comment:

The Monte Carlo example here has completely unstable results:
https://github.com/python/cpython/commit/cc353a0cd95d9b0c93ed0b60ba762427a94c790d#diff-d436928bc44b5d7c40a8047840f55d35R633

If you run it multiple times, you will see that `mean` is relatively stable, but `stddev` varies from 10 to 50 to 100. The reason is that in the model there's a division by z, and the z distribution used has values arbitrarily close to zero:

    >>> NormalDist(5, 1.25).cdf(0) * 100_000
    3.16

I suggest changing to an MC sampling example that isn't as pathological and doesn't involve division by zero, e.g. by changing the mean of z to 50, reducing the stddev to 0.125, or some similar change in parameters.

Usually in stats or machine learning books and docs (e.g. for statsmodels or scikit-learn), the seed is set to a fixed value for methods that involve random numbers, so that results and docs are reproducible. I suggest making that change here as well.
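The instability described above can be sketched as follows. This is not the docs example itself: `ratio_model` and `simulate` are hypothetical names, and the model is reduced to a bare `x / z` so that only the division matters. The z parameters (5, 1.25) and the suggested replacement mean of 50 come from the message; the x distribution and the seeds are assumptions.

```python
from statistics import NormalDist, fmean, stdev


def ratio_model(x, z):
    """Hypothetical stand-in model: only the division by z matters here."""
    return x / z


def simulate(z_dist, n=100_000, seed=0):
    """Return (mean, stddev) of the model output for seeded samples."""
    xs = NormalDist(10, 2.5).samples(n, seed=seed)  # assumed x distribution
    zs = z_dist.samples(n, seed=seed + 1)
    results = [ratio_model(x, z) for x, z in zip(xs, zs)]
    return fmean(results), stdev(results)


# Expected number of z samples at or below zero out of 100,000 draws;
# a comparable number land just above zero and blow up the ratio.
expected_nonpositive = NormalDist(5, 1.25).cdf(0) * 100_000

unstable_mean, unstable_sd = simulate(NormalDist(5, 1.25))
stable_mean, stable_sd = simulate(NormalDist(50, 1.25))

print(f"expected z <= 0 per 100,000 draws: {expected_nonpositive:.2f}")
print(f"z ~ N(5, 1.25):  mean={unstable_mean:.3f}  stddev={unstable_sd:.3f}")
print(f"z ~ N(50, 1.25): mean={stable_mean:.3f}  stddev={stable_sd:.3f}")
```

Because every call passes an explicit `seed`, repeated runs print the same numbers. Changing the `seed` argument for the N(5, 1.25) case is a quick way to see how much the standard deviation moves between runs.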
[issue37905] Improve docs for NormalDist
Christoph Deil added the comment:

Thank you, Raymond!
[issue30882] Built-in list disappeared from Python 2.7 intersphinx inventory
New submission from Christoph Deil:

We have a project where we sub-class `list`. Recently our docs build started failing because the intersphinx inventory entry for `list` on Python 2.7 doesn't exist any more.

I think this is a regression, because Python 2.7 is supposed to be stable and other functions and classes are still there; only "list" is missing:
https://docs.python.org/2.7/library/functions.html#func-list

Just in case someone else sees this issue, the Sphinx warning looks like this:

```
docs/api/pyregion.ShapeList.rst:7: WARNING: py:class reference target not found: list
```

if you have something like

```
class ShapeList(list):
    """My list sub-class"""
```

--
assignee: docs@python
components: Documentation
messages: 297993
nosy: Christoph.Deil, docs@python
priority: normal
severity: normal
status: open
title: Built-in list disappeared from Python 2.7 intersphinx inventory
versions: Python 2.7

___
Python tracker <http://bugs.python.org/issue30882>
___
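For anyone hitting the same warning, one possible stopgap on the project side is sketched below as a Sphinx `conf.py` fragment. It does not restore the missing inventory entry; it only tells Sphinx's nitpicky mode to ignore the one unresolved target. The mapping key `"python"` is an assumed name, and whether silencing the warning is acceptable depends on the project.

```python
# conf.py (fragment): a workaround sketch, not a fix for the inventory itself.
extensions = ["sphinx.ext.intersphinx"]

# Resolve cross-references against the Python 2.7 documentation inventory.
# The key "python" is an arbitrary label chosen for this sketch.
intersphinx_mapping = {"python": ("https://docs.python.org/2.7", None)}

# Ignore the single reference that can no longer be resolved, so that
# "py:class reference target not found: list" stops failing the build.
nitpick_ignore = [("py:class", "list")]
```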
[issue16947] Search for "sherpa" on pypi leads to gitflow
New submission from Christoph Deil:

If you enter "sherpa" on http://pypi.python.org you currently get http://pypi.python.org/pypi/gitflow/0.5.0

Why? It doesn't make much sense as the term "sherpa" doesn't appear on that pypi page.

Instead pypi should say "not found", as the sherpa Python package is not registered on pypi:
http://cxc.cfa.harvard.edu/contrib/sherpa/

--
components: None
messages: 179813
nosy: Christoph.Deil
priority: normal
severity: normal
status: open
title: Search for "sherpa" on pypi leads to gitflow
type: behavior

___
Python tracker <http://bugs.python.org/issue16947>
___
[issue16947] Search for "sherpa" on pypi leads to gitflow
Christoph Deil added the comment:

Sorry about that. The ticket in the PyPI tracker is now here:
http://sourceforge.net/tracker/?func=detail&aid=3600625&group_id=66150&atid=513503