[issue37905] Remove NormalDist.overlap() or improve documentation?

2019-08-21 Thread Christoph Deil


New submission from Christoph Deil:

I saw that Python 3.8 will add a NormalDist class:
https://docs.python.org/3.8/library/statistics.html#normaldist-objects

Personally I don't see the value of adding this to the Python standard lib. The 
natural progression would be to keep extending it, only to end up duplicating 
what already exists in scientific Python packages.
But OK, I guess this is no longer up for debate?

I'd like to make a specific comment on NormalDist.overlap.
The rest of NormalDist is very standard, but that method is an oddball.
My suggestion is to remove it or to improve the documentation.

Current docstring: 
https://github.com/python/cpython/blob/44f2c096804e8e3adc09400a59ef9c9ae843f339/Lib/statistics.py#L959-L991

And this docs example:
https://github.com/python/cpython/commit/318d537daabf2bd5f781255c7e25bfce260cf227#diff-d436928bc44b5d7c40a8047840f55d35R620-R629


> What percentage of men and women will have the same height in `two normally
> distributed populations with known means and standard deviations
> <http://www.usablestats.com/lessons/normal>`_?

50.3%

This statement doesn't make sense to me. No two people have exactly the same 
height, so I think the answer to this question should be 0%.

Using

n = 100_000; sum(m > w for m, w in zip(men.samples(n), women.samples(n))) / n

I see that in 82% of random (man, woman) pairs the man is taller. 
That's a different measure, but still, stating that 50% of men and women have 
the same height is confusing.
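
To make the two numbers concrete, here is a minimal sketch that computes both 
measures side by side. The parameters (means 70 and 65, standard deviations 4 
and 3.5) are an assumption; they are not stated in this message, but they 
appear consistent with the 50.3% and 82% figures quoted above. The seeds are 
arbitrary.

```python
from statistics import NormalDist

# Assumed parameters (not stated in this message): heights in inches.
men = NormalDist(70, 4)
women = NormalDist(65, 3.5)

# Overlapping coefficient: area under min(pdf_men, pdf_women).
ovl = men.overlap(women)

# Probability that a random man is taller than a random woman.
# For independent normals, men - women is itself a NormalDist.
diff = men - women
p_taller = 1 - diff.cdf(0)

# Monte Carlo check; different fixed seeds keep the two samples independent.
n = 100_000
pairs = zip(men.samples(n, seed=1), women.samples(n, seed=2))
p_taller_mc = sum(m > w for m, w in pairs) / n

print(f"overlap:           {ovl:.3f}")
print(f"P(man taller):     {p_taller:.3f}")
print(f"P(man taller), MC: {p_taller_mc:.3f}")
```

Neither number is the probability that a man and a woman have the same height; 
the overlap is an area shared by two density curves.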

Note that there is a multitude of PDF overlap measures different from this 
min(pdf1, pdf2) that I think are much more common in statistics and the 
physical sciences:
- https://en.wikipedia.org/wiki/Hellinger_distance
- https://arxiv.org/pdf/1407.7172.pdf
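
As an illustration of how these measures differ, the sketch below compares the 
min(pdf1, pdf2) overlap with the Hellinger distance for two univariate 
normals. The helper names `overlap_numeric` and `hellinger` are hypothetical 
(they are not part of the statistics module), and the distribution parameters 
are assumed for the example. The Hellinger distance uses the standard 
closed form for two normal distributions.

```python
from math import exp, sqrt
from statistics import NormalDist


def overlap_numeric(a, b, steps=20_000):
    """Approximate the integral of min(pdf_a, pdf_b) with the midpoint rule."""
    lo = min(a.mean - 10 * a.stdev, b.mean - 10 * b.stdev)
    hi = max(a.mean + 10 * a.stdev, b.mean + 10 * b.stdev)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(a.pdf(x), b.pdf(x))
    return total * dx


def hellinger(a, b):
    """Hellinger distance between two univariate normals (closed form)."""
    var_sum = a.variance + b.variance
    # Bhattacharyya coefficient of the two densities.
    bc = sqrt(2 * a.stdev * b.stdev / var_sum) * exp(
        -((a.mean - b.mean) ** 2) / (4 * var_sum)
    )
    return sqrt(max(0.0, 1.0 - bc))


# Assumed example parameters, not taken from this message.
men = NormalDist(70, 4)
women = NormalDist(65, 3.5)

print(f"overlap (method):   {men.overlap(women):.4f}")
print(f"overlap (numeric):  {overlap_numeric(men, women):.4f}")
print(f"Hellinger distance: {hellinger(men, women):.4f}")
```

Note that the overlap is a similarity (1 for identical distributions), while 
the Hellinger distance is a distance (0 for identical distributions).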

And note that the references that are given currently are weird (basic 
statistics textbooks would be appropriate references IMO, or open references 
like Wikipedia)
- slides: 
http://www.iceaaonline.com/ready/wp-content/uploads/2014/06/MM-9-Presentation-Meet-the-Overlapping-Coefficient-A-Measure-for-Elevator-Speeches.pdf
- implementation code comment points to 
http://dx.doi.org/10.1080/03610928908830127 which is behind a paywall

Why add this one overlap measure and expose it under the "overlap" method name?

My suggestion would be to be conservative and remove that method again before 
3.8 is released. The docs could instead refer to existing third-party packages 
(e.g. scipy or the uncertainties package) that offer further functionality, 
such as handling correlations or multi-dimensional distributions. I'd be happy 
to send a PR for this change at any time.

Raymond and others interested in this topic - thoughts?

(note: I wrote a MultiNorm class prototype last year at 
https://github.com/cdeil/multinorm/blob/master/multinorm.py and now wanted to 
rewrite it and try to find a good API and thus was interested in this 
NormalDist class and what functionality it offers)

--
components: Library (Lib)
messages: 350076
nosy: Christoph.Deil, rhettinger
priority: normal
severity: normal
status: open
title: Remove NormalDist.overlap() or improve documentation?
type: enhancement
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue37905>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37905] Remove NormalDist.overlap() or improve documentation?

2019-08-21 Thread Christoph Deil


Christoph Deil added the comment:

The Monte Carlo example here has completely unstable results:

https://github.com/python/cpython/commit/cc353a0cd95d9b0c93ed0b60ba762427a94c790d#diff-d436928bc44b5d7c40a8047840f55d35R633

If you run it multiple times, you will see that `mean` is relatively stable, 
but `stddev` varies from 10 to 50 to 100. The reason is that in the model 
there's a division by z, and the z distribution used has values arbitrarily 
close to zero:

>>> NormalDist(5, 1.25).cdf(0) * 100_000
3.16

I suggest changing to an MC sampling example that is less pathological and 
does not involve division by values near zero, e.g. change the mean of z to 
50, or reduce the stddev to 0.125, or make some similar change in parameters.

Stats and machine learning books and docs (e.g. for statsmodels or 
scikit-learn) usually set the seed to a fixed value wherever random numbers 
are involved, so that results and docs are reproducible. I suggest making that 
change here as well.
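
A minimal sketch of both suggestions follows. The `model` function and the 
parameters of x and y are assumptions standing in for the linked docs example, 
which is not reproduced in this message; `simulate` is a hypothetical helper 
and the seeds are arbitrary. Only the two z distributions, N(5, 1.25) and 
N(50, 1.25), come from the discussion above.

```python
from statistics import NormalDist, mean, stdev


def model(x, y, z):
    # Hypothetical model with a division by z (assumed, not from this message).
    return (3 * x + 7 * x * y - 5 * y) / (11 * z)


def simulate(z_dist, seed, n=20_000):
    """Return (mean, stdev) of the model output for one seeded run."""
    # Distinct seeds per variable keep the three samples independent.
    xs = NormalDist(10, 2.5).samples(n, seed=seed)
    ys = NormalDist(15, 1.75).samples(n, seed=seed + 1)
    zs = z_dist.samples(n, seed=seed + 2)
    out = [model(x, y, z) for x, y, z in zip(xs, ys, zs)]
    return mean(out), stdev(out)


# Expected number of negative z draws per 100_000 samples for z ~ N(5, 1.25).
expected_negative = NormalDist(5, 1.25).cdf(0) * 100_000
print(f"expected negative z per 100_000 draws: {expected_negative:.2f}")

for label, z_dist in [
    ("z ~ N(5, 1.25) ", NormalDist(5, 1.25)),
    ("z ~ N(50, 1.25)", NormalDist(50, 1.25)),
]:
    for seed in (10, 20, 30):
        m, s = simulate(z_dist, seed)
        print(f"{label}  seed={seed}  mean={m:9.3f}  stddev={s:9.3f}")
```

With a fixed seed each run prints the same numbers, and with z centred far 
from zero the stddev should stay close from seed to seed.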

--




[issue37905] Improve docs for NormalDist

2019-08-27 Thread Christoph Deil


Christoph Deil added the comment:

Thank you, Raymond!

--




[issue30882] Built-in list disappeared from Python 2.7 intersphinx inventory

2017-07-09 Thread Christoph Deil

New submission from Christoph Deil:

We have a project where we subclass `list`. Our docs build recently started 
failing because the intersphinx inventory entry for `list` on Python 2.7 no 
longer exists.

I think this is a regression, because Python 2.7 is supposed to be stable and 
other functions and classes here are still there, just "list" is missing:
https://docs.python.org/2.7/library/functions.html#func-list

Just in case someone else sees this issue, the Sphinx warning looks like this:
```
docs/api/pyregion.ShapeList.rst:7: WARNING: py:class reference target not 
found: list
```
if you have something like
```
class ShapeList(list):
    """My list sub-class"""
```
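
For anyone hitting the same warning, a possible stop-gap is sketched below as 
a `conf.py` fragment. This is an assumption about a typical Sphinx setup, not 
a confirmed fix for this issue: it only silences the one unresolved target 
until the inventory entry is restored.

```python
# conf.py (fragment) -- a sketch, not a confirmed fix for this issue.
extensions = ["sphinx.ext.intersphinx"]

# Resolve references such as :class:`list` against the Python 2.7 inventory.
intersphinx_mapping = {
    "python": ("https://docs.python.org/2.7", None),
}

# Stop-gap: do not warn about this one unresolved target in nitpicky mode.
nitpick_ignore = [("py:class", "list")]
```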

--
assignee: docs@python
components: Documentation
messages: 297993
nosy: Christoph.Deil, docs@python
priority: normal
severity: normal
status: open
title: Built-in list disappeared from Python 2.7 intersphinx inventory
versions: Python 2.7

___
Python tracker 
<http://bugs.python.org/issue30882>
___



[issue16947] Search for "sherpa" on pypi leads to gitflow

2013-01-12 Thread Christoph Deil

New submission from Christoph Deil:

If you enter "sherpa" on http://pypi.python.org you currently get
http://pypi.python.org/pypi/gitflow/0.5.0

Why?
It doesn't make much sense as the term "sherpa" doesn't appear on that pypi 
page.
Instead pypi should say "not found", as the sherpa Python package
is not registered on pypi:
http://cxc.cfa.harvard.edu/contrib/sherpa/

--
components: None
messages: 179813
nosy: Christoph.Deil
priority: normal
severity: normal
status: open
title: Search for "sherpa" on pypi leads to gitflow
type: behavior

___
Python tracker 
<http://bugs.python.org/issue16947>
___



[issue16947] Search for "sherpa" on pypi leads to gitflow

2013-01-12 Thread Christoph Deil

Christoph Deil added the comment:

Sorry about that. The ticket in the PyPI tracker is now here:
http://sourceforge.net/tracker/?func=detail&aid=3600625&group_id=66150&atid=513503

--
