Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-18 Thread Sturla Molden
Jaime Fernández del Río wrote: > I think we have an explicit rule against C++, although I may be wrong. Currently there is Python, C and Cython in NumPy. SciPy also has C++ and Fortran code. Sturla

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
Cool, thanks for looking at this. P2 might still be better even if the whole dataset is in memory because of cache misses. Partition, which I guess is based on quickselect, is going to run over all of the data roughly as many times as there are bins, whereas P2 only runs over it once. From a cac

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 9:14 AM, Eric Moore wrote: > This blog post, and the links within also seem relevant. Appears to have > python code available to try things out as well. > > > https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar wrote: > You got it. I remember this from when I worked at Google and we would > process (many many) logs. With enough bins, the approximation is still > really close. It's great if you want to make an automatic plot of data. > Calling numpy.partit

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Eric Moore
This blog post, and the links within also seem relevant. Appears to have python code available to try things out as well. https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest -Eric On Wed, Apr 15, 2015 at 11:24 AM, Benjamin Root wrot

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Benjamin Root
"Then you can set about convincing matplotlib and friends to use it by default" Just to note, this proposal was originally made over in the matplotlib project. We sent it over here where its benefits would have wider reach. Matplotlib's plan is not to change the defaults, but to offload as much as

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
You got it. I remember this from when I worked at Google and we would process (many many) logs. With enough bins, the approximation is still really close. It's great if you want to make an automatic plot of data. Calling numpy.partition a hundred times is probably slower than calling P^2 with n=
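As a point of comparison for the numpy.partition approach mentioned here: numpy.partition accepts a sequence for `kth`, so the order statistics for every bin edge can be requested in one call instead of a hundred separate ones. The sketch below is only an illustration; the array size, the bin count and the variable names are assumptions, and it makes no claim about relative speed against P^2.

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.normal(size=100001)   # illustrative data, not from the thread
n_bins = 100                     # assumed number of bins

# Indices of the order statistics that serve as equal-frequency bin edges.
kth = np.linspace(0, data.size - 1, n_bins + 1).astype(int)

# One call places every requested order statistic at its sorted position.
partitioned = np.partition(data, kth)
edges = partitioned[kth]
```

The resulting `edges` match what a full sort would give at the same indices, without sorting the whole array.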

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar wrote: > Yeah, I'm not arguing, I'm just curious about your reasoning. That > explains why not C++. Why would you want to do this in C and not Python? > Well, the algorithm has to iterate over all the inputs, updating the estimated percentile posit
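To make the per-element update concrete, here is a minimal pure-Python sketch of a single-quantile P-square estimator, written from the published five-marker description of the algorithm. The class name `P2Quantile` and its methods are hypothetical; this is neither NumPy API nor the Boost implementation, and it tracks one quantile rather than a full histogram. The loop in `add` runs once per input value, which is the kind of work that is slow in interpreted Python.

```python
import random

class P2Quantile:
    """Hypothetical sketch: P-square estimate of one quantile p (0 < p < 1)."""

    def __init__(self, p):
        self.q = []                                   # marker heights
        self.n = [1, 2, 3, 4, 5]                      # actual marker positions
        self.desired = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]
        self.step = [0, p / 2, p, (1 + p) / 2, 1]     # desired-position increments

    def _parabolic(self, i, s):
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]))

    def _linear(self, i, s):
        q, n = self.q, self.n
        return q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                 # the first five values seed the markers
            q.append(x)
            q.sort()
            return
        if x < q[0]:
            q[0] = x
            k = 0
        elif x >= q[4]:
            q[4] = x
            k = 3
        else:                          # find the cell with q[k] <= x < q[k+1]
            k = 0
            while x >= q[k + 1]:
                k += 1
        for i in range(k + 1, 5):      # markers above the cell shift right
            n[i] += 1
        for i in range(5):
            self.desired[i] += self.step[i]
        for i in (1, 2, 3):            # nudge the three interior markers
            d = self.desired[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                cand = self._parabolic(i, s)
                if not q[i - 1] < cand < q[i + 1]:
                    cand = self._linear(i, s)   # fall back to linear interpolation
                q[i] = cand
                n[i] += s

    def value(self):
        if not self.q:
            raise ValueError("no data")
        return self.q[2] if len(self.q) == 5 else self.q[len(self.q) // 2]

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(20000)]
est = P2Quantile(0.5)
for x in samples:
    est.add(x)
exact_median = sorted(samples)[len(samples) // 2]
```

Only the five marker heights and positions are stored, so memory use does not grow with the number of inputs; the estimate in `est.value()` is approximate, while `exact_median` needs the whole sample.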

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
Yeah, I'm not arguing, I'm just curious about your reasoning. That explains why not C++. Why would you want to do this in C and not Python? On Wed, Apr 15, 2015 at 1:48 AM, Jaime Fernández del Río < jaime.f...@gmail.com> wrote: > On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar > wrote: > >> If y

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Jaime Fernández del Río
On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar wrote: > If you're going to C, is there a reason not to go to C++ and include the > already-written Boost code? Otherwise, why not use Python? > I think we have an explicit rule against C++, although I may be wrong. Not sure how much of boost we wou

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Neil Girdhar
By the way, the P^2 algorithm still needs to know how many bins you want. It just adapts the endpoints of the bins. I like adaptive=True. However, you will have to find a way to return both the bins and their calculated endpoints. The P^2 algorithm can also give approximate answers to numpy.
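One way to picture "fixed number of bins, adapted endpoints" is an equal-frequency histogram whose edges sit at evenly spaced quantiles. The sketch below uses exact quantiles from numpy.percentile as a stand-in for P^2 estimates, and the helper name `adaptive_histogram` is hypothetical. It returns counts and edges as a pair, the same shape of result that numpy.histogram already returns.

```python
import numpy as np

def adaptive_histogram(data, n_bins):
    """Hypothetical helper: n_bins equal-frequency bins with quantile edges."""
    data = np.asarray(data, dtype=float)
    # Exact quantiles here; a streaming estimator would approximate these.
    edges = np.percentile(data, np.linspace(0, 100, n_bins + 1))
    counts, edges = np.histogram(data, bins=edges)
    return counts, edges

rng = np.random.RandomState(0)
skewed = rng.exponential(size=10000)     # illustrative skewed data
counts, edges = adaptive_histogram(skewed, 10)
```

For skewed data like this, the counts come out nearly equal while the bin widths differ widely, with the narrowest bins where the data are densest.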

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Paul Hobson
On Tue, Apr 14, 2015 at 4:24 PM, Jaime Fernández del Río < jaime.f...@gmail.com> wrote: > On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith wrote: > >> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar >> wrote: >> > Can I suggest that we instead add the P-square algorithm for the dynamic >> > calcul

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Neil Girdhar
If you're going to C, is there a reason not to go to C++ and include the already-written Boost code? Otherwise, why not use Python? On Tue, Apr 14, 2015 at 7:24 PM, Jaime Fernández del Río < jaime.f...@gmail.com> wrote: > On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith wrote: > >> On Mon, Apr

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Jaime Fernández del Río
On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith wrote: > On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar > wrote: > > Can I suggest that we instead add the P-square algorithm for the dynamic > > calculation of histograms? > > ( > http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Nathaniel Smith
On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar wrote: > Can I suggest that we instead add the P-square algorithm for the dynamic > calculation of histograms? > (http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf) > > This i

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Neil Girdhar
Yes, you're right. Although in practice, people almost always want adaptive bins. On Tue, Apr 14, 2015 at 5:08 PM, Chris Barker wrote: > On Mon, Apr 13, 2015 at 5:02 AM, Neil Girdhar > wrote: > >> Can I suggest that we instead add the P-square algorithm for the dynamic >> calculation of histog

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Chris Barker
On Mon, Apr 13, 2015 at 5:02 AM, Neil Girdhar wrote: > Can I suggest that we instead add the P-square algorithm for the dynamic > calculation of histograms? ( > http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf > ) > T

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Antony Lee
Another improvement would be to make sure, for integer-valued datasets, that all bins cover the same number of integers, as it is otherwise easy to end up with bins "effectively" wider than others: hist(np.random.randint(11, size=1)) shows a peak in the last bin, as it covers both 9 and 10. A
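The effect is easy to reproduce with numpy.histogram directly. The sketch below replaces the random draw with a deterministic array in which every integer from 0 to 10 appears equally often (an assumption made for reproducibility), then shows one possible fix: edges at half-integers so that each bin covers exactly one integer.

```python
import numpy as np

# Deterministic stand-in for randint(11, ...): each of 0..10 appears 100 times.
data = np.repeat(np.arange(11), 100)

# Default: 10 equal-width bins over [0, 10]. The last bin is closed on the
# right, so it collects both 9 and 10.
default_counts, default_edges = np.histogram(data)

# One bin per integer: edges at -0.5, 0.5, ..., 10.5.
int_edges = np.arange(data.min(), data.max() + 2) - 0.5
int_counts, _ = np.histogram(data, bins=int_edges)
```

With the default bins the last count is double the others even though the data are perfectly uniform; with the half-integer edges all eleven counts are equal.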

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Neil Girdhar
Can I suggest that we instead add the P-square algorithm for the dynamic calculation of histograms? ( http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf ) This is already implemented in C++'s boost library ( http://www.bo

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Ralf Gommers
On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río < jaime.f...@gmail.com> wrote: > On Sun, Apr 12, 2015 at 12:19 AM, Varun wrote: > >> >> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/sta >> tistics/A >>

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Varun
Using a URL shortener for the notebook to get around the 80 char width limit http://goo.gl/JmfTRJ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Jaime Fernández del Río
On Sun, Apr 12, 2015 at 12:19 AM, Varun wrote: > > http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb > > Long story short, histogram visualisations that depend on numpy (such as > matplotlib, or nearly

[Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Varun
http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb Long story short, histogram visualisations that depend on numpy (such as matplotlib, or nearly all of them) have poor default behaviour as I have to const
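As an illustration of what an automatic bin-count choice can look like, the sketch below applies the Freedman-Diaconis rule (bin width = 2 * IQR / n^(1/3)). The excerpt above does not show which rules the notebook uses, so this rule is an assumption chosen as a well-known example, and the helper name `fd_bin_count` is hypothetical.

```python
import numpy as np

def fd_bin_count(data):
    """Hypothetical helper: bin count from the Freedman-Diaconis rule."""
    data = np.asarray(data, dtype=float)
    q75, q25 = np.percentile(data, [75, 25])
    iqr = q75 - q25
    if data.size < 2 or iqr == 0:
        return 1                          # degenerate data: fall back to one bin
    width = 2.0 * iqr / data.size ** (1.0 / 3.0)
    span = data.max() - data.min()
    return max(1, int(np.ceil(span / width)))

rng = np.random.RandomState(0)
small = rng.normal(size=100)
large = rng.normal(size=100000)

# The bin count grows with the sample size instead of staying at a fixed default.
bins_small = fd_bin_count(small)
bins_large = fd_bin_count(large)
counts, edges = np.histogram(large, bins=bins_large)
```

The point of a rule like this is that the caller no longer has to tune the bin count by hand for each dataset.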