Re: [Numpy-discussion] ticket #605

2008-04-09 Thread Timothy Hochberg
On Wed, Apr 9, 2008 at 7:01 AM, David Huard <[EMAIL PROTECTED]> wrote: > Hello Jarrod and co., > > here is my personal version of the histogram saga. > > The current version of histogram puts in the rightmost bin all values > larger than range, but does not put in the leftmost bin all values small

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-09 Thread Bruce Southey
Hi, I should have asked first (I hope that you don't mind), but I created Ticket #728 (http://scipy.org/scipy/numpy/ticket/728) for numpy.r_ because this incorrectly casts based on the array types. The bug is that -inf and inf are numpy floats but dbin is an array of ints. Unfortunate

Re: [Numpy-discussion] ticket #605

2008-04-09 Thread David Huard
Hello Jarrod and co., here is my personal version of the histogram saga. The current version of histogram puts in the rightmost bin all values larger than range, but does not put in the leftmost bin all values smaller than bin, e.g.: In [6]: histogram([1,2,3,4,5,6], bins=3, range=[2,5]) Out[6]: (a
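For comparison, here is a minimal sketch of the same call against a recent NumPy release, where (to my knowledge) values outside `range` are ignored rather than added to an edge bin. This illustrates later NumPy behavior and is an assumption about the reader's installed version, not the 2008 code discussed in this thread.

```python
import numpy as np

# Same data and range as the example in the message above.
counts, edges = np.histogram([1, 2, 3, 4, 5, 6], bins=3, range=(2, 5))

# 1 and 6 lie outside [2, 5] and are dropped; the last bin is closed on the right,
# so the value 5 is counted together with 4.
print(counts)  # [1 1 2]
print(edges)   # [2. 3. 4. 5.]
```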

Re: [Numpy-discussion] ticket #605

2008-04-09 Thread Bruce Southey
Jarrod Millman wrote: > Hello, > > I just turned this one into a blocker for now. There has been a very > long and good discussion about this ticket: > http://projects.scipy.org/scipy/numpy/ticket/605 > > Could someone (David?, Bruce?) briefly summarize the problem and the > current proposed solut

[Numpy-discussion] ticket #605

2008-04-09 Thread Jarrod Millman
Hello, I just turned this one into a blocker for now. There has been a very long and good discussion about this ticket: http://projects.scipy.org/scipy/numpy/ticket/605 Could someone (David?, Bruce?) briefly summarize the problem and the current proposed solution for us again? Let's agree on th

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-08 Thread David Huard
2008/4/8, Bruce Southey <[EMAIL PROTECTED]>: > > Hi, > I agree that the current histogram should be changed. However, I am not > sure 1.0.5 is the correct release for that. We both agree. David, this doesn't work for your code: > r= np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]) > dbin=[2,3,4] > rc,

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-08 Thread Bruce Southey
Hi, I agree that the current histogram should be changed. However, I am not sure 1.0.5 is the correct release for that. David, this doesn't work for your code: r= np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]) dbin=[2,3,4] rc, rb=histogram(r, bins=dbin, discard=None) Returns: rc=[3 3] # Really should

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-08 Thread David Huard
Hans, Note that the current histogram is buggy, in the sense that it assumes that all bins have the same width and computes db = bins[1]-bins[0]. This is why you get zeros everywhere. The current behavior has been heavily criticized and I think we should change it. My proposal is to have for histo
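The equal-width assumption can be avoided by locating each value with a binary search over the edges. The following is a minimal sketch, assuming `edges` is a sorted sequence of nbins+1 bin edges; the helper name `count_in_bins` is hypothetical, and this is not the patch proposed in the thread.

```python
import numpy as np

def count_in_bins(values, edges):
    """Count values per bin for arbitrary (non-uniform) sorted edges."""
    values = np.asarray(values)
    edges = np.asarray(edges)
    # Index i such that edges[i] <= value < edges[i + 1].
    idx = np.searchsorted(edges, values, side="right") - 1
    # Keep only values inside the half-open span [edges[0], edges[-1]).
    inside = (idx >= 0) & (idx < len(edges) - 1)
    return np.bincount(idx[inside], minlength=len(edges) - 1)

# Bins of width 1 and 9; the value 10 sits on the last edge and is excluded here.
result = count_in_bins([0.5, 2, 5, 9, 10], [0, 1, 10])
print(result)  # [1 3]
```

Whether the last edge should be inclusive is a design choice; this sketch leaves it out to keep every bin half-open.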

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-08 Thread Hans Meine
Am Montag, 07. April 2008 14:34:08 schrieb Hans Meine: > Am Samstag, 05. April 2008 21:54:27 schrieb Anne Archibald: > > There's also a fourth option - raise an exception if any points are > > outside the range. > > +1 > > I think this should be the default. Otherwise, I tend towards "exclude", >

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread David Huard
> On Apr 7, 2008, at 4:14 PM, LB wrote: > > +1 for axis and +1 for a keyword to define what to do with values > > outside the range. > > > > For the keyword, rather than 'outliers', I would propose 'discard' or > > 'exclude', because it could be used to describe the four > > possibilities: > > - d

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread Tommy Grav
On Apr 7, 2008, at 4:14 PM, LB wrote: > +1 for axis and +1 for a keyword to define what to do with values > outside the range. > > For the keyword, rather than 'outliers', I would propose 'discard' or > 'exclude', because it could be used to describe the four > possibilities: > - discard='low'

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread LB
+1 for axis and +1 for a keyword to define what to do with values outside the range. For the keyword, rather than 'outliers', I would propose 'discard' or 'exclude', because it could be used to describe the four possibilities: - discard='low' => values lower than the range are discarded, va
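A `discard` keyword along these lines could be sketched as follows. The message is cut off after the first option, so the set of accepted values (`None`, `'low'`, `'high'`, `'both'`), their exact semantics, and the helper name `histogram_discard` are all assumptions made for illustration, not the API agreed on in the thread.

```python
import numpy as np

def histogram_discard(a, edges, discard=None):
    """Hypothetical sketch: outliers are either dropped or folded into the edge bins."""
    if discard not in (None, "low", "high", "both"):
        raise ValueError("discard must be None, 'low', 'high' or 'both'")
    a = np.asarray(a, dtype=float)
    edges = np.asarray(edges, dtype=float)
    # Counts for values inside [edges[0], edges[-1]].
    counts, _ = np.histogram(a, bins=edges)
    if discard not in ("low", "both"):
        # Low outliers are kept: add them to the first bin.
        counts[0] += np.count_nonzero(a < edges[0])
    if discard not in ("high", "both"):
        # High outliers are kept: add them to the last bin.
        counts[-1] += np.count_nonzero(a > edges[-1])
    return counts

data = [1, 2, 3, 4, 5, 6]
edges = [2, 3, 4, 5]
print(histogram_discard(data, edges, discard=None))    # [2 1 3]
print(histogram_discard(data, edges, discard="low"))   # [1 1 3]
print(histogram_discard(data, edges, discard="both"))  # [1 1 2]
```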

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread Bruce Southey
Hi, Thanks David for pointing out the piece of information I forgot to add in my original email. -1 for 'raise an exception' because, as Dan points out, the problem stems from user providing bins. +1 for the outliers keyword. Should 'exclude' distinguish points that are too low and those that are too

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread David Huard
+1 for an outlier keyword. Note, that this implies that when bins are passed explicitly, the edges are given (nbins+1), not simply the left edges (nbins). While we are refactoring histogram, I'd suggest adding an axis keyword. This is pretty straightforward to implement using the np.apply_along_ax
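The axis idea can be sketched with `np.apply_along_axis`, passing explicit edges (nbins+1 values) so that every 1-D slice is binned identically. The wrapper name `histogram_along_axis` is hypothetical, and this is only one possible reading of the suggestion, not the implementation from the thread.

```python
import numpy as np

def histogram_along_axis(a, edges, axis=-1):
    """Histogram each 1-D slice of `a` taken along `axis`, using shared edges."""
    edges = np.asarray(edges)
    # np.histogram returns (counts, edges); keep only the counts for each slice.
    return np.apply_along_axis(
        lambda v: np.histogram(v, bins=edges)[0], axis, np.asarray(a)
    )

a = np.array([[1, 2, 3],
              [2, 2, 2]])
# Four edges define three bins: [1, 2), [2, 3), [3, 4].
row_counts = histogram_along_axis(a, [1, 2, 3, 4], axis=-1)
print(row_counts)
```

The chosen axis is replaced by an axis of length nbins, so with `axis=-1` each row of the result holds the counts for the corresponding row of the input.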

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread Hans Meine
Am Samstag, 05. April 2008 21:54:27 schrieb Anne Archibald: > There's also a fourth option - raise an exception if any points are > outside the range. +1 I think this should be the default. Otherwise, I tend towards "exclude", in order to have comparable bin sizes (when plotting, I always find

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-06 Thread Tommy Grav
On Apr 5, 2008, at 2:01 PM, Bruce Southey wrote: > Hi, > I have been investigating Ticket #605 'Incorrect behavior of > numpy.histogram' (http://scipy.org/scipy/numpy/ticket/605 ). I think that my preference depends on the definition of what the bin number means. If the bin numbers are the lower

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-05 Thread Anne Archibald
On 05/04/2008, Bruce Southey <[EMAIL PROTECTED]> wrote: > 1) Should the first bin contain all values less than or equal to the > value of the first limit and the last bin contain all values greater > than the value of the last limit? > This produced the counts as: array([3, 3, 9]) (I termed th
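The counts quoted for option 1 can be reproduced with the `r` and `dbin` values that appear elsewhere in this thread. This is a minimal sketch of one reading that yields those counts, assuming the bin values act as left edges and the outer bins are extended to catch everything below and above; it is not the code attached to the ticket.

```python
import numpy as np

r = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])
dbin = np.array([2, 3, 4])

# Index of the last bin value that is <= each data value (-1 if below all of them).
idx = np.searchsorted(dbin, r, side="right") - 1
# Fold values below the first bin value into the first bin.
idx = np.clip(idx, 0, len(dbin) - 1)
counts = np.bincount(idx, minlength=len(dbin))
print(counts)  # [3 3 9]
```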

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-05 Thread James Philbin
The MATLAB behaviour is to extend the first bin to include all data down to -inf and extend the last bin to handle all data to inf. This is probably the behaviour with least surprise. Therefore, I would vote +1 for behaviour #1 by default, +1 for keeping the old behaviour #2 around as an option and

[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-05 Thread Bruce Southey
Hi, I have been investigating Ticket #605 'Incorrect behavior of numpy.histogram' (http://scipy.org/scipy/numpy/ticket/605 ). The fix for this ticket really depends on what the expectations are for the bin limits and different applications have different behavior. Consequently, I think that feedba