Hi Catherine, I can't reproduce your issue with bins_list vs. bins_arange, but passing both range and number of bins to np.histogram does give the same strange behavior for me:

In [16]: data = np.array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05])

In [17]: bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])

In [18]: np.histogram(data, bins=bins_list)
Out[18]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ]))

In [19]: bins_arange = np.arange(-0.1, 0.101, 0.05)

In [20]: np.histogram(data, bins=bins_arange)
Out[20]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ]))

In [21]: np.histogram(data, range=(-0.1, 0.1), bins=4)
Out[21]: (array([ 0, 1, 11, 0]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ]))

In [22]: np.version.version
Out[22]: '1.8.1'

Looks like the 0.05 value of data is being binned differently in the last case, but I'm not sure why either...

Mark

On Wed, Jul 2, 2014 at 2:05 AM, Chris Barker <chris.bar...@noaa.gov> wrote:

> A few thoughts:
>
> 1) don't use arange() for floating point numbers, use linspace().
>
> 2) histogram1d is a floating point function, and you shouldn't expect
> exact results for floating point -- in particular, values exactly at the
> bin boundaries are likely to be "uncertain" -- not quite the right word,
> but you get the idea.
>
> 3) if you expect to have a lot of certain specific values, say, integers, or
> zeros -- then you don't want your bin boundaries to be exactly at the value
> -- they should be between the expected values.
>
> 4) remember that histogramming is inherently sensitive to bin position
> anyway -- if these small bin-boundary differences matter, then you may not
> be using the best approach.
>
> -HTH,
> -Chris
>
>
>> >>> data
>> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
>>         0. , 0.05, -0.05])
>> >>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1])
>> >>> (counts, edges) = numpy.histogram(data, bins=bins_list)
>> >>> counts
>> array([ 0, 1, 10, 1])
>> >>> edges
>> array([-0.1 , -0.05, 0. , 0.05, 0.1 ])
>>
>> but this does not (generating the bin values via numpy.arange):
>>
>> >>> bins_arange = numpy.arange(-0.1, 0.101, 0.05)
>> >>> data
>> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
>>         0. , 0.05, -0.05])
>> >>> bins_arange
>> array([-0.1 , -0.05, 0. , 0.05, 0.1 ])
>> >>> (counts, edges) = numpy.histogram(data, bins=bins_arange)
>> >>> counts
>> array([ 0, 1, 11, 0])
>>
>> I'm assuming this is due to slight rounding in the calculation of
>> bins_arange, as compared to the manually entered values in bins_list.
>>
>> What is the recommended way of getting the first set of results, without
>> having to manually enter all the values in the "bins" argument?
>>
>> The following also gives me unexpected results:
>>
>> >>> data
>> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
>>         0. , 0.05, -0.05])
>> >>> (counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4)
>> >>> counts
>> array([ 0, 1, 11, 0])
>>
>> Thank you for any advice,
>>
>> Catherine
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
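
P.S. For what it's worth, here is a minimal sketch of how I read Chris's point 3 for this data set: put the bin edges halfway between the expected values so that nothing sits exactly on a boundary. The half-width of 0.025 is my own assumption, based on the 0.05 spacing of the values in data; it is not something from the thread. The last two lines are just a way to look for rounding differences between the two edge arrays, since the default printout hides them. I have not asserted what that difference is, because it may depend on platform and NumPy version.

```python
import numpy as np

# Same data as in the thread: ten zeros plus one value on each side.
data = np.array([0.0] * 10 + [0.05, -0.05])

# Edges placed halfway between the expected values (-0.05, 0.0, 0.05),
# so no data point lies on a bin boundary. The 0.025 offset is an
# assumption tied to the 0.05 spacing of this particular data set.
edges = np.linspace(-0.075, 0.075, 4)

counts, edges = np.histogram(data, bins=edges)
print(counts)  # [ 1 10  1]

# Two edge arrays that print identically can still differ in the last
# bits. Subtracting them shows any rounding difference directly.
bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])
bins_arange = np.arange(-0.1, 0.101, 0.05)
print(bins_arange - bins_list)  # nonzero entries are rounding differences
```

With the edges moved off the data values, the counts no longer depend on how an edge such as 0.05 happens to be rounded.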