Hi Catherine,

I can't reproduce your issue with bins_list vs. bins_arange, but passing
both range and number of bins to np.histogram does give the same strange
behavior for me:

In [16]: data = np.array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.05, -0.05])

In [17]: bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])

In [18]: np.histogram(data, bins=bins_list)
Out[18]: (array([ 0,  1, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ]))

In [19]: bins_arange = np.arange(-0.1, 0.101, 0.05)

In [20]: np.histogram(data, bins=bins_arange)
Out[20]: (array([ 0,  1, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ]))

In [21]: np.histogram(data, range=(-0.1, 0.1), bins=4)
Out[21]: (array([ 0,  1, 11,  0]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ]))

In [22]: np.version.version
Out[22]: '1.8.1'

Looks like the 0.05 value of data is being binned differently in the last
case, but I'm not sure why either...
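
One way to check whether rounding in the computed edges is behind this is
to compare them with the hand-typed ones at full precision. This is only a
sketch: I'm assuming the range/bins form builds its edges much like
np.linspace does, which I haven't verified in the source, and the names
literal_edges and computed_edges are just mine.

```python
import numpy as np

# Edges typed in by hand versus edges computed from a range and a bin count.
# Assumption: np.histogram(range=..., bins=4) builds edges similarly to linspace.
literal_edges = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])
computed_edges = np.linspace(-0.1, 0.1, 5)

# Any nonzero entry here is rounding error in a computed edge.
edge_diff = computed_edges - literal_edges
print(repr(edge_diff))

# Show the edge next to the 0.05 sample at full precision, and which side
# of that edge the sample falls on.
print(repr(computed_edges[3]), repr(literal_edges[3]))
print("0.05 >= computed edge:", 0.05 >= computed_edges[3])
```

If computed_edges[3] comes out a hair above 0.05, the 0.05 sample would
land in the third bin, which would match the 11/0 split above.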

Mark


On Wed, Jul 2, 2014 at 2:05 AM, Chris Barker <chris.bar...@noaa.gov> wrote:

> A few thoughts:
>
> 1) don't use arange() for floating point numbers, use linspace().
>
> 2) histogram1d is a floating point function, and you shouldn't expect
> exact results for floating point -- in particular, values exactly at the
> bin boundaries are likely to be "uncertain" -- not quite the right word,
> but you get the idea.
>
> 3) if you expect to have a lot of certain specific values, say, integers, or
> zeros -- then you don't want your bin boundaries to be exactly at the value
> -- they should be between the expected values.
>
> 4) remember that histogramming is inherently sensitive to bin position
> anyway -- if these small bin-boundary differences matter, then you may not
> be using the best approach.
>
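
For what it's worth, here is a small sketch of point 3 using Catherine's
data: the values sit at multiples of 0.05, so I put the edges half a step
away from them. The step of 0.05 and the three-bin layout are my
assumptions about what is wanted, not something from the original post.

```python
import numpy as np

data = np.array([0.0] * 10 + [0.05, -0.05])

# Assumed spacing of the expected values (not stated in the original post).
step = 0.05

# Edges land between the expected values, near -0.075, -0.025, 0.025, 0.075,
# so tiny rounding errors in the edges cannot move a sample to another bin.
edges = np.linspace(-0.05 - step / 2, 0.05 + step / 2, 4)

counts, _ = np.histogram(data, bins=edges)
print(counts.tolist())  # [1, 10, 1]
```

Each bin is now centred on one of the expected values, so the counts no
longer depend on how an edge at exactly 0.05 happens to round.
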
> -HTH,
>   -Chris
>
>
>
>
>
>
>> >>> data
>> array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
>>         0.  ,  0.05, -0.05])
>> >>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1])
>> >>> (counts, edges) = numpy.histogram(data, bins=bins_list)
>> >>> counts
>> array([ 0,  1, 10,  1])
>> >>> edges
>> array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ])
>>
>>
>>
>> but this does not (generating the bin values via numpy.arange):
>>
>> >>> bins_arange = numpy.arange(-0.1, 0.101, 0.05)
>> >>> data
>> array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
>>         0.  ,  0.05, -0.05])
>> >>> bins_arange
>> array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ])
>> >>> (counts, edges) = numpy.histogram(data, bins=bins_arange)
>> >>> counts
>> array([ 0,  1, 11,  0])
>>
>> I'm assuming this is due to slight rounding in the calculation of
>> bins_arange, as compared to the manually entered values in bins_list.
>>
>> What is the recommended way of getting the first set of results, without
>> having to manually enter all the values in the "bins" argument?
>>
>> The following also gives me unexpected results:
>>
>> >>> data
>> array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
>>         0.  ,  0.05, -0.05])
>> >>> (counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4)
>> >>> counts
>> array([ 0,  1, 11,  0])
>>
>>
>>
>> Thank you for any advice,
>>
>> Catherine
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
