Re: [Numpy-discussion] numpy.histogram not giving expected results

Mark Szepieniec Wed, 02 Jul 2014 07:58:24 -0700

Looks like this could be a float32 vs float64 problem:

In [19]: data32 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.05,
-0.05], dtype=np.float32)
In [20]: data64 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.05,
-0.05], dtype=np.float64)
In [21]: bins32 = np.arange(-0.1, 0.101, 0.05, dtype=np.float32)
In [22]: bins64 = np.arange(-0.1, 0.101, 0.05, dtype=np.float64)


In [23]: np.histogram(data32, bins32)
Out[23]:
(array([ 0,  1, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ],
dtype=float32))

In [24]: np.histogram(data32, bins64)
Out[24]: (array([ 1,  0, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1
]))

In [25]: np.histogram(data64, bins32)
Out[25]:
(array([ 0,  1, 11,  0]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ],
dtype=float32))

In [26]: np.histogram(data64, bins64)
Out[26]: (array([ 0,  1, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1
]))
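
To make the mismatch concrete, here is a minimal sketch (not part of the session above; the variable names are only illustrative) that compares how 0.05 rounds in each precision. Neither type can store 0.05 exactly, and the two roundings land on opposite sides of each other, which would explain why a float32 value and a float64 bin edge that both print as 0.05 do not compare equal:

```python
import numpy as np

# 0.05 has no exact binary representation, so each precision rounds it
# to a different nearby value.
x32 = np.float32(0.05)
x64 = np.float64(0.05)

# Widening a float32 to float64 is exact, so this comparison shows the
# two roundings side by side without introducing further error.
widened = np.float64(x32)
pos_above = bool(widened > x64)

# The same holds, mirrored, for -0.05.
neg_below = bool(np.float64(np.float32(-0.05)) < np.float64(-0.05))

print(repr(widened), repr(x64))
print(pos_above, neg_below)
```

If both flags print True, a float32 0.05 sits just above a float64 edge at 0.05 and a float32 -0.05 sits just below a float64 edge at -0.05, which matches the counts in Out[24].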


I guess users should always be very careful when mixing floating point types,
but should numpy prevent the user from doing so (or at least warn) in this case?
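
Independently of what numpy decides to do, one way to sidestep the problem, along the lines of Chris's point 3 quoted below, might be to place the bin edges halfway between the values you expect rather than exactly on them. The following is only a sketch under that assumption; the edge positions and the names `values`, `edges`, `counts32` and `counts64` are mine, not from the original question:

```python
import numpy as np

values = [0.0] * 10 + [0.05, -0.05]
data32 = np.array(values, dtype=np.float32)
data64 = np.array(values, dtype=np.float64)

# Edges near -0.075, -0.025, 0.025 and 0.075: each expected value
# (-0.05, 0.0, 0.05) sits in the middle of a bin, far from any edge.
edges = np.linspace(-0.075, 0.075, 4)

counts32, _ = np.histogram(data32, bins=edges)
counts64, _ = np.histogram(data64, bins=edges)

print(counts32, counts64)
```

With the edges this far from the data, rounding differences of the size seen above can no longer move a value across a boundary, so both dtypes should give the same counts. The trade-off is that the bins no longer start and end at the round numbers used in the original example.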



On Wed, Jul 2, 2014 at 10:07 AM, Mark Szepieniec <mszep...@gmail.com> wrote:

> Hi Catherine,
>
> I can't reproduce your issue with bins_list vs. bins_arange, but passing
> both range and number of bins to np.histogram does give the same strange
> behavior for me:
>
> In [16]: data = np.array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
>  0.  ,  0.  ,
>         0.  ,  0.05, -0.05])
>
> In [17]: bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])
>
> In [18]: np.histogram(data, bins=bins_list)
> Out[18]: (array([ 0,  1, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1
> ]))
>
> In [19]: bins_arange = np.arange(-0.1, 0.101, 0.05)
>
> In [20]: np.histogram(data, bins=bins_arange)
> Out[20]: (array([ 0,  1, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1
> ]))
>
> In [21]: np.histogram(data, range=(-0.1, 0.1), bins=4)
> Out[21]: (array([ 0,  1, 11,  0]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1
> ]))
>
> In [22]: np.version.version
> Out[22]: '1.8.1'
>
> Looks like the 0.05 value of data is being binned differently in the last
> case, but I'm not sure why either...
>
> Mark
>
>
> On Wed, Jul 2, 2014 at 2:05 AM, Chris Barker <chris.bar...@noaa.gov>
> wrote:
>
>> A few thoughts:
>>
>> 1) don't use arange() for floating point numbers, use linspace().
>>
>> 2) histogram1d is a floating point function, and you shouldn't expect
>> exact results for floating point -- in particular, values exactly at the
>> bin boundaries are likely to be "uncertain" -- not quite the right word,
>> but you get the idea.
>>
>> 3) if you expect to have a lot of certain specific values, say, integers, or
>> zeros -- then you don't want your bin boundaries to be exactly at the value
>> -- they should be between the expected values.
>>
>> 4) remember that histogramming is inherently sensitive to bin position
>> anyway -- if these small bin-boundary differences matter, then you may not
>> be using the best approach.
>>
>> -HTH,
>>   -Chris
>>
>>
>>
>>
>>
>>
>>> >>> data
>>> array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
>>>         0.  ,  0.05, -0.05])
>>> >>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1])
>>> >>> (counts, edges) = numpy.histogram(data, bins=bins_list)
>>> >>> counts
>>> array([ 0,  1, 10,  1])
>>> >>> edges
>>> array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ])
>>>
>>>
>>>
>>> but this does not (generating the bin values via numpy.arange):
>>>
>>> >>> bins_arange = numpy.arange(-0.1, 0.101, 0.05)
>>> >>> data
>>> array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
>>>         0.  ,  0.05, -0.05])
>>> >>> bins_arange
>>> array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ])
>>> >>> (counts, edges) = numpy.histogram(data, bins=bins_arange)
>>> >>> counts
>>> array([ 0,  1, 11,  0])
>>>
>>> I'm assuming this is due to slight rounding in the calculation of
>>> bins_arange,
>>> as compared to the manually entered values in bins_list.
>>>
>>> What is the recommended way of getting the first set of results, without
>>> having to manually enter all the values in the "bins" argument?
>>>
>>> The following also gives me unexpected results:
>>>
>>> >>> data
>>> array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
>>>         0.  ,  0.05, -0.05])
>>> >>> (counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4)
>>> >>> counts
>>> array([ 0,  1, 11,  0])
>>>
>>>
>>>
>>> Thank you for any advice,
>>>
>>> Catherine
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>>
>>
>>
>
