[Numpy-discussion] 3-10x speedup in bincount()

Stephen Simmons Tue, 13 Mar 2007 20:52:28 -0800

Well, there were no responses to my earlier email proposing changes to 
numpy.bincount() to make it faster and more flexible. Does this mean 
noone is using bincount? :-)


Anyway I've now made the proposed modifications and got substantial 
speedups of 3-10. Details are in this extract from my C code. For the 
time being I will test this as a standalone extension module. When 
stable and tests are written, I'll submit as a patch to the numpy's 
_compiled_base.c.

Cheers,

Stephen


/************************************************
 * Faster versions of bincount and casting strings to integers
 *
 * Author:  Stephen Simmons, [EMAIL PROTECTED]
 * Date:    11 March 2007
 *
 * This module contains C code for functions I am using to accelerate
 * SQL-like aggregate functions for a column-oriented database based on 
numpy.
 *
 * subtotal's bincount is typically 3-10 times faster than numpy's 
bincount,
 * and more flexible
 *  - takes bins as they are, without promoting their type to int32
 *  - takes weights as they are, without promoting their type to double
 *  - ignores negative bins, rather than reporting an error
 *  - allows optional int32/double output array to be passed in
 *  - specify maximum bin number to use. Larger bins numbers are ignored
 *  - only scans for max bin number if neither max_bin nor out array 
specified
 *
 * atoi is 30-60 times faster than casting a string array to an integer 
type,
 * and may optionally use a translation table. The translation table is
 * a convenient way to map sparse integer strings to adjacent bins before
 * using bincount.
 *
 * Typical usage
 * -------------

# Set up arrays 5,000,000 elts long. s1 has strings filled with '1' and '2'.
# w are weights
 >>> s1 = numpy.ndarray(shape=5000000, dtype='S1'); s1[:]='2'; s1[1::2]='1'
 >>> w = numpy.arange(5000000,dtype='f4')

# Using standard numpy functions, string conversion is slow
 >>> i = s1.astype('i1')                -> 5.95 sec (!)
 >>> numpy.bincount(i)                  -> 113 ms
 >>> numpy.bincount(i, w)               -> 272 ms
 >>> numpy.bincount(s1.astype('i1'), w) -> 6.12 sec

# Using the faster functions here:
 >>> i = subtotal.atoi(s1)              -> 90 ms (60x faster)
 >>> subtotal.bincount(i, max_bin=2)    -> 31 ms (3.6x faster)
 >>> subtotal.bincount(i, w, max_bin=2) -> 51 ms (5.3x faster)

# In both bases, output is
array([2, 1, 2, ..., 1, 2, 1], dtype=int8)
array([      0, 2500000, 2500000])
array([  0.00000000e+00,   6.25000000e+12,   6.24999750e+12])

# As an example of using a translate table, run bincount from atoi applying
# a translate table for counting odd vs even strings. This does the 
whole lot
# in 158ms, a speedup of 38x from 6.12 s above.
 >>> t = numpy.arange(256, dtype='i1') % 2
 >>> subtotal.bincount(subtotal.atoi(s1, t), w, max_bin=t.max())     -> 
158 ms
array([  6.24999750e+12,   6.25000000e+12])

 *
 * These timings are based on numpy-1.0.1 Windows binary distribution
 * for Python 2.5, compared with building this code using standard distutils
 * without any particular compiler optimisations:
C:\mnt\dev\subtotal> python setup.py build -cmingw32
 * Processor is a 1.83GHz Pentium-M
 ****************************************************/

<< rest of C code omitted >>


Stephen Simmons wrote:
> Hi,
>
> I'd like to propose some minor modifications to the function 
> bincount(arr, weights=None), so would like some feedback from other 
> uses of bincount() before I write this up as a proper patch, .
>
> Background:
> bincount() has two forms:
> - bincount(x) returns an integer array ians of length max(x)+1  where 
> ians[n] is the number of times n appears in x.
> - bincount(x, weights) returns a double array dans of length max(x)+1 
> where dans[n] is the sum of elements in the weight vector weights[i] 
> at the positions where x[i]==n
> In both cases, all elements of x must be non-negative.
>
> Proposed changes:
> (1) Remove the restriction that elements of x must be non-negative.
> Currently bincount() starts by finding max(x) and min(x). If the min 
> value is negative, an exception is raised.  This change proposes 
> dropping the initial search for min(x), and instead testing for 
> non-negativity while summing values in the return arrays ians or dans. 
> Any indexes where where x is negative will be silently ignored. This 
> will allow selective bincounts where values to ignore are flagged with 
> a negative bin number.
>
> (2) Allow an optional argument for maximum bin number.
> Currently bincount(x) returns an array whose length is dependent on 
> max(x). It is sometimes preferable to specify the exact size of the 
> returned array, so this change would add a new optional argument, 
> max_bin, which is one less than the size of the returned array. Under 
> this change, bincount() starts by finding max(x) only if max_bin is 
> not specified. Then the returned array ians or dans is created with 
> length max_bin+1, and any indexes that would overflow the output array 
> are silently ignored.
>
> (3) Allow an optional output array, y.
> Currently bincount() creates a new output array each time. Sometimes 
> it is preferable to add results to an existing output array, for 
> example, when the input array is only available in smaller chunks, or 
> for a progressive update strategy to avoid fp precision problems when 
> adding lots of small weights to larger subtotals. Thus we can add an 
> extra optional argument y that bypasses the creation of an output array.
>
> With these three change, the function signature of bincount() would 
> become:
> bincount(x, weights=None, y=None, max_bin=None)
>
>
> Anyway, that's the general idea. I'd be grateful for any feedback 
> before I code this up as a patch to _compiled_base.c.
>
> Cheers
>
> Stephen
>
>

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

[Numpy-discussion] 3-10x speedup in bincount()

Reply via email to