Hi,

I'd like to propose some minor modifications to the function bincount(arr, weights=None), and would like some feedback from other users of bincount() before I write this up as a proper patch.
Background:

bincount() has two forms:

- bincount(x) returns an integer array ians of length max(x)+1, where ians[n] is the number of times n appears in x.
- bincount(x, weights) returns a double array dans of length max(x)+1, where dans[n] is the sum of the elements weights[i] of the weight vector over the positions i where x[i]==n.

In both cases, all elements of x must be non-negative.

Proposed changes:

(1) Remove the restriction that elements of x must be non-negative.

Currently bincount() starts by finding max(x) and min(x). If the min value is negative, an exception is raised. This change proposes dropping the initial search for min(x), and instead testing for non-negativity while summing values into the return array ians or dans. Any indexes where x is negative will be silently ignored. This will allow selective bincounts, where values to ignore are flagged with a negative bin number.

(2) Allow an optional argument for maximum bin number.

Currently bincount(x) returns an array whose length depends on max(x). It is sometimes preferable to specify the exact size of the returned array, so this change would add a new optional argument, max_bin, which is one less than the size of the returned array. Under this change, bincount() starts by finding max(x) only if max_bin is not specified. Then the returned array ians or dans is created with length max_bin+1, and any indexes that would overflow the output array are silently ignored.

(3) Allow an optional output array, y.

Currently bincount() creates a new output array each time. Sometimes it is preferable to add results to an existing output array, for example when the input array is only available in smaller chunks, or for a progressive update strategy that avoids floating-point precision problems when adding lots of small weights to larger subtotals. Thus we can add an extra optional argument y that bypasses the creation of an output array.
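To pin down the intended semantics of the three changes, here is a rough pure-Python model. It is only a sketch and not the proposed C code: the name bincount_sketch is hypothetical, plain lists stand in for arrays so that it runs without NumPy, and the handling of y together with max_bin (defaulting max_bin to len(y)-1, and raising if max_bin does not fit in y) is an assumption that the proposal above does not spell out.

```python
def bincount_sketch(x, weights=None, y=None, max_bin=None):
    """Pure-Python model of the proposed behaviour (hypothetical helper)."""
    if weights is not None and len(weights) != len(x):
        raise ValueError("weights must have the same length as x")

    if y is not None:
        # Change (3): accumulate into the caller's array instead of a new one.
        # Assumption: without max_bin, the bin range is taken from len(y).
        if max_bin is None:
            max_bin = len(y) - 1
        elif max_bin >= len(y):
            raise ValueError("max_bin does not fit in y")
        out = y
    else:
        # Change (2): only search for max(x) when max_bin is not given.
        if max_bin is None:
            max_bin = max(x, default=-1)
        zero = 0 if weights is None else 0.0
        out = [zero] * (max(max_bin, -1) + 1)

    for i, n in enumerate(x):
        # Change (1): negative bins are skipped silently.
        # Change (2): bins beyond max_bin are skipped silently.
        if n < 0 or n > max_bin:
            continue
        out[n] += 1 if weights is None else weights[i]
    return out


# Selective count: -1 flags a value to ignore, 7 overflows max_bin=3.
counts = bincount_sketch([0, 1, 1, -1, 3, 7], max_bin=3)

# Chunked accumulation of weights into one preallocated output.
totals = [0.0, 0.0, 0.0]
for xs, ws in [([0, 2], [0.5, 1.5]), ([2, -1, 1], [1.0, 9.0, 2.0])]:
    bincount_sketch(xs, ws, y=totals)
```

Under these assumptions counts ends up as [1, 2, 0, 1] and totals as [0.5, 2.0, 2.5]: the entry flagged with -1 and the entry past max_bin contribute nothing, and the second chunk adds onto the subtotals left by the first.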
With these three changes, the function signature of bincount() would become:

    bincount(x, weights=None, y=None, max_bin=None)

Anyway, that's the general idea. I'd be grateful for any feedback before I code this up as a patch to _compiled_base.c.

Cheers
Stephen

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion