1) Correction: The PR was not written with small arrays in mind. I ran some new timing tests, and it does perform worse on smaller arrays but appears to scale better than the current implementation.
2) Let me put it out there that I am not opposed to moving it to C, but right now, there seems to be a large technical brick wall up against such an implementation. So suggestions about how to move the code into C would be welcome too! On Sun, May 22, 2016 at 10:32 AM, Ralf Gommers <ralf.gomm...@gmail.com> wrote: > > > On Sun, May 22, 2016 at 3:05 AM, G Young <gfyoun...@gmail.com> wrote: > >> Hi, >> >> I have had a PR <https://github.com/numpy/numpy/pull/7177> open (first >> draft can be found here <https://github.com/numpy/numpy/pull/7138>) for >> quite some time now that adds an 'axis' parameter to *count_nonzero*. >> While the functionality is fully in-place, very robust, and actually >> higher-performing than the original *count_nonzero* function, the >> obstacle at this point is the implementation, as most of the functionality >> is now surfaced at the Python level instead of at the C level. >> >> I have made several attempts to move the code into C to no avail and have >> not received much feedback from maintainers unfortunately to move this >> forward, so I'm opening this up to the mailing list to see what you guys >> think of the changes and whether or not it should be merged in as is or be >> tabled until a more C-friendly solution can be found. >> > > The discussion is spread over several PRs/issues, so maybe a summary is > useful: > > - adding an axis parameter was a feature request that was generally > approved of [1] > - writing the axis selection/validation code in C, like the rest of > count_nonzero, was preferred by several core devs > - Writing that C code turns out to be tricky. Jaime had a PR for doing > this for bincount [2], but closed it with final conclusion "the proper > approach seems to me to build some intermediate layer over nditer that > abstracts the complexity away". > - Julian pointed out that this adds a ufunc-like param, so why not add > other params like out/keepdims [3] > - Stephan points out that the current PR has quite a few branches, would > benefit from reusing a helper function (like _validate_axis, but that may > not do exactly the right thing), and that he doesn't want to merge it as is > without further input from other devs [4]. > > Points previously not raised that I can think of: > - count_nonzero is also in the C API [5], the axis parameter is now only > added to the Python API. > - Part of why the code in this PR is complex is to keep performance for > small arrays OK, but there's no benchmarks added or result given for the > existing benchmark [6]. A simple check with: > x = np.arange(100) > %timeit np.count_nonzero(x) > shows that that gets about 30x slower (330 ns vs 10.5 us on my machine). > > It looks to me like performance is a concern, and if that can be resolved > there's the broader discussion of whether it's a good idea to merge this PR > at all. That's a trade-off of adding a useful feature vs. technical debt / > maintenance burden plus divergence Python/C API. Also, what do we do when > we merge this and then next week someone else sends a PR adding a keepdims > or out keyword? For these kinds of additions it would feel better if we > were sure that the new version is the final/desired one for the foreseeable > future. > > Ralf > > > [1] https://github.com/numpy/numpy/issues/391 > [2] https://github.com/numpy/numpy/pull/4330#issuecomment-77791250 > [3] https://github.com/numpy/numpy/pull/7138#issuecomment-177202894 > [4] https://github.com/numpy/numpy/pull/7177 > [5] > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_CountNonzero > [6] > https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_ufunc.py#L70 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion