Would it make sense at all to bring that optimization to np.sum()? I know that I have np.sum() all over the place instead of count_nonzero, partly because it is a MATLAB-ism and partly because it is easier to write. I had no clue that there was a performance difference.
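Before swapping one call for the other, it helps to be clear about when they are interchangeable. On boolean arrays (or integer arrays that only hold 0 and 1), np.sum and np.count_nonzero return the same count, so the choice is purely about speed. On general integer data they answer different questions: sum adds the values, count_nonzero counts the non-zero entries. The sketch below checks that and times the four combinations with the standard-library timeit module instead of the IPython %timeit magic. The array shape mirrors the quoted message; the absolute numbers will depend on your machine and NumPy version.

```python
import timeit

import numpy as np

np.random.seed(0)
a = np.random.randint(0, 2, (100, 5))  # integer array holding only 0s and 1s
a_bool = a.astype(bool)

# On 0/1 (or boolean) data the two functions return the same count,
# so choosing between them is purely a performance decision.
assert np.sum(a) == np.count_nonzero(a) == np.count_nonzero(a_bool) == np.sum(a_bool)

# On general integers they answer different questions:
# sum adds the values, count_nonzero counts the non-zero entries.
b = np.array([0, 2, 3, 0])
print(np.sum(b), np.count_nonzero(b))  # prints: 5 2

# Rough per-call timings (best of 3 repeats, averaged over `number` calls).
cases = {
    "np.sum(a)": lambda: np.sum(a),
    "np.sum(a_bool)": lambda: np.sum(a_bool),
    "np.count_nonzero(a)": lambda: np.count_nonzero(a),
    "np.count_nonzero(a_bool)": lambda: np.count_nonzero(a_bool),
}
number = 10000
for label, func in cases.items():
    best = min(timeit.repeat(func, number=number, repeat=3)) / number
    print(f"{label:26s} {best * 1e9:8.1f} ns per call")
```

If the data is already boolean, replacing np.sum(mask) with np.count_nonzero(mask) should not change any results, only (possibly) the run time.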
Cheers!
Ben Root

On Thu, Dec 17, 2015 at 1:37 PM, CJ Carey <perimosocord...@gmail.com> wrote:

> I believe this line is the reason:
>
> https://github.com/numpy/numpy/blob/c0e48cfbbdef9cca954b0c4edd0052e1ec8a30aa/numpy/core/src/multiarray/item_selection.c#L2110
>
> On Thu, Dec 17, 2015 at 11:52 AM, Raghav R V <rag...@gmail.com> wrote:
>
>> I was just playing with `count_nonzero` and found it to be significantly
>> faster for boolean arrays compared to integer arrays
>>
>> >>> a = np.random.randint(0, 2, (100, 5))
>> >>> a_bool = a.astype(bool)
>>
>> >>> %timeit np.sum(a)
>> 100000 loops, best of 3: 5.64 µs per loop
>>
>> >>> %timeit np.count_nonzero(a)
>> 1000000 loops, best of 3: 1.42 us per loop
>>
>> >>> %timeit np.count_nonzero(a_bool)
>> 1000000 loops, best of 3: 279 ns per loop (but why?)
>>
>> I tried looking into the code and dug my way through to this line
>> <https://github.com/numpy/numpy/blob/c0e48cfbbdef9cca954b0c4edd0052e1ec8a30aa/numpy/core/src/multiarray/item_selection.c#L2172>.
>> I am unable to dig further.
>>
>> I know this is probably a trivial question, but was wondering if anyone
>> could provide insight on why this is so?
>>
>> Thanks
>>
>> R
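On the "(but why?)" question, one plausible explanation (not verified against the exact C code at the linked lines) is that a bool array stores one byte per element, each byte being 0 or 1. Counting True values then reduces to summing bytes, which can be done several bytes at a time with no per-element comparison or type conversion. An int64 array, by contrast, needs 8 bytes loaded per element, and count_nonzero has to test each one against zero. The sketch below illustrates that word-at-a-time idea in NumPy. `count_true_bytes` is a hypothetical helper name, not a NumPy function, and it assumes the bool bytes are exactly 0 or 1 (which is what astype(bool) produces).

```python
import numpy as np


def count_true_bytes(a_bool):
    """Count True entries by summing 8 bool bytes per 64-bit word (illustrative sketch)."""
    # Make a contiguous 1-D copy/view so the buffer can be reinterpreted byte-wise.
    flat = np.ascontiguousarray(a_bool).ravel()
    raw = flat.view(np.uint8)  # one byte per element, each 0 or 1
    n_words = raw.size // 8
    total = 0
    if n_words:
        # Reinterpret the bulk of the buffer as 64-bit words (8 bools per word).
        words = raw[: n_words * 8].view(np.uint64)
        # Multiplying by 0x0101...01 accumulates all 8 byte values into the top byte.
        # Each byte is <= 1, so partial sums never exceed 8 and no carries spill over.
        # Array multiplication wraps silently on overflow, which is what we want here.
        per_word = (words * np.uint64(0x0101010101010101)) >> np.uint64(56)
        total += int(per_word.sum())
    # Handle the leftover tail bytes (fewer than 8) one at a time.
    total += int(raw[n_words * 8 :].sum())
    return total


a_bool = np.random.randint(0, 2, (100, 5)).astype(bool)
print(count_true_bytes(a_bool) == np.count_nonzero(a_bool))  # prints: True
```

The same trick does not carry over directly to np.sum on integer arrays, since there each element can hold an arbitrary value that really has to be added.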
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion