If you add a layer of indirection with Numba you can get a *very* nice API:
@numba.njit
def _first(arr, pred):
for i, elem in enumerate(arr):
if pred(elem):
return i
def first(arr, pred):
_pred = numba.njit(pred)
return _first(arr, _pred)
This even works with lambdas! (TIL, thanks Numba devs!)
>>> first(np.random.random(10_000_000), lambda x: x > 0.99)
215
Since Numba has ufunc support I don't suppose it would be hard to make it work
with an axis= argument, but I've never played with that API myself.
On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote:
> I've implemented such functions in Cython and packaged them into a library
> called numpy_illustrated <https://pypi.org/project/numpy-illustrated/>
>
> It exposes the following functions:
>
> find(a, v) # returns the index of the first occurrence of v in a
> first_above(a, v) # returns the index of the first element in a that is
> strictly above v
> first_nonzero(a) # returns the index of the first nonzero element
>
> They scan the array and bail out immediately once the match is found. Have a
> significant performance gain if the element to be
> found is closer to the beginning of the array. Have roughly the same speed as
> alternative methods if the value is missing.
>
> The complete signatures of the functions look like this:
>
> find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False)
> first_above(a, v, sorted=False, missing=-1, raises=False)
> first_nonzero(a, missing=-1, raises=False)
>
> This covers the most common use cases and does not accept Python callbacks
> because accepting them would nullify any speed gain
> one would expect from such a function. A Python callback can be implemented
> with Numba, but anyone who can write the callback
> in Numba has no need for a library that wraps it into a dedicated function.
>
> The library has a 100% test coverage. Code style 'black'. It should be easy
> to add functions like 'first_below' if necessary.
>
> A more detailed description of these functions can be found here
> <https://betterprogramming.pub/the-numpy-illustrated-library-7531a7c43ffb?sk=8dd60bfafd6d49231ac76cb148a4d16f>.
>
> Best regards,
> Lev Maximov
>
> On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis <[email protected]> wrote:
>> I juggled a bit and found pretty nice solution using numba. Which is
>> probably not very robust, but proves that such thing can be optimised while
>> retaining flexibility. Check if it works for your use cases and let me know
>> if anything fails or if it is slow compared to what you used.
>>
>>
>>
>> first_true_str = """
>> def first_true(arr, n):
>> result = np.full((n, arr.shape[1]), -1, dtype=np.int32)
>> for j in range(arr.shape[1]):
>> k = 0
>> for i in range(arr.shape[0]):
>> x = arr[i:i + 1, j]
>> if cond(x):
>> result[k, j] = i
>> k += 1
>> if k >= n:
>> break
>> return result
>> """
>>
>>
>> *class* *FirstTrue*:
>> CONTEXT = {'np': np}
>>
>> *def* __init__(self, expr):
>> self.expr = expr
>> self.expr_ast = ast.parse(expr, mode='exec').body[0].value
>> self.func_ast = ast.parse(first_true_str, mode='exec')
>> self.func_ast.body[0].body[1].body[1].body[1].test = self.expr_ast
>> self.func_cmp = compile(self.func_ast, filename="<ast>", mode="exec")
>> *exec*(self.func_cmp, self.CONTEXT)
>> self.func_nb = nb.njit(self.CONTEXT[self.func_ast.body[0].name])
>>
>> *def* __call__(self, arr, n=1, axis=None):
>> *# PREPARE INPUTS*
>> in_1d = False
>> *if* axis *is* None:
>> arr = np.ravel(arr)[:, None]
>> in_1d = True
>> *elif* axis == 0:
>> *if* arr.ndim == 1:
>> in_1d = True
>> arr = arr[:, None]
>> *else*:
>> *raise* *ValueError*('axis ~in (None, 0)')
>> res = self.func_nb(arr, n)
>> *if* in_1d:
>> res = res[:, 0]
>> *return* res
>>
>>
>> *if* __name__ == '__main__':
>> arr = np.arange(125).reshape((5, 5, 5))
>> ft = FirstTrue('np.sum(x) > 30')
>> *print*(ft(arr, n=2, axis=0))
>>
>> [[1 0 0 0 0]
>> [2 1 1 1 1]]
>>
>>
>> In [16]: %timeit ft(arr, 2, axis=0)
>> 1.31 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>
>> Regards,
>> DG
>>
>>> On 29 Oct 2023, at 23:18, rosko37 <[email protected]> wrote:
>>>
>>> An example with a 1-D array (where it is easiest to see what I mean) is the
>>> following. I will follow Dom Grigonis's suggestion that the range not be
>>> provided as a separate argument, as it can be just as easily "folded into"
>>> the array by passing a slice. So it becomes just:
>>> idx = first_true(arr, cond)
>>>
>>> As Dom also points out, the "cond" would likely need to be a "function
>>> pointer" (i.e., the name of a function defined elsewhere, turning
>>> first_true into a higher-order function), unless there's some way to pass a
>>> parseable expression for simple cases. A few special cases like the first
>>> zero/nonzero element could be handled with dedicated options (sort of like
>>> matplotlib colors), but for anything beyond that it gets unwieldy fast.
>>>
>>> So let's say we have this:
>>> ******************
>>> def cond(x):
>>> return x>50
>>>
>>> search_arr = np.exp(np.arange(0,1000))
>>>
>>> print(np.first_true(search_arr, cond))
>>> *******************
>>>
>>> This should print 4, because the element of search_arr at index 4 (i.e. the
>>> 5th element) is e^4, which is slightly greater than 50 (while e^3 is less
>>> than 50). It should return this *without testing the 6th through 1000th
>>> elements of the array at all to see whether they exceed 50 or not*. This
>>> example is rather contrived, because simply taking the natural log of 50
>>> and rounding up is far superior, not even *evaluating the array of
>>> exponentials *(which my example clearly still does--and in the use cases
>>> I've had for such a function, I can't predict the array elements like
>>> this--they come from loaded data, the output of a simulation, etc., and are
>>> all already in a numpy array). And in this case, since the values are
>>> strictly increasing, search_sorted() would work as well. But it illustrates
>>> the idea.
>>>
>>>
>>>
>>>
>>> On Thu, Oct 26, 2023 at 5:54 AM Dom Grigonis <[email protected]> wrote:
>>>> Could you please give a concise example? I know you have provided one, but
>>>> it is engrained deep in verbose text and has some typos in it, which makes
>>>> hard to understand exactly what inputs should result in what output.
>>>>
>>>> Regards,
>>>> DG
>>>>
>>>> > On 25 Oct 2023, at 22:59, rosko37 <[email protected]> wrote:
>>>> >
>>>> > I know this question has been asked before, both on this list as well as
>>>> > several threads on Stack Overflow, etc. It's a common issue. I'm NOT
>>>> > asking for how to do this using existing Numpy functions (as that
>>>> > information can be found in any of those sources)--what I'm asking is
>>>> > whether Numpy would accept inclusion of a function that does this, or
>>>> > whether (possibly more likely) such a proposal has already been
>>>> > considered and rejected for some reason.
>>>> >
>>>> > The task is this--there's a large array and you want to find the next
>>>> > element after some index that satisfies some condition. Such elements
>>>> > are common, and the typical number of elements to be searched through is
>>>> > small relative to the size of the array. Therefore, it would greatly
>>>> > improve performance to avoid testing ALL elements against the
>>>> > conditional once one is found that returns True. However, all built-in
>>>> > functions that I know of test the entire array.
>>>> >
>>>> > One can obviously jury-rig some ways, like for instance create a "for"
>>>> > loop over non-overlapping slices of length slice_length and call
>>>> > something like np.where(cond) on each--that outer "for" loop is much
>>>> > faster than a loop over individual elements, and the inner loop at most
>>>> > will go slice_length-1 elements past the first "hit". However, needing
>>>> > to use such a convoluted piece of code for such a simple task seems to
>>>> > go against the Numpy spirit of having one operation being one function
>>>> > of the form func(arr)".
>>>> >
>>>> > A proposed function for this, let's call it "np.first_true(arr,
>>>> > start_idx, [stop_idx])" would be best implemented at the C code level,
>>>> > possibly in the same code file that defines np.where. I'm wondering if
>>>> > I, or someone else, were to write such a function, if the Numpy
>>>> > developers would consider merging it as a standard part of the codebase.
>>>> > It's possible that the idea of such a function is bad because it would
>>>> > violate some existing broadcasting or fancy indexing rules. Clearly one
>>>> > could make it possible to pass an "axis" argument to np.first_true()
>>>> > that would select an axis to search over in the case of
>>>> > multi-dimensional arrays, and then the result would be an array of
>>>> > indices of one fewer dimension than the original array. So
>>>> > np.first_true(np.array([1,5],[2,7],[9,10],cond) would return [1,1,0] for
>>>> > cond(x): x>4. The case where no elements satisfy the condition would
>>>> > need to return a "signal value" like -1. But maybe there are some weird
>>>> > cases where there isn't a sensible return val
>>>> ue, hence why such a function has not been added.
>>>> >
>>>> > -Andrew Rosko
>>>> > _______________________________________________
>>>> > NumPy-Discussion mailing list -- [email protected]
>>>> > To unsubscribe send an email to [email protected]
>>>> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>>>> > Member address: [email protected]
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]
>>>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>>>> Member address: [email protected]
>>> _______________________________________________
>>> NumPy-Discussion mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>>> Member address: [email protected]
>>
>> _______________________________________________
>> NumPy-Discussion mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: [email protected]
> _______________________________________________
> NumPy-Discussion mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: [email protected]
>
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: [email protected]