[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

2023-10-31 Thread Lev Maximov
I've implemented such functions in Cython and packaged them into a library
called numpy_illustrated 

It exposes the following functions:

find(a, v)  # returns the index of the first occurrence of v in a
first_above(a, v)   # returns the index of the first element in a that is
strictly above v
first_nonzero(a)   # returns the index of the first nonzero element

They scan the array and bail out immediately once the match is found. Have
a significant performance gain if the element to be
found is closer to the beginning of the array. Have roughly the same speed
as alternative methods if the value is missing.

The complete signatures of the functions look like this:

find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False)
first_above(a, v, sorted=False, missing=-1, raises=False)
first_nonzero(a, missing=-1, raises=False)

This covers the most common use cases and does not accept Python callbacks
because accepting them would nullify any speed gain
one would expect from such a function. A Python callback can be implemented
with Numba, but anyone who can write the callback
in Numba has no need for a library that wraps it into a dedicated function.

The library has a 100% test coverage. Code style 'black'. It should be easy
to add functions like 'first_below' if necessary.

A more detailed description of these functions can be found here

.

Best regards,
  Lev Maximov

On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis  wrote:

> I juggled a bit and found pretty nice solution using numba. Which is
> probably not very robust, but proves that such thing can be optimised while
> retaining flexibility. Check if it works for your use cases and let me know
> if anything fails or if it is slow compared to what you used.
>
> first_true_str = """def first_true(arr, n):result = np.full((n, 
> arr.shape[1]), -1, dtype=np.int32)for j in range(arr.shape[1]):k 
> = 0for i in range(arr.shape[0]):x = arr[i:i + 1, j]   
>  if cond(x):result[k, j] = ik += 1
> if k >= n:breakreturn result"""
>
> class FirstTrue:
> CONTEXT = {'np': np}
>
> def __init__(self, expr):
> self.expr = expr
> self.expr_ast = ast.parse(expr, mode='exec').body[0].value
> self.func_ast = ast.parse(first_true_str, mode='exec')
> self.func_ast.body[0].body[1].body[1].body[1].test = self.expr_ast
> self.func_cmp = compile(self.func_ast, filename="", mode="exec")
> exec(self.func_cmp, self.CONTEXT)
> self.func_nb = nb.njit(self.CONTEXT[self.func_ast.body[0].name])
>
> def __call__(self, arr, n=1, axis=None):
> # PREPARE INPUTS
> in_1d = False
> if axis is None:
> arr = np.ravel(arr)[:, None]
> in_1d = True
> elif axis == 0:
> if arr.ndim == 1:
> in_1d = True
> arr = arr[:, None]
> else:
> raise ValueError('axis ~in (None, 0)')
> res = self.func_nb(arr, n)
> if in_1d:
> res = res[:, 0]
> return res
>
> if __name__ == '__main__':
> arr = np.arange(125).reshape((5, 5, 5))
> ft = FirstTrue('np.sum(x) > 30')
> print(ft(arr, n=2, axis=0))
>
> [[1 0 0 0 0]
>  [2 1 1 1 1]]
>
> In [16]: %timeit ft(arr, 2, axis=0)1.31 µs ± 3.94 ns per loop (mean ± std. 
> dev. of 7 runs, 1,000,000 loops each)
>
> Regards,
> DG
>
> On 29 Oct 2023, at 23:18, rosko37  wrote:
>
> An example with a 1-D array (where it is easiest to see what I mean) is
> the following. I will follow Dom Grigonis's suggestion that the range not
> be provided as a separate argument, as it can be just as easily "folded
> into" the array by passing a slice. So it becomes just:
> idx = first_true(arr, cond)
>
> As Dom also points out, the "cond" would likely need to be a "function
> pointer" (i.e., the name of a function defined elsewhere, turning
> first_true into a higher-order function), unless there's some way to pass a
> parseable expression for simple cases. A few special cases like the first
> zero/nonzero element could be handled with dedicated options (sort of like
> matplotlib colors), but for anything beyond that it gets unwieldy fast.
>
> So let's say we have this:
> **
> def cond(x):
> return x>50
>
> search_arr = np.exp(np.arange(0,1000))
>
> print(np.first_true(search_arr, cond))
> ***
>
> This should print 4, because the element of search_arr at index 4 (i.e.
> the 5th element) is e^4, which is slightly greater than 50 (while e^3 is
> less than 50). It should return this *without testing the 6th through
> 1000th elements of the array at all to see whether they exceed 50 or not*.
> This example is rather contrived, because simply taking the natural log of

[Numpy-discussion] Re: next NumPy Newcomers' Hour - 12pm UTC

2023-10-31 Thread Ganesh Kathiresan
Our next Newcomers' Hour will be held this Thursday, Nov 2nd, at 12pm UTC.
Stop by to ask questions or just to say hi.
To add to the meeting agenda the topics you’d like to discuss, follow the
link: https://hackmd.io/3f3otyyuTte3FU9y3QzsLg?both
Join the meeting via Zoom:
https://us06web.zoom.us/j/82563808729?pwd=ZFU3Z2dMcXBGb05YemRsaGE1OW5nQT09

Thanks,
Ganesh

On Sun, Jun 11, 2023 at 8:43 PM Inessa Pawson  wrote:

> Our next Newcomers' Hour will be held this Thursday, June 15th, at 12pm
> UTC. Stop by to ask questions, share your progress, or just to say hi.
> To add to the meeting agenda the topics you’d like to discuss, follow the
> link: https://hackmd.io/3f3otyyuTte3FU9y3QzsLg?both
> Join the meeting via Zoom:
> https://us06web.zoom.us/j/82563808729?pwd=ZFU3Z2dMcXBGb05YemRsaGE1OW5nQT09
>
> --
> Cheers,
> Inessa
>
> Inessa Pawson
> Contributor Experience Lead | NumPy
> https://numpy.org/
> GitHub: inessapawson
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: ganesh3...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

2023-10-31 Thread Juan Nunez-Iglesias
If you add a layer of indirection with Numba you can get a *very* nice API:

@numba.njit
def _first(arr, pred):
for i, elem in enumerate(arr):
if pred(elem):
return i

def first(arr, pred):
_pred = numba.njit(pred)
return _first(arr, _pred)

This even works with lambdas! (TIL, thanks Numba devs!)

>>> first(np.random.random(10_000_000), lambda x: x > 0.99)
215

Since Numba has ufunc support I don't suppose it would be hard to make it work 
with an axis= argument, but I've never played with that API myself.

On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote:
> I've implemented such functions in Cython and packaged them into a library 
> called numpy_illustrated 
> 
> It exposes the following functions:
> 
> find(a, v)  # returns the index of the first occurrence of v in a
> first_above(a, v)   # returns the index of the first element in a that is 
> strictly above v
> first_nonzero(a)   # returns the index of the first nonzero element
> 
> They scan the array and bail out immediately once the match is found. Have a 
> significant performance gain if the element to be
> found is closer to the beginning of the array. Have roughly the same speed as 
> alternative methods if the value is missing.
> 
> The complete signatures of the functions look like this:
> 
> find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False)
> first_above(a, v, sorted=False, missing=-1, raises=False)
> first_nonzero(a, missing=-1, raises=False)
> 
> This covers the most common use cases and does not accept Python callbacks 
> because accepting them would nullify any speed gain
> one would expect from such a function. A Python callback can be implemented 
> with Numba, but anyone who can write the callback
> in Numba has no need for a library that wraps it into a dedicated function.
> 
> The library has a 100% test coverage. Code style 'black'. It should be easy 
> to add functions like 'first_below' if necessary.
> 
> A more detailed description of these functions can be found here 
> .
> 
> Best regards,
>   Lev Maximov
> 
> On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis  wrote:
>> I juggled a bit and found pretty nice solution using numba. Which is 
>> probably not very robust, but proves that such thing can be optimised while 
>> retaining flexibility. Check if it works for your use cases and let me know 
>> if anything fails or if it is slow compared to what you used.
>> 
>> 
>> 
>> first_true_str = """
>> def first_true(arr, n):
>> result = np.full((n, arr.shape[1]), -1, dtype=np.int32)
>> for j in range(arr.shape[1]):
>> k = 0
>> for i in range(arr.shape[0]):
>> x = arr[i:i + 1, j]
>> if cond(x):
>> result[k, j] = i
>> k += 1
>> if k >= n:
>> break
>> return result
>> """
>> 
>> 
>> *class* *FirstTrue*:
>> CONTEXT = {'np': np}
>> 
>> *def* __init__(self, expr):
>> self.expr = expr
>> self.expr_ast = ast.parse(expr, mode='exec').body[0].value
>> self.func_ast = ast.parse(first_true_str, mode='exec')
>> self.func_ast.body[0].body[1].body[1].body[1].test = self.expr_ast
>> self.func_cmp = compile(self.func_ast, filename="", mode="exec")
>> *exec*(self.func_cmp, self.CONTEXT)
>> self.func_nb = nb.njit(self.CONTEXT[self.func_ast.body[0].name])
>> 
>> *def* __call__(self, arr, n=1, axis=None):
>> *# PREPARE INPUTS*
>> in_1d = False
>> *if* axis *is* None:
>> arr = np.ravel(arr)[:, None]
>> in_1d = True
>> *elif* axis == 0:
>> *if* arr.ndim == 1:
>> in_1d = True
>> arr = arr[:, None]
>> *else*:
>> *raise* *ValueError*('axis ~in (None, 0)')
>> res = self.func_nb(arr, n)
>> *if* in_1d:
>> res = res[:, 0]
>> *return* res
>> 
>> 
>> *if* __name__ == '__main__':
>> arr = np.arange(125).reshape((5, 5, 5))
>> ft = FirstTrue('np.sum(x) > 30')
>> *print*(ft(arr, n=2, axis=0))
>> 
>> [[1 0 0 0 0]
>>  [2 1 1 1 1]]
>> 
>> 
>> In [16]: %timeit ft(arr, 2, axis=0)
>> 1.31 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>> 
>> Regards,
>> DG
>> 
>>> On 29 Oct 2023, at 23:18, rosko37  wrote:
>>> 
>>> An example with a 1-D array (where it is easiest to see what I mean) is the 
>>> following. I will follow Dom Grigonis's suggestion that the range not be 
>>> provided as a separate argument, as it can be just as easily "folded into" 
>>> the array by passing a slice. So it becomes just:
>>> idx = first_true(arr, cond)
>>> 
>>> As Dom also points out, the "cond" would likely need to be a "function 
>>> pointer" (i.e., the name of a function defined elsewhere, turning 
>>> first_true

[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

2023-10-31 Thread Bill Ross
Could a sub-python level spin up extra threads (processes)? Search could
benefit easily. 

I switched back to Java for number crunching because one gets to share
memory without using OS-supplied shared memory. Maybe put a JVM behind
python, or do python in the JVM?

Bill 

--

Phobrain.com 

On 2023-10-31 16:05, Juan Nunez-Iglesias wrote:

> If you add a layer of indirection with Numba you can get a *very* nice API: 
> 
> @numba.njit 
> def _first(arr, pred): 
> for i, elem in enumerate(arr): 
> if pred(elem): 
> return i 
> 
> def first(arr, pred): 
> _pred = numba.njit(pred) 
> return _first(arr, _pred) 
> 
> This even works with lambdas! (TIL, thanks Numba devs!) 
> 
 first(np.random.random(10_000_000), lambda x: x > 0.99) 
> 215 
> 
> Since Numba has ufunc support I don't suppose it would be hard to make it 
> work with an axis= argument, but I've never played with that API myself. 
> 
> On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote: 
> 
> I've implemented such functions in Cython and packaged them into a library 
> called numpy_illustrated [1] 
> 
> It exposes the following functions: 
> 
> find(a, v)  # returns the index of the first occurrence of v in a 
> first_above(a, v)   # returns the index of the first element in a that is 
> strictly above v 
> first_nonzero(a)   # returns the index of the first nonzero element 
> 
> They scan the array and bail out immediately once the match is found. Have a 
> significant performance gain if the element to be 
> found is closer to the beginning of the array. Have roughly the same speed as 
> alternative methods if the value is missing. 
> 
> The complete signatures of the functions look like this: 
> 
> find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False)
> first_above(a, v, sorted=False, missing=-1, raises=False) 
> first_nonzero(a, missing=-1, raises=False) 
> 
> This covers the most common use cases and does not accept Python callbacks 
> because accepting them would nullify any speed gain 
> one would expect from such a function. A Python callback can be implemented 
> with Numba, but anyone who can write the callback 
> in Numba has no need for a library that wraps it into a dedicated function. 
> 
> The library has a 100% test coverage. Code style 'black'. It should be easy 
> to add functions like 'first_below' if necessary. 
> 
> A more detailed description of these functions can be found here [2]. 
> 
> Best regards, 
> Lev Maximov 
> 
> On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis  wrote: 
> 
> I juggled a bit and found pretty nice solution using numba. Which is probably 
> not very robust, but proves that such thing can be optimised while retaining 
> flexibility. Check if it works for your use cases and let me know if anything 
> fails or if it is slow compared to what you used. 
> 
> first_true_str = """
> def first_true(arr, n):
> result = np.full((n, arr.shape[1]), -1, dtype=np.int32)
> for j in range(arr.shape[1]):
> k = 0
> for i in range(arr.shape[0]):
> x = arr[i:i + 1, j]
> if cond(x):
> result[k, j] = i
> k += 1
> if k >= n:
> break
> return result
> """
> 
> class FirstTrue:
> CONTEXT = {'np': np}
> 
> def __init__(self, expr):
> self.expr = expr
> self.expr_ast = ast.parse(expr, mode='exec').body[0].value
> self.func_ast = ast.parse(first_true_str, mode='exec')
> self.func_ast.body[0].body[1].body[1].body[1].test = self.expr_ast
> self.func_cmp = compile(self.func_ast, filename="", mode="exec")
> exec(self.func_cmp, self.CONTEXT)
> self.func_nb = nb.njit(self.CONTEXT[self.func_ast.body[0].name])
> 
> def __call__(self, arr, n=1, axis=None):
> _# PREPARE INPUTS_
> in_1d = False
> if axis is None:
> arr = np.ravel(arr)[:, None]
> in_1d = True
> elif axis == 0:
> if arr.ndim == 1:
> in_1d = True
> arr = arr[:, None]
> else:
> raise ValueError('axis ~in (None, 0)')
> res = self.func_nb(arr, n)
> if in_1d:
> res = res[:, 0]
> return res
> 
> if __name__ == '__main__':
> arr = np.arange(125).reshape((5, 5, 5))
> ft = FirstTrue('np.sum(x) > 30')
> print(ft(arr, n=2, axis=0))
> 
> [[1 0 0 0 0]
> [2 1 1 1 1]]
> 
> In [16]: %timeit ft(arr, 2, axis=0)
> 1.31 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
> 
> Regards, 
> DG 
> 
> On 29 Oct 2023, at 23:18, rosko37  wrote: 
> 
> An example with a 1-D array (where it is easiest to see what I mean) is the 
> following. I will follow Dom Grigonis's suggestion that the range not be 
> provided as a separate argument, as it can be just as easily "folded into" 
> the array by passing a slice. So it becomes just: 
> idx = first_true(arr, cond) 
> 
> As Dom also points out, the "cond" would likely need to be a "function 
> pointer" (i.e., the name of a function defined elsewhere, turning first_true 
> into a higher-order function), unless there's some way to pass a parseable 
> expression for simple cases. A few special cases like the first zero/nonzero 
> element could be handled with dedicated options (sort of like matplotlib 
> colors), but for a

[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

2023-10-31 Thread Dom Grigonis
This results in a very slow code. The function calls of 
_pred = numba.njit(pred)
are expensive and this sort of approach will be comparable to pure python 
functions.

This is only recommended for sourcing functions that are not called frequently, 
but rather have a large computational content within them. In other words not 
suitable for predicates.

Regards,
DG

> On 1 Nov 2023, at 01:05, Juan Nunez-Iglesias  wrote:
> 
> If you add a layer of indirection with Numba you can get a *very* nice API:
> 
> @numba.njit
> def _first(arr, pred):
> for i, elem in enumerate(arr):
> if pred(elem):
> return i
> 
> def first(arr, pred):
> _pred = numba.njit(pred)
> return _first(arr, _pred)
> 
> This even works with lambdas! (TIL, thanks Numba devs!)
> 
> >>> first(np.random.random(10_000_000), lambda x: x > 0.99)
> 215
> 
> Since Numba has ufunc support I don't suppose it would be hard to make it 
> work with an axis= argument, but I've never played with that API myself.
> 
> On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote:
>> I've implemented such functions in Cython and packaged them into a library 
>> called numpy_illustrated 
>> 
>> It exposes the following functions:
>> 
>> find(a, v)  # returns the index of the first occurrence of v in a
>> first_above(a, v)   # returns the index of the first element in a that is 
>> strictly above v
>> first_nonzero(a)   # returns the index of the first nonzero element
>> 
>> They scan the array and bail out immediately once the match is found. Have a 
>> significant performance gain if the element to be
>> found is closer to the beginning of the array. Have roughly the same speed 
>> as alternative methods if the value is missing.
>> 
>> The complete signatures of the functions look like this:
>> 
>> find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False)
>> first_above(a, v, sorted=False, missing=-1, raises=False)
>> first_nonzero(a, missing=-1, raises=False)
>> 
>> This covers the most common use cases and does not accept Python callbacks 
>> because accepting them would nullify any speed gain
>> one would expect from such a function. A Python callback can be implemented 
>> with Numba, but anyone who can write the callback
>> in Numba has no need for a library that wraps it into a dedicated function.
>> 
>> The library has a 100% test coverage. Code style 'black'. It should be easy 
>> to add functions like 'first_below' if necessary.
>> 
>> A more detailed description of these functions can be found here 
>> .
>> 
>> Best regards,
>>   Lev Maximov
>> 
>> On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis > > wrote:
>> I juggled a bit and found pretty nice solution using numba. Which is 
>> probably not very robust, but proves that such thing can be optimised while 
>> retaining flexibility. Check if it works for your use cases and let me know 
>> if anything fails or if it is slow compared to what you used.
>> 
>> 
>> 
>> first_true_str = """
>> def first_true(arr, n):
>> result = np.full((n, arr.shape[1]), -1, dtype=np.int32)
>> for j in range(arr.shape[1]):
>> k = 0
>> for i in range(arr.shape[0]):
>> x = arr[i:i + 1, j]
>> if cond(x):
>> result[k, j] = i
>> k += 1
>> if k >= n:
>> break
>> return result
>> """
>> 
>> 
>> class FirstTrue:
>> CONTEXT = {'np': np}
>> 
>> def __init__(self, expr):
>> self.expr = expr
>> self.expr_ast = ast.parse(expr, mode='exec').body[0].value
>> self.func_ast = ast.parse(first_true_str, mode='exec')
>> self.func_ast.body[0].body[1].body[1].body[1].test = self.expr_ast
>> self.func_cmp = compile(self.func_ast, filename="", mode="exec")
>> exec(self.func_cmp, self.CONTEXT)
>> self.func_nb = nb.njit(self.CONTEXT[self.func_ast.body[0].name])
>> 
>> def __call__(self, arr, n=1, axis=None):
>> # PREPARE INPUTS
>> in_1d = False
>> if axis is None:
>> arr = np.ravel(arr)[:, None]
>> in_1d = True
>> elif axis == 0:
>> if arr.ndim == 1:
>> in_1d = True
>> arr = arr[:, None]
>> else:
>> raise ValueError('axis ~in (None, 0)')
>> res = self.func_nb(arr, n)
>> if in_1d:
>> res = res[:, 0]
>> return res
>> 
>> 
>> if __name__ == '__main__':
>> arr = np.arange(125).reshape((5, 5, 5))
>> ft = FirstTrue('np.sum(x) > 30')
>> print(ft(arr, n=2, axis=0))
>> 
>> [[1 0 0 0 0]
>>  [2 1 1 1 1]]
>> 
>> 
>> In [16]: %timeit ft(arr, 2, axis=0)
>> 1.31 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>> 
>> Regards,
>> DG
>> 
>>> On 29 Oct 2023, at 23:18, rosko37