[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Homeier, Derek


> On 16 Feb 2024, at 2:48 am, Marten van Kerkwijk wrote:
> 
>> In [45]: %timeit np.add.reduce(a, axis=None)
>> 42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>> 
>> In [43]: %timeit dotsum(a)
>> 26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>> 
>> But theoretically, sum should be faster than the dot product by a fair bit.
>> 
>> Isn’t parallelisation implemented for it?
> 
> I cannot reproduce that:
> 
> In [3]: %timeit np.add.reduce(a, axis=None)
> 19.7 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
> 
> In [4]: %timeit dotsum(a)
> 47.2 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
> 
> But almost certainly it is indeed due to optimizations, since .dot uses
> BLAS which is highly optimized (at least on some platforms, clearly
> better on yours than on mine!).
> 
> I thought .sum() was optimized too, but perhaps less so?


I can confirm that at least it does not seem to use multithreading – with the 
conda-installed numpy+BLAS I almost exactly reproduce your numbers, whereas 
linked against my own OpenBLAS build:

In [3]: %timeit np.add.reduce(a, axis=None)
19 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# OMP_NUM_THREADS=1
In [4]: %timeit dotsum(a)
20.5 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# OMP_NUM_THREADS=8
In [4]: %timeit dotsum(a)
9.84 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

add.reduce shows no difference between the two settings and always remains at 
<= 100 % CPU usage.
dotsum still scales better with larger matrices, e.g. ~4x for 1000x1000.
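
For context, dotsum is not defined in this excerpt; a plausible reconstruction, 
assuming it sums all elements via a BLAS-backed matrix-vector product with a 
vector of ones, would be:

import numpy as np

def dotsum(a):
    # Assumed reconstruction: reduce the columns with a.dot (BLAS gemv,
    # which may multithread), then sum the resulting vector.
    return a.dot(np.ones(a.shape[1])).sum()

a = np.random.random((100, 100))
assert np.allclose(dotsum(a), np.add.reduce(a, axis=None))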

Cheers,
Derek


[Numpy-discussion] Re: Arrays of variable itemsize

2024-03-13 Thread Homeier, Derek
On 13 Mar 2024, at 6:01 PM, Dom Grigonis  wrote:

So my array sizes in this case are 3e8. Thus, 32bit ints would be needed. So it 
is not a solution for this case.

Nevertheless, such a concept would still be worthwhile for cases where integers 
are at most, say, 256 bits (or unlimited), even if memory addresses or offsets 
are 64 bit. This would both:
a) save memory if many of the values in the array are much smaller than 256 bits
b) provide a standard for dynamically unlimited-size values

In principle one could encode individual offsets in a smarter way, using just 
the minimal number of bits required, but again that would make random access 
impossible or very expensive – probably amounting more or less to what smart 
compression algorithms are already doing.
Another approach might be to use the mask approach after all (or just flag all 
your uint8 data with the maximal value 2**8 - 1 as overflows) and store the 
correct (uint64 or whatever) values and their indices in a second array.
This may still not vectorise very efficiently with just numpy if your typical 
operations are non-local.
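
A minimal sketch of that sentinel idea, assuming uint8 base storage and 
hypothetical pack/unpack helpers:

import numpy as np

SENTINEL = 2**8 - 1  # uint8 value reserved to flag an overflow

def pack(values):
    # Keep values below the sentinel in a compact uint8 array; flag the
    # rest and store their full uint64 values and positions separately.
    values = np.asarray(values, dtype=np.uint64)
    small = np.where(values < SENTINEL, values, SENTINEL).astype(np.uint8)
    idx = np.flatnonzero(values >= SENTINEL)
    return small, idx, values[idx]

def unpack(small, idx, big):
    # Expand back to uint64, restoring the flagged entries.
    out = small.astype(np.uint64)
    out[idx] = big
    return out

small, idx, big = pack([1, 7, 300, 2, 1000000])
assert (unpack(small, idx, big) == [1, 7, 300, 2, 1000000]).all()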

Derek



[Numpy-discussion] Re: Getting scipy.interpolate.pchip_interpolate to return the first derivative of a pchip interpolation

2023-01-23 Thread Homeier, Derek
On 22 Jan 2023, at 10:40 am, Samuel Dupree wrote:


I believe I know what is going on, but I don't understand why.

The line for the first derivative that failed to coincide with the points in 
the plot for the cosine is actually the interpolated first derivative scaled by 
the factor pi/180. When I multiply the interpolated values for the first 
derivative by 180/pi, the interpolated first derivative coincides with the 
points for the cosine as expected.

What I don't understand is how the interpolator came up with the scale factor 
it did and applied it using pure numbers.

Any thoughts?

Sam Dupree.


On 1/21/23 18:04, Samuel Dupree wrote:

I'm running SciPy ver. 1.9.3 under Python ver. 3.9.15 on a Mac Pro (2019) 
desktop running macOS ver. 13.1 Ventura. The problem I'm having is getting 
scipy.interpolate.pchip_interpolate to return the first derivative of a pchip 
interpolation.

The test program I'm using is given below (and attached to this note).

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import pchip_interpolate

x_observed  = np.linspace(0.0, 360.0, 51)
y_observed  = np.sin(np.pi*x_observed/180)
dydx_observed = np.cos(np.pi*x_observed/180)

Although you are running your calculations on pure numerical values, you have 
defined the sine as a function of x in degrees. So the cosine really gives you 
the derivative with respect to (x * pi/180):

dy / d(x_observed * np.pi/180) = np.cos(np.pi*x_observed/180)

Besides, as noted in a previous post, the interpolated derivative may not 
reproduce exactly the derivative of the interpolation, but the 180/pi factor 
should come solely from the transformation to/from working in degrees.
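
For illustration, a short sketch of that correction (variable names are mine) – 
pchip_interpolate with der=1 returns dy/dx with x measured in degrees, and the 
chain rule supplies the pi/180 factor:

import numpy as np
from scipy.interpolate import pchip_interpolate

x_observed = np.linspace(0.0, 360.0, 51)
y_observed = np.sin(np.pi * x_observed / 180)

x_new = np.linspace(0.0, 360.0, 201)
# der=1 gives dy/dx with x in degrees, i.e. ~ (pi/180) * cos(pi*x/180)
dydx_deg = pchip_interpolate(x_observed, y_observed, x_new, der=1)
# chain rule: dy/d(x*pi/180) = (180/pi) * dy/dx
dydx_rad = dydx_deg * 180 / np.pi
# dydx_rad now tracks the cosine up to interpolation error
print(np.abs(dydx_rad - np.cos(np.pi * x_new / 180)).max())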

Cheers,
Derek


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Homeier, Derek
On 11 Aug 2023, at 7:52 pm, Robert Kern wrote:

>>> np.cumsum([[1, 2, 3], [4, 5, 6]])
array([ 1,  3,  6, 10, 15, 21])
which matches your example in the cumsum0() documentation. Did something change 
in a recent release?

That's not what's in his example.

The example is creating a cumsum-like array of n+1 elements starting with the 
number 0, not with array[0] – i.e. essentially just inserting a 0 along every 
axis, so that

np.diff(np.cumsum0(a)) == a

Not sure if this would be too complicated to effect with the existing ufuncs 
either…
Almost all of the documentation sounds very repetitive, so maybe implementing 
this via a new kwarg to cumsum would be a better option?
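
A minimal sketch of such a helper, built on the existing np.cumsum and 
np.insert (the name cumsum0 follows the proposal; this is not an existing 
NumPy function):

import numpy as np

def cumsum0(a, axis=None):
    # Cumulative sum with a 0 prepended along the reduction axis,
    # so that np.diff recovers the (possibly flattened) input.
    a = np.asarray(a)
    if axis is None:
        a = a.ravel()
        axis = 0
    return np.insert(np.cumsum(a, axis=axis), 0, 0, axis=axis)

np.diff(cumsum0([[1, 2, 3], [4, 5, 6]]))
# -> array([1, 2, 3, 4, 5, 6])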

Cheers,
Derek