[Numpy-discussion] Re: Suggestions for changes to the polynomial module
Hi Steve,

The representation of the polynomials without including the domain is indeed confusing. In https://github.com/numpy/numpy/pull/21760 the representation is changed to avoid this. Would this representation work for you, or are there better representations?

With kind regards,
Pieter Eendebak

On Mon, 28 Aug 2023, 14:59, wrote:

> Hello,
>
> Since this is my first post to this group, I want to start by expressing my appreciation for the work that everyone has done to develop and maintain NumPy. Your work enabled me to adopt a fully open-source workflow in my research, after being a MATLAB user since v4. The quality of the NumPy/SciPy/Matplotlib ecosystem also helped me convince my department to standardize our courses around these libraries. Thank you!
>
> Now for my gripes ;)
>
> I've encountered two issues with the numpy.polynomial module that I'd like to synthesize here, since they are scattered among multiple GitHub issues that span over five years. I'm interested in this module because I've prepared some educational resources for using Python to do data analysis in the physical sciences (https://github.com/jsdodge/data-analysis-python), and these include how to use numpy.polyfit for fitting polynomial models to data (specifically, in https://github.com/jsdodge/data-analysis-python/blob/main/notebooks/5.1-Linear-fits.ipynb). I had hoped to adopt either numpy.polynomial.polynomial.polyfit or numpy.polynomial.polynomial.Polynomial.fit by now, but currently neither of these functions serves as a suitable substitute. My concerns are more urgent now that numpy.polyfit has been deprecated.
>
> 1. The numpy.polyfit function returns the covariance matrix for the fit parameters, which I think is an essential feature for a function like this (see https://github.com/numpy/numpy/pull/11197#issuecomment-426728143). Currently, neither numpy.polynomial.polynomial.polyfit nor numpy.polynomial.polynomial.Polynomial.fit provides this option.
>
> 2. Both the numpy.polyfit function and numpy.polynomial.polynomial.polyfit return fit parameters that correspond to the standard polynomial representation (i.e., a_0 x^p + a_1 x^{p-1} + ... + a_p for numpy.polyfit and a_0 + a_1 x + ... + a_p x^p for numpy.polynomial.polynomial.polyfit). The recommended method, Polynomial.fit, returns coefficients for a rescaled domain, and expects the user to distinguish between the standard representation and the rescaled form. Among other things, this decision leads to problems like the discrepancy between [7] and [8] below:
>
> In [3]: x = np.linspace(0, 100)
> In [4]: y = 2 * x + 17
> In [5]: p = np.polynomial.Polynomial.fit(x, y, 1)
> In [6]: p
> Out[6]: Polynomial([117., 100.], domain=[  0., 100.], window=[-1.,  1.], symbol='x')
> In [7]: print(p)
> 117.0 + 100.0·x
> In [8]: print(p.convert())
> 17.0 + 2.0·x
>
> The output of [7] is bad and the output of [8] is good.
>
> In my view, the ideal solution to (1) and (2) would be the following:
>
> a. Revise both numpy.polynomial.polynomial.polyfit and numpy.polynomial.polynomial.Polynomial.fit to compute the covariance matrix, either by default or as part of the output when full==True. The covariance matrix should be in the basis of the standard polynomial coefficients, not the rescaled ones.
>
> b. Revise numpy.polynomial.polynomial.Polynomial.fit to yield coefficients for the unscaled domain (as in [8] above) by default, while retaining the scaled coefficients internally for evaluating the fitted polynomial, transforming the fitted polynomial to another polynomial basis, etc. The covariance matrix should also be given in terms of the standard polynomial representation, not the representation for the rescaled domain.
>
> Thanks for your attention,
>
> Steve
>
> As an example, here is some code that I would like to refactor using the numpy.polynomial package.
>
> # Fit the data using known parameter uncertainties
> p, V = np.polyfit(current, voltage, 1, w=1 / alpha_voltage, cov="unscaled")
>
> # Print results
> print(f"Intercept estimate: ({1000*p[1]:.0f} ± {1000*np.sqrt(V[1][1]):.0f}) µV")
> print(f"Slope estimate: ({1000*p[0]:.2f} ± {1000*np.sqrt(V[0][0]):.2f}) Ω")
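
For reference, a minimal sketch of what this refactoring currently requires when the covariance matrix is needed: with numpy.polynomial, the standard-basis coefficients and the unscaled covariance have to be assembled by hand from the weighted Vandermonde matrix. The data arrays below are hypothetical stand-ins for current, voltage and alpha_voltage in the example above.

import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical stand-ins for the measurement arrays in the example above.
rng = np.random.default_rng(0)
current = np.linspace(0.0, 1.0, 20)
alpha_voltage = np.full_like(current, 0.01)
voltage = 0.017 + 2.0 * current + alpha_voltage * rng.standard_normal(current.size)

# Weighted design matrix in the standard basis 1, x (lowest degree first),
# matching w = 1 / alpha_voltage in the np.polyfit call.
A = P.polyvander(current, 1) / alpha_voltage[:, None]
b = voltage / alpha_voltage
coef, *_ = np.linalg.lstsq(A, b, rcond=None)   # coef = [intercept, slope]
cov = np.linalg.inv(A.T @ A)                   # analogue of cov="unscaled"

print(f"Intercept estimate: {coef[0]:.4f} ± {np.sqrt(cov[0, 0]):.4f}")
print(f"Slope estimate:     {coef[1]:.4f} ± {np.sqrt(cov[1, 1]):.4f}")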
[Numpy-discussion] overhead of numpy scalars
Hi everyone,

Operations such as np.sqrt or np.cos on numpy scalars are much slower than operations on an array of size 1 (benchmark below). The reason seems to be that a numpy scalar (e.g. np.float64(1.1)) is converted internally to an array, an array is created for the result of the operation, and the result is converted back to a scalar (so three object constructions in total). Operating directly on an array creates an array for the result, which is returned (so one object construction in total).

Has anyone looked into improving this performance before, or does anyone have ideas on how to tackle this?

With kind regards,
Pieter Eendebak

Example benchmark:

import numpy as np
from numpy import sqrt
import time

v=np.array([1.1])
t0=time.perf_counter()
x=np.float64(v)
for kk in range(1_000_000):
    w1=sqrt(x)
dt1=time.perf_counter()-t0
print(dt1)

t0=time.perf_counter()
x=v
for kk in range(1_000_000):
    w2=sqrt(x)
dt2=time.perf_counter()-t0
print(dt2)

print(f'factor {dt1/dt2:.2f} faster')
assert(float(w1)==float(w2))

Results in

1.0878291579974757
0.511536967245
factor 2.13 faster
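
For comparison, a timeit-based variant of the same measurement that also times math.sqrt on the scalar, which avoids the array round-trip entirely (a sketch of a possible workaround for purely scalar inner loops, not a replacement for array code; the timings will vary by machine):

import math
import timeit
import numpy as np

x = np.float64(1.1)   # numpy scalar
v = np.array([1.1])   # array of size 1

# Time the ufunc on a numpy scalar, on a size-1 array, and math.sqrt on the scalar.
print(timeit.timeit(lambda: np.sqrt(x), number=100_000))
print(timeit.timeit(lambda: np.sqrt(v), number=100_000))
print(timeit.timeit(lambda: math.sqrt(x), number=100_000))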
[Numpy-discussion] normalization of Polynomial
Hi,

The new Polynomial objects include a domain and a window. How can I normalize such an object to a standard domain and window? There is the .convert method, but it returns a strange format. A minimal example:

import numpy as np

p = np.polynomial.Polynomial([1, 1])
x = np.linspace(0, 3, 15)
y = p(x)
y[1] += .1
q = np.polynomial.Polynomial.fit(x, y, 1)
print(f'p: {repr(p)}')
print(f'fitted q: {repr(q)}')
print(f'fitted q: {str(q)}')
qc = q.convert()
print(f'q.convert(): str {qc}')
print(f'q.convert(): {repr(qc)}')

The output of repr(qc) is:

Polynomial([Polynomial([1.0217, 0.99 ], domain=[-1., 1.], window=[-1., 1.])], dtype=object, domain=[-1., 1.], window=[-1., 1.])

but I am looking for:

Polynomial([1.0217, 0.99 ], domain=[-1., 1.], window=[-1., 1.])

With kind regards,
Pieter Eendebak
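
For reference, two ways to obtain coefficients in the plain 1, x, x**2, ... basis that sidestep .convert() (a sketch reusing the example data above; the closed-form algebra in the second route is written out for degree 1 only):

import numpy as np
from numpy.polynomial import Polynomial, polyutils

x = np.linspace(0, 3, 15)
y = 1 + x
y[1] += .1

# Route 1: the functional interface fits in the plain basis directly,
# with no domain/window rescaling.
coef = np.polynomial.polynomial.polyfit(x, y, 1)
print(coef)   # close to the [1.0217, 0.99] reported above

# Route 2: undo the rescaling of a fitted Polynomial by hand. q stores
# coefficients in t = off + scl*x, so for a degree-1 fit
# q(x) = c0 + c1*t = (c0 + c1*off) + (c1*scl)*x.
q = Polynomial.fit(x, y, 1)
off, scl = polyutils.mapparms(q.domain, q.window)
c0, c1 = q.coef
print([c0 + c1*off, c1*scl])   # same standard-basis coefficients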
[Numpy-discussion] Re: overhead of numpy scalars
Hi,

A small update on the scalar and small array performance: with a series of recent PRs performance has improved.

Before:

1.0878291579974757 (numpy scalar)
0.511536967245 (numpy array of size 1)
factor 2.13 faster

Current:

0.6275881889996526 (numpy scalar)
0.436217111433 (numpy array of size 1)
factor 1.44 faster

(tested with np.sqrt, but similar for other ufuncs)

From the profiling I performed I conclude there is still (a little) room for improvement in both the scalar and small array case. To get the performance of scalars really on par with small arrays one of the suggestions from Sebastian (free lists or scalar ufuncs) might be required, but both involve some larger effort.

With kind regards,
Pieter Eendebak

On Wed, May 4, 2022 at 3:49 PM Sebastian Berg wrote:

> On Tue, 2022-05-03 at 15:06 +0200, Pieter Eendebak wrote:
> > Hi everyone,
> >
> > Operations such as np.sqrt or np.cos on numpy scalars are much slower than operations on an array of size 1 (benchmark below). The reason seems to be that a numpy scalar (e.g. np.float64(1.1)) is converted internally to an array, an array is created for the result of the operation and the result is converted back to a scalar (so three object constructions in total). Operating directly on an array creates an array for the result, which is returned (so one object construction in total).
> >
> > Has anyone looked into improving this performance before or has ideas how to tackle this?
>
> Improvements certainly here and there, but very little in terms of a larger effort. But not a coordinated effort to try to get the scalar performance closer to the array one here.
>
> It is a bit tricky, since at least I have little appetite to duplicate a big chunk of the ufunc machinery.
>
> I could think of a few things:
>
> 1. We need to special case things so the `__array_ufunc__` overhead
>    goes away. That needs a good idea... or just a fast check for
>    NumPy scalars.
>
> 2. If we stick with arrays internally, improve the scalar -> array
>    conversion. Things that could help here:
>    * A free-list for 0-D arrays (with small itemsize)?
>    * Some fast paths to skip dtype/shape discovery here?
>
> Unless we want to look into writing a "scalar ufunc" machinery (we almost have a start for that in the scalar math, but ufuncs are much more flexible than simple math).
>
> It would be nice if we can chip away at improving things, but right now I do not have a good plan of how to level the performance gap. A super-fast scalar check (we have one that might just work), may help with both of the points above I guess.
>
> Cheers,
>
> Sebastian
>
> [1] To be fair, I may have removed some fast-paths there, but I don't recall slowing things down, so I either sped things up elsewhere or they were not thorough enough anyway probably.
>
> > With kind regards,
> > Pieter Eendebak
> >
> > Example benchmark:
> >
> > import numpy as np
> > from numpy import sqrt
> > import time
> >
> > v=np.array([1.1])
> > t0=time.perf_counter()
> > x=np.float64(v)
> > for kk in range(1_000_000):
> >     w1=sqrt(x)
> > dt1=time.perf_counter()-t0
> > print(dt1)
> >
> > t0=time.perf_counter()
> > x=v
> > for kk in range(1_000_000):
> >     w2=sqrt(x)
> > dt2=time.perf_counter()-t0
> > print(dt2)
> >
> > print(f'factor {dt1/dt2:.2f} faster')
> > assert(float(w1)==float(w2))
> >
> > Results in
> >
> > 1.0878291579974757
> > 0.511536967245
> > factor 2.13 faster
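
As a side note on point 1 in Sebastian's list, a minimal illustration of the __array_ufunc__ hook whose presence has to be checked on every ufunc call, including calls on plain numpy scalars (the class here is made up for the illustration):

import numpy as np

class Intercepted:
    # Any object defining __array_ufunc__ takes over ufunc dispatch,
    # which is why the check has to happen on every call.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return f"{ufunc.__name__} intercepted"

print(np.sqrt(Intercepted()))   # -> sqrt intercepted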
[Numpy-discussion] seed for new random number generator
Hi everyone,

The new numpy random interface (e.g. r = numpy.random.default_rng(); r.random) is much faster than the old one (e.g. np.random.random). When converting code from the old style to the new style I miss having a way to set the seed of the RNG.

I tried:

rng.bit_generator = np.random.PCG64(seed=42)  # fails, with a good error message
rng.bit_generator.state['state']['state'] = 42  # has no effect; perhaps make this dict read-only?

Is there a way to set the seed without creating a new RNG object?

With kind regards,
Pieter Eendebak
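
For context, the part that the new interface does support is seeding at construction time, which is the pattern being compared against here (a sketch; the seed value 42 is arbitrary):

import numpy as np

rng = np.random.default_rng(42)   # seeded at construction
print(rng.random(3))

rng = np.random.default_rng(42)   # a fresh Generator with the same seed
print(rng.random(3))              # reproduces the same values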
[Numpy-discussion] Re: seed for new random number generator
Hi Robert,

Thanks for the information and the offer. My use case is the development of algorithms, where I want to be able to test variations of the algorithm while keeping the same (simulated or generated) data. Related to that are example scripts in the documentation that I want to always produce the same result.

Creating a generator and passing it around (one way or the other) is a good approach, but it requires some refactoring of the code. Especially for third-party code I would like to avoid this (to be honest, most external code is still using the old numpy random number interface, so there we can use the old style of setting the seed).

With kind regards,
Pieter Eendebak

On Sun, Jun 19, 2022 at 5:36 PM Robert Kern wrote:

> On Sun, Jun 19, 2022 at 9:37 AM Pieter Eendebak wrote:
>
> > Hi everyone,
> >
> > The new numpy random interface (e.g. r = numpy.random.default_rng(); r.random) is much faster than the old one (e.g. np.random.random). When converting code from the old style to the new style I miss having a way to set the seed of the RNG.
> >
> > I tried:
> >
> > rng.bit_generator = np.random.PCG64(seed=42)  # fails, with a good error message
> > rng.bit_generator.state['state']['state'] = 42  # has no effect; perhaps make this dict read-only?
> >
> > Is there a way to set the seed without creating a new RNG object?
>
> We generally recommend just creating a new Generator and passing that around in almost all cases. Whenever that can possibly be made to work, please do that. The use of np.random.seed() is usually a code smell indicating that someone was working around the fact that there was just the one global underneath np.random.random() et al. When you don't have the constraint of a single global instance, it's almost always going to be better to use multiple instances; you don't need that workaround anymore.
>
> There are ways to punch in a new BitGenerator into an existing Generator, if you must, and also ways to punch in a state dict into the BitGenerator, but these are largely for various meta-programming tasks like serialization rather than day-to-day use of pseudorandom numbers. If you are converting old code that happened to use np.random.seed(), it is almost certainly not one of these tasks.
>
> If you want to describe your use case more specifically, I can give you some more guidance on good patterns to replace it with the new system.
>
> --
> Robert Kern
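
To make the "punch in a state dict" route concrete, a minimal sketch (meant for the meta-programming style of task described above rather than day-to-day use; it assumes the Generator wraps a PCG64 bit generator, which is what default_rng currently creates):

import numpy as np

rng = np.random.default_rng()

# Replace the bit generator's state wholesale; unlike mutating the nested
# dict in place, assigning to the .state property takes effect.
rng.bit_generator.state = np.random.PCG64(42).state
print(rng.random(3))

# Same stream as a freshly seeded Generator.
print(np.random.default_rng(42).random(3))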
[Numpy-discussion] backwards incompatible change to numpy.busday_count
Hi,

The numpy.busday_count function contains a bug where the number of days returned for a call np.busday_count(begin_date, end_date) can be incorrect if end_date is before begin_date in time. The problem has been around since 2011. There are two possible ways to solve this:

1. Fix the bug. This is proposed in https://github.com/numpy/numpy/pull/23229
2. Update the documentation and accept the behaviour.

The issue has been discussed in the triage meeting and the proposal is to fix the bug (https://github.com/numpy/numpy/issues/23197#issuecomment-1460610260). Since this modifies the behaviour of a method that has been around for a long time, we wanted to check whether there are any objections or concerns:

- Are there objections to fixing the bug and thereby modifying the behaviour?
- Would there be objections to deprecating the busday functionality and moving it to a separate package (e.g. numpy financials)?

With kind regards,
Pieter Eendebak
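
To make the case under discussion concrete, a small sketch (the dates are arbitrary; the value of the reversed call is exactly what the linked PR would change, so it is not spelled out here):

import numpy as np

# Forward interval: business days in [2023-03-01, 2023-03-06), i.e. Wed-Fri.
print(np.busday_count("2023-03-01", "2023-03-06"))   # 3

# Reversed interval: the case whose result the proposed fix changes.
print(np.busday_count("2023-03-06", "2023-03-01"))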
[Numpy-discussion] Automatic binning for np.histogram
Hi everyone,

For certain input arrays with small variation, np.histogram with automatic binning can select an enormous number of bins and fail with an out-of-memory error. A minimal example:

import numpy as np

e = 1 + 1e-12
Z = [0, 1, 1, 1, 1, 1, e, e, e, e, e, e, 2]
np.histogram(Z, bins="auto")

There is a proposal to change the automatic bin selection to avoid this: https://github.com/numpy/numpy/pull/28426. The aim is to keep close to the original algorithm, but avoid the out-of-memory issues for input with small variance. It passes the unit tests, but since this is a user-visible change we would like some more input. In particular:

* What are the expectations of the auto binning algorithm?
* What is a reasonable maximum number of bins for a sample of size n?

With kind regards,
Pieter Eendebak
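
For anyone hitting this today, a small sketch of the usual workarounds rather than the proposed fix itself: pick an estimator that depends only on the sample size, or pass an explicit bin count.

import numpy as np

e = 1 + 1e-12
Z = [0, 1, 1, 1, 1, 1, e, e, e, e, e, e, 2]

# "sturges" uses only the number of samples, so a tiny inter-quartile
# range cannot blow up the bin count the way "auto"/"fd" can.
counts, edges = np.histogram(Z, bins="sturges")

# Or simply cap the number of bins explicitly.
counts, edges = np.histogram(Z, bins=10)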