[Numpy-discussion] Suggestions for changes to the polynomial module

2023-08-28 Thread jsdodge
Hello,

Since this is my first post to this group, I want to start by expressing my 
appreciation for the work that everyone has done to develop and maintain NumPy. 
Your work enabled me to adopt a fully open-source workflow in my research, 
after being a MATLAB user since v4. The quality of the NumPy/SciPy/Matplotlib 
ecosystem also helped me convince my department to standardize our courses 
around these libraries. Thank you!

Now for my gripes ;)

I've encountered two issues with the numpy.polynomial module that I'd like to 
synthesize here, since they are scattered among multiple GitHub issues that 
span over five years. I'm interested in this module because I've prepared some 
educational resources for using Python to do data analysis in the physical 
sciences (https://github.com/jsdodge/data-analysis-python), and these include 
how to use numpy.polyfit for fitting polynomial models to data (specifically, 
in 
https://github.com/jsdodge/data-analysis-python/blob/main/notebooks/5.1-Linear-fits.ipynb).
 I had hoped to adopt either numpy.polynomial.polynomial.polyfit or 
numpy.polynomial.polynomial.Polynomial.fit by now, but currently neither of 
these functions serves as a suitable substitute. My concerns are more urgent 
now that numpy.polyfit has been deprecated.

1. The numpy.polyfit function returns the covariance matrix for the fit 
parameters, which I think is an essential feature for a function like this (see 
https://github.com/numpy/numpy/pull/11197#issuecomment-426728143). Currently, 
neither numpy.polynomial.polynomial.polyfit nor 
numpy.polynomial.polynomial.Polynomial.fit provide this option.

2. Both numpy.polyfit function and numpy.polynomial.polynomial.polyfit return 
fit parameters that correspond to the standard polynomial representation (ie, 
a_0 x^p + a_1 x^{p-1} + ... a_p for numpy.polyfit and  x^p a_0 + a_1 x + ... 
a_p x^p for numpy.polynomial.polynomial.polyfit). The recommended method, 
Polynomial.fit, returns coefficients for a rescaled domain, and expects the 
user to distinguish between the standard representation and the rescaled form. 
Among other things, this decision leads to problems like the discrepancy 
between [7] and [8] below:

In [3]: x = np.linspace(0, 100)
In [4]: y = 2 * x + 17
In [5]: p = np.polynomial.Polynomial.fit(x, y, 1)
In [6]: p
Out[6]: Polynomial([117., 100.], domain=[  0., 100.], window=[-1.,  1.], 
symbol='x')
In [7]: print(p)
117.0 + 100.0·x
In [8]: print(p.convert())
17.0 + 2.0·x

The output of [7] is bad and the output of [8] is good.

In my view, the ideal solution to (1) and (2) would be the following:

a. Revise both numpy.polynomial.polynomial.polyfit and 
numpy.polynomial.polynomial.Polynomial.fit to compute the covariance matrix, 
either by default or as part of the output when full==True. The covariance 
matrix should be in the basis of the standard polynomial coefficients, not the 
rescaled ones.

b. Revise numpy.polynomial.polynomial.Polynomial.fit to yield coefficients for 
the unscaled domain (as in [8] above) by default, while retaining the scaled 
coefficients internally for evaluating the fitted polynomial, transformation 
the fitted polynomial to another polynomial basis, etc. The covariance matrix 
should also be given in terms of the standard polynomial representation, not 
the representation for the rescaled domain.

Thanks for your attention,

Steve

As an example, here is some code that I would like to refactor using the 
numpy.polynomial package.

# Fit the data using known parameter uncertainties
p, V = np.polyfit(current, voltage, 1, w=1 / alpha_voltage, cov="unscaled")

# Print results
print(f"Intercept estimate: ({1000*p[1]:.0f} ± {1000*np.sqrt(V[1][1]):.0f}) µV")
print(f"Slope estimate: ({1000*p[0]:.2f} ± {1000*np.sqrt(V[0][0]):.2f}) Ω")
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] [NEP 52] Introduction of `array_utils` public namespace in `np.lib` submodule

2023-08-28 Thread Mateusz Sokol
Hi all!

The two most important goals of NEP 52 are cleanups of the top `np` namespace 
and `np.lib` namespace.

A set of functions from `np` and `np.lib.*` submodules has been identified that 
could establish a new public namespace within the `np.lib` submodule.

The issue https://github.com/numpy/numpy/issues/24166 proposes a 
`np.lib.array_utils` name for the new namespace that could host functions such 
as `byte_bounds` (originally in `np`), `normalize_axis_tuple` (originally in 
`np.lib.stride_tricks`) or `normalize_axis_index` (originally in 
`np.lib.shape_base`).

If you have any thoughts on naming/content of the new namespace, please share 
your feedback!

Best regards,
Mateusz
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Suggestions for changes to the polynomial module

2023-08-28 Thread jsdodge
To follow up on these comments, an enhancement proposal 
(https://github.com/numpy/numpy/pull/20889), which would enable 
numpy.polynomial.polynomial.polyfit to return the covariance matrix, appears to 
be moving forward. This is great news, and addresses item (1) of my previous 
message.

After thinking a bit more about item (2), is it too late to change the default 
setting to "[]" for the domain parameter in the convenience class `fit` 
methods? A user who is aware of the scaling and wants to preserve the rescaled 
domain information would still be able to do that. The rest of us could remain 
blissfully unaware of what is happening under the hood.

This option would also facilitate adding a covariance matrix output to these 
fit methods, which I would also recommend. The covariance matrix will also 
depend on the scaling, and I assume that most users would want to know its 
element values for the original scale.

Making these changes to the API would enable users to replace 
numpy.polynomial.polynomial.polyfit and numpy.polynomial.polynomial.polval (and 
their equivalents for other polynomial classes) with the more sophisticated 
object-oriented interface more readily. I like many of the features of the 
convenience classes, but the lack of parameter uncertainty output is a 
deal-breaker. The overhead of managing the rescaling up front is also a barrier 
for introductory data analysis applications.

Either way, as the example in my previous email demonstrates, the current 
behavior produces output that is unnecessarily confusing. Even if the API 
remains unchanged, the way polynomial instances are represented to the user 
needs improvement.
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Suggestions for changes to the polynomial module

2023-08-28 Thread Pieter Eendebak
Hi Steve,

The representation of the polynomials without including the domain is
indeed confusing. In https://github.com/numpy/numpy/pull/21760 the
representation is changed to avoid this. Would this representation work for
you, or are there better representations?

With kind regards,
Pieter Eendebak


Op ma 28 aug. 2023 14:59 schreef :

> Hello,
>
> Since this is my first post to this group, I want to start by expressing
> my appreciation for the work that everyone has done to develop and maintain
> NumPy. Your work enabled me to adopt a fully open-source workflow in my
> research, after being a MATLAB user since v4. The quality of the
> NumPy/SciPy/Matplotlib ecosystem also helped me convince my department to
> standardize our courses around these libraries. Thank you!
>
> Now for my gripes ;)
>
> I've encountered two issues with the numpy.polynomial module that I'd like
> to synthesize here, since they are scattered among multiple GitHub issues
> that span over five years. I'm interested in this module because I've
> prepared some educational resources for using Python to do data analysis in
> the physical sciences (https://github.com/jsdodge/data-analysis-python),
> and these include how to use numpy.polyfit for fitting polynomial models to
> data (specifically, in
> https://github.com/jsdodge/data-analysis-python/blob/main/notebooks/5.1-Linear-fits.ipynb).
> I had hoped to adopt either numpy.polynomial.polynomial.polyfit or
> numpy.polynomial.polynomial.Polynomial.fit by now, but currently neither of
> these functions serves as a suitable substitute. My concerns are more
> urgent now that numpy.polyfit has been deprecated.
>
> 1. The numpy.polyfit function returns the covariance matrix for the fit
> parameters, which I think is an essential feature for a function like this
> (see https://github.com/numpy/numpy/pull/11197#issuecomment-426728143).
> Currently, neither numpy.polynomial.polynomial.polyfit nor
> numpy.polynomial.polynomial.Polynomial.fit provide this option.
>
> 2. Both numpy.polyfit function and numpy.polynomial.polynomial.polyfit
> return fit parameters that correspond to the standard polynomial
> representation (ie, a_0 x^p + a_1 x^{p-1} + ... a_p for numpy.polyfit and
> x^p a_0 + a_1 x + ... a_p x^p for numpy.polynomial.polynomial.polyfit). The
> recommended method, Polynomial.fit, returns coefficients for a rescaled
> domain, and expects the user to distinguish between the standard
> representation and the rescaled form. Among other things, this decision
> leads to problems like the discrepancy between [7] and [8] below:
>
> In [3]: x = np.linspace(0, 100)
> In [4]: y = 2 * x + 17
> In [5]: p = np.polynomial.Polynomial.fit(x, y, 1)
> In [6]: p
> Out[6]: Polynomial([117., 100.], domain=[  0., 100.], window=[-1.,  1.],
> symbol='x')
> In [7]: print(p)
> 117.0 + 100.0·x
> In [8]: print(p.convert())
> 17.0 + 2.0·x
>
> The output of [7] is bad and the output of [8] is good.
>
> In my view, the ideal solution to (1) and (2) would be the following:
>
> a. Revise both numpy.polynomial.polynomial.polyfit and
> numpy.polynomial.polynomial.Polynomial.fit to compute the covariance
> matrix, either by default or as part of the output when full==True. The
> covariance matrix should be in the basis of the standard polynomial
> coefficients, not the rescaled ones.
>
> b. Revise numpy.polynomial.polynomial.Polynomial.fit to yield coefficients
> for the unscaled domain (as in [8] above) by default, while retaining the
> scaled coefficients internally for evaluating the fitted polynomial,
> transformation the fitted polynomial to another polynomial basis, etc. The
> covariance matrix should also be given in terms of the standard polynomial
> representation, not the representation for the rescaled domain.
>
> Thanks for your attention,
>
> Steve
>
> As an example, here is some code that I would like to refactor using the
> numpy.polynomial package.
>
> # Fit the data using known parameter uncertainties
> p, V = np.polyfit(current, voltage, 1, w=1 / alpha_voltage, cov="unscaled")
>
> # Print results
> print(f"Intercept estimate: ({1000*p[1]:.0f} ±
> {1000*np.sqrt(V[1][1]):.0f}) µV")
> print(f"Slope estimate: ({1000*p[0]:.2f} ± {1000*np.sqrt(V[0][0]):.2f}) Ω")
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: pieter.eende...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com