Hello,
Since this is my first post to this group, I want to start by expressing my
appreciation for the work that everyone has done to develop and maintain NumPy.
Your work enabled me to adopt a fully open-source workflow in my research,
after being a MATLAB user since v4. The quality of the NumPy/SciPy/Matplotlib
ecosystem also helped me convince my department to standardize our courses
around these libraries. Thank you!
Now for my gripes ;)
I've encountered two issues with the numpy.polynomial module that I'd like to
synthesize here, since they are scattered among multiple GitHub issues that
span over five years. I'm interested in this module because I've prepared some
educational resources for using Python to do data analysis in the physical
sciences (https://github.com/jsdodge/data-analysis-python), and these include
how to use numpy.polyfit for fitting polynomial models to data (specifically,
in
https://github.com/jsdodge/data-analysis-python/blob/main/notebooks/5.1-Linear-fits.ipynb).
I had hoped to adopt either numpy.polynomial.polynomial.polyfit or
numpy.polynomial.polynomial.Polynomial.fit by now, but currently neither of
these functions serves as a suitable substitute. My concerns are more urgent
now that numpy.polyfit has been deprecated.
1. The numpy.polyfit function returns the covariance matrix for the fit
parameters, which I think is an essential feature for a function like this (see
https://github.com/numpy/numpy/pull/11197#issuecomment-426728143). Currently,
neither numpy.polynomial.polynomial.polyfit nor
numpy.polynomial.polynomial.Polynomial.fit provide this option.
2. Both numpy.polyfit function and numpy.polynomial.polynomial.polyfit return
fit parameters that correspond to the standard polynomial representation (ie,
a_0 x^p + a_1 x^{p-1} + ... a_p for numpy.polyfit and x^p a_0 + a_1 x + ...
a_p x^p for numpy.polynomial.polynomial.polyfit). The recommended method,
Polynomial.fit, returns coefficients for a rescaled domain, and expects the
user to distinguish between the standard representation and the rescaled form.
Among other things, this decision leads to problems like the discrepancy
between [7] and [8] below:
In [3]: x = np.linspace(0, 100)
In [4]: y = 2 * x + 17
In [5]: p = np.polynomial.Polynomial.fit(x, y, 1)
In [6]: p
Out[6]: Polynomial([117., 100.], domain=[ 0., 100.], window=[-1., 1.],
symbol='x')
In [7]: print(p)
117.0 + 100.0·x
In [8]: print(p.convert())
17.0 + 2.0·x
The output of [7] is bad and the output of [8] is good.
In my view, the ideal solution to (1) and (2) would be the following:
a. Revise both numpy.polynomial.polynomial.polyfit and
numpy.polynomial.polynomial.Polynomial.fit to compute the covariance matrix,
either by default or as part of the output when full==True. The covariance
matrix should be in the basis of the standard polynomial coefficients, not the
rescaled ones.
b. Revise numpy.polynomial.polynomial.Polynomial.fit to yield coefficients for
the unscaled domain (as in [8] above) by default, while retaining the scaled
coefficients internally for evaluating the fitted polynomial, transformation
the fitted polynomial to another polynomial basis, etc. The covariance matrix
should also be given in terms of the standard polynomial representation, not
the representation for the rescaled domain.
Thanks for your attention,
Steve
As an example, here is some code that I would like to refactor using the
numpy.polynomial package.
# Fit the data using known parameter uncertainties
p, V = np.polyfit(current, voltage, 1, w=1 / alpha_voltage, cov="unscaled")
# Print results
print(f"Intercept estimate: ({1000*p[1]:.0f} ± {1000*np.sqrt(V[1][1]):.0f}) µV")
print(f"Slope estimate: ({1000*p[0]:.2f} ± {1000*np.sqrt(V[0][0]):.2f}) Ω")
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com