Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum wrote: > I think the biggest issues could be resolved if __array_concatenate__ were > finished. Unfortunately I don't feel like I can take that on right now. > > See Ryan May's talk at scipy about using an ndarray subclass for units and > the issues he's run into: > > https://www.youtube.com/watch?v=qCo9bkT9sow > Interesting talk, but I don't see how general library code should know what units the output has. For example, if units are some flows per unit of time and we average, sum or integrate over time, then what are the new units? (e.g. pandas time aggregation) What are the units of covariance or correlation between two variables with the same units, and what are they between variables with different units? How do you concatenate and operate on arrays with different units? Interpolation or prediction would work using the existing units. Partially related: statsmodels uses a wrapper for pandas Series and DataFrames and tries to preserve the index when possible and make up a new DataFrame or Series if the existing index doesn't apply. E.g. predicted values and residuals are in terms of the originally provided index, and could also get the original units assigned. That would also be possible with prediction confidence intervals. But for the rest, see above. Josef > > > On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk < > m.h.vankerkw...@gmail.com> wrote: > >> From my experience with Quantity, routines that properly ducktype work >> well, those that feel the need to accept list and blatantly do >> `asarray` do not - even if in many cases they would have worked if >> they used `asanyarray`... But there are lots of nice surprises, with, >> e.g., `np.fft.fftfreq` just working as one would hope. Anyway, bottom >> line, I think you should let this stop you from trying only if you >> know something important does not work. -- Marten >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 8:46 AM, wrote: > > > On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum > wrote: > >> I think the biggest issues could be resolved if __array_concatenate__ >> were finished. Unfortunately I don't feel like I can take that on right now. >> >> See Ryan May's talk at scipy about using an ndarray subclass for units >> and the issues he's run into: >> >> https://www.youtube.com/watch?v=qCo9bkT9sow >> > > > Interesting talk, but I don't see how general library code should know > what units the output has. > for example if units are some flows per unit of time and we average, sum > or integrate over time, then what are the new units? (e.g. pandas time > aggregation) > What are units of covariance or correlation between two variables with the > same units, and what are they between variables with different units? > > How do you concatenate and operate arrays with different units? > > interpolation or prediction would work with using the existing units. > > partially related: > statsmodels uses a wrapper for pandas Series and DataFrames and tries to > preserve the index when possible and make up a new DataFrame or Series if > the existing index doesn't apply. > E.g. predicted values and residuals are in terms of the original provided > index, and could also get original units assigned. That would also be > possible with prediction confidence intervals. But for the rest, see above. >

using pint

>>> x

>>> x / x

>>> x / (1 + x)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
    return self._add_sub(other, operator.add)
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
    raise DimensionalityError(self._units, 'dimensionless')
pint.errors.DimensionalityError: Cannot convert from 'meter' to 'dimensionless'

np.exp(x)
raises
pint.errors.DimensionalityError: Cannot convert from 'meter' ([length]) to 'dimensionless' (dimensionless)

Josef

> > Josef > > >> >> >> On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk < >> m.h.vankerkw...@gmail.com> wrote: >> >>> From my experience with Quantity, routines that properly ducktype work >>> well, those that feel the need to accept list and blatantly do >>> `asarray` do not - even if in many cases they would have worked if >>> they used `asanyarray`... But there are lots of nice surprises, with, >>> e.g., `np.fft.fftfreq` just working as one would hope. Anyway, bottom >>> line, I think you should let this stop you from trying only if you >>> know something important does not work. -- Marten >>> ___ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Hi Josef, astropy's Quantity is well developed and would give similar results to pint; all those results make sense if one wants to have consistent units. General library code will actually do the right thing as long as it just uses normal mathematical operations with ufuncs - and as long as it just duck types! - the unit code will then override and properly propagate units to outputs, as can be seen in this example:
```
import numpy as np
import astropy.units as u

np.fft.fftfreq(8, 1*u.min)
#
np.fft.fftfreq(8, 1*u.min).var()
#
```
> for example if units are some flows per unit of time and we average, sum or > integrate over time, then what are the new units? (e.g. pandas time > aggregation)

The units module will force you to take into account `dt`! This is in fact one reason why it is so powerful. So, your example might go something like:
```
flow = [1., 1.5, 1.5] * u.g / u.s
dt = [0.5, 0.5, 1.] * u.hr

np.sum(flow * dt)
#
np.sum(flow * dt).to(u.kg)
#
```
> How do you concatenate and operate arrays with different units?

This is where Nathaniel's `__array_concatenate__` would come in. For regular arrays it is fine to just concatenate, but for almost anything else you need a different approach. For quantities, the most logical one would be to first create an empty array of the right size with the unit of, e.g., the first part to be concatenated, and then set sections to the input quantities (where the setter does unit conversion and will fail if that is not possible). All the best, Marten p.s. A fun subject is what to do with logarithmic units, such as the magnitudes in astronomy... We have a module for that as well; http://docs.astropy.org/en/latest/units/logarithmic_units.html ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
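[A minimal sketch of the concatenation approach described in the message above, assuming astropy.units and relying on Quantity item assignment performing the unit conversion; the helper name `unit_concatenate` is made up for illustration and is not an existing API.]
```
import numpy as np
import astropy.units as u

def unit_concatenate(parts):
    # allocate an empty array carrying the unit of the first part,
    # then let item assignment convert each piece (or raise if incompatible)
    n = sum(len(p) for p in parts)
    out = np.empty(n) * parts[0].unit
    i = 0
    for p in parts:
        out[i:i + len(p)] = p
        i += len(p)
    return out

unit_concatenate([[1., 2.] * u.m, [30., 40.] * u.cm])
# expected: a Quantity [1., 2., 0.3, 0.4] in meters
```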
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 6:46 AM, wrote: > > > On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum > wrote: > >> I think the biggest issues could be resolved if __array_concatenate__ >> were finished. Unfortunately I don't feel like I can take that on right now. >> >> See Ryan May's talk at scipy about using an ndarray subclass for units >> and the issues he's run into: >> >> https://www.youtube.com/watch?v=qCo9bkT9sow >> > > > Interesting talk, but I don't see how general library code should know > what units the output has. > for example if units are some flows per unit of time and we average, sum > or integrate over time, then what are the new units? (e.g. pandas time > aggregation) > A general library doesn't have to do anything--just not do annoying things like isinstance() checks and calling np.asarray() everywhere. Honestly one of those is the core of most of the problems I run into. It's currently more complicated when doing things in compiled land, but that's implementation details, not any kind of fundamental incompatibility. For basic mathematical operations, units have perfectly well defined semantics that many of us encountered in an introductory physics or chemistry class: - Want to add or subtract two things? They need to have the same units; a units library can handle conversion provided they have the same dimensionality (e.g. length, time) - Multiplication/Divison: combine and cancel units ( m/s * s -> m) Everything else we do on a computer with data in some way boils down to: add, subtract, multiply, divide. Average keeps the same units -- it's just a sum and division by a unit-less constant Integration (in 1-D) involves *two* variables, your data as well as the time/space coordinates (or dx or dt); fundamentally it's a multiplication by dx and a summation. The units results then are e.g. data.units * dx.units. This works just like it does in Physics 101 where you integrate velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m) What are units of covariance or correlation between two variables with the > same units, and what are they between variables with different units? > Well, covariance is subtracting the mean from each variable and multiplying the residuals; therefore the units for cov(x, y): (x.units - x.units) * (y.units - y.units) -> x.units * y.units Correlation takes covariance and divides by the product of the standard deviations, so that's: (x.units * y.units) / (x.units * y.units) -> dimensionless Which is what I'd expect for a correlation. > How do you concatenate and operate arrays with different units? > If all arrays have compatible dimensionality (say meters, inches, miles), you convert to one (say the first) and concatenate like normal. If they're not compatible, you error out. > interpolation or prediction would work with using the existing units. > I'm sure you wrote that thinking units didn't play a role, but the math behind those operations works perfectly fine with units, with things cancelling out properly to give the same units out as in. Ryan -- Ryan May ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
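[A short sketch of the unit bookkeeping described in the message above, assuming astropy.units; the values are made up, and the population definitions are used so only elementwise arithmetic and .mean()/.std() are needed.]
```
import numpy as np
import astropy.units as u

x = np.array([1., 2., 3., 4.]) * u.m
y = np.array([2., 1., 4., 3.]) * u.s

cov = ((x - x.mean()) * (y - y.mean())).mean()   # carries unit m * s
corr = cov / (x.std() * y.std())                 # units cancel: dimensionless
```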
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 11:51 AM, Marten van Kerkwijk < m.h.vankerkw...@gmail.com> wrote: > Hi Josef, > > astropy's Quantity is well developed and would give similar results to > pint; all those results make sense if one wants to have consistent > units. A general library code will actually do the right thing as long > as it just uses normal mathematical operations with ufuncs - and as > long as it just duck types! - the unit code will then override and > properly propagate units to outputs, as can be seen in this example: > ``` > import astropy.units as u > np.fft.fftfreq(8, 1*u.min) > # min> > np.fft.fftfreq(8, 1*u.min).var() > # > ``` > > > for example if units are some flows per unit of time and we average, sum > or integrate over time, then what are the new units? (e.g. pandas time > aggregation) > > The units module will force you to take into account `dt`! This is in > fact one reason why it is so powerful. So, your example might go > something like: > ``` > flow = [1., 1.5, 1.5] * u.g / u.s > dt = [0.5, 0.5, 1.] * u.hr > np.sum(flow * dt) > # > np.sum(flow * dt).to(u.kg) > # > ``` > > > How do you concatenate and operate arrays with different units? > > This is where Nathaniel's `__array_concatenate__` would come in. For > regular arrays it is fine to just concatenate, but for almost anything > else you need a different approach. For quantities, the most logical > one would be to first create an empty array of the right size with the > unit of, e.g., the first part to be concatenated, and then set > sections to the input quantities (where the setter does unit > conversion and will fail if that is not possible). > For example, "will fail if that is not possible" rules out inhomogeneous arrays (analogous to structured dtypes). How do you get a vander matrix for something simple like a polynomial fit? x[:, None] ** np.arange(3) > > All the best, > > Marten > > p.s. A fun subject is what to do with logarithmic units, such as the > magnitudes in astronomy... We have a module for that as well; > http://docs.astropy.org/en/latest/units/logarithmic_units.html Similarly, scipy.special has ufuncs; what units are those? Most code that I know (i.e. scipy.stats and statsmodels) does not use only "normal mathematical operations with ufuncs". I guess there are a lot of "abnormal" mathematical operations where simply propagating the units will not work. Aside: the problem is more general, also for other data structures. E.g. statsmodels for the most part uses only numpy ndarrays inside the algorithms and computations because that provides well defined behavior (e.g. pandas behaved too differently in many cases). I don't have much of an idea yet about how to change the infrastructure to allow the use of dask arrays, sparse matrices and the like, and possibly automatic differentiation. Josef > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 6:56 AM, wrote: > On Thu, Nov 2, 2017 at 8:46 AM, wrote: > >> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum >> wrote: > >>> I think the biggest issues could be resolved if __array_concatenate__ >>> were finished. Unfortunately I don't feel like I can take that on right now. >>> >>> See Ryan May's talk at scipy about using an ndarray subclass for units >>> and the issues he's run into: >>> >>> https://www.youtube.com/watch?v=qCo9bkT9sow >>> >> >> >> Interesting talk, but I don't see how general library code should know >> what units the output has. >> for example if units are some flows per unit of time and we average, sum >> or integrate over time, then what are the new units? (e.g. pandas time >> aggregation) >> What are units of covariance or correlation between two variables with >> the same units, and what are they between variables with different units? >> >> How do you concatenate and operate arrays with different units? >> >> interpolation or prediction would work with using the existing units. >> >> partially related: >> statsmodels uses a wrapper for pandas Series and DataFrames and tries to >> preserve the index when possible and make up a new DataFrame or Series if >> the existing index doesn't apply. >> E.g. predicted values and residuals are in terms of the original provided >> index, and could also get original units assigned. That would also be >> possible with prediction confidence intervals. But for the rest, see above. >> > > using pint > > >>> x > > >>> x / x > > > >>> x / (1 + x) > Traceback (most recent call last): > File "", line 1, in > File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", > line 669, in __add__ > raise DimensionalityError(self._units, 'dimensionless') > return self._add_sub(other, operator.add) > File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", > line 580, in _add_sub > pint.errors.DimensionalityError: Cannot convert from 'meter' to > 'dimensionless' >

I'm not sure why you have a problem with those results. You tried to take a number in meters and add a dimensionless value to that--that's not a defined operation. That's like saying: "I have a distance of 12 meters and added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert, but I can do that operation. 1 second? That makes no sense. If you add units to the 1 then it's a defined operation:

>>> ureg = pint.UnitRegistry()
>>> x / (1 * ureg.meters + x)

> np.exp(x) > raises > pint.errors.DimensionalityError: Cannot convert from 'meter' ([length]) > to 'dimensionless' (dimensionless) >

Well, the Taylor series for exp (around a=0) is:

exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ...

so for that to properly add up, x needs to be dimensionless. It should be noted, though, that I've *never* seen a formula, theoretically derived or empirically fit, require directly taking exp(x) where x is a physical quantity with units. Instead, you have:

f = a * exp(kx)

Properly calculated values for a, k will have appropriate units attached to them that allow the calculation to proceed without error. Ryan -- Ryan May ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
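[A hedged illustration of that last point, using astropy.units with made-up constants: the units typically cancel before exp() is ever applied, and the explicit .to() call is only there to make the cancellation visible.]
```
import numpy as np
import astropy.units as u

a = 3.0 * u.g / u.s            # amplitude, carries the output units
k = 0.5 / u.m                  # rate constant, 1 / [x]
x = np.array([0., 1., 2.]) * u.m

arg = (k * x).to(u.dimensionless_unscaled)   # (1/m) * m -> dimensionless
f = a * np.exp(arg)                          # result keeps the units of a: g / s
```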
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 12:23 PM, Ryan May wrote: > On Thu, Nov 2, 2017 at 6:46 AM, wrote: > >> >> >> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum >> wrote: >> >>> I think the biggest issues could be resolved if __array_concatenate__ >>> were finished. Unfortunately I don't feel like I can take that on right now. >>> >>> See Ryan May's talk at scipy about using an ndarray subclass for units >>> and the issues he's run into: >>> >>> https://www.youtube.com/watch?v=qCo9bkT9sow >>> >> >> >> Interesting talk, but I don't see how general library code should know >> what units the output has. >> for example if units are some flows per unit of time and we average, sum >> or integrate over time, then what are the new units? (e.g. pandas time >> aggregation) >> > > A general library doesn't have to do anything--just not do annoying things > like isinstance() checks and calling np.asarray() everywhere. Honestly one > of those is the core of most of the problems I run into. It's currently > more complicated when doing things in compiled land, but that's > implementation details, not any kind of fundamental incompatibility. > > For basic mathematical operations, units have perfectly well defined > semantics that many of us encountered in an introductory physics or > chemistry class: > - Want to add or subtract two things? They need to have the same units; a > units library can handle conversion provided they have the same > dimensionality (e.g. length, time) > - Multiplication/Divison: combine and cancel units ( m/s * s -> m) > > Everything else we do on a computer with data in some way boils down to: > add, subtract, multiply, divide. > > Average keeps the same units -- it's just a sum and division by a > unit-less constant > Integration (in 1-D) involves *two* variables, your data as well as the > time/space coordinates (or dx or dt); fundamentally it's a multiplication > by dx and a summation. The units results then are e.g. data.units * > dx.units. This works just like it does in Physics 101 where you integrate > velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m) > > What are units of covariance or correlation between two variables with the >> same units, and what are they between variables with different units? >> > > Well, covariance is subtracting the mean from each variable and > multiplying the residuals; therefore the units for cov(x, y): > > (x.units - x.units) * (y.units - y.units) -> x.units * y.units > > Correlation takes covariance and divides by the product of the standard > deviations, so that's: > > (x.units * y.units) / (x.units * y.units) -> dimensionless > > Which is what I'd expect for a correlation. > > >> How do you concatenate and operate arrays with different units? >> > > If all arrays have compatible dimensionality (say meters, inches, miles), > you convert to one (say the first) and concatenate like normal. If they're > not compatible, you error out. > > >> interpolation or prediction would work with using the existing units. >> > > I'm sure you wrote that thinking units didn't play a role, but the math > behind those operations works perfectly fine with units, with things > cancelling out properly to give the same units out as in. > Some of it is in my reply to Marten. regression and polyfit requires an X matrix with different units and then some linear algebra like solve, pinv or svd. 
So, while the predicted values have well-defined units, the computation involves some messier operations, unless you want to forgo linear algebra in all intermediate steps and reduce it to sums, divisions and inverses. Josef > > Ryan > > -- > Ryan May > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 12:46 PM, Ryan May wrote: > On Thu, Nov 2, 2017 at 6:56 AM, wrote: > >> On Thu, Nov 2, 2017 at 8:46 AM, wrote: >> >>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum >>> wrote: >>> I think the biggest issues could be resolved if __array_concatenate__ were finished. Unfortunately I don't feel like I can take that on right now. See Ryan May's talk at scipy about using an ndarray subclass for units and the issues he's run into: https://www.youtube.com/watch?v=qCo9bkT9sow >>> >>> >>> Interesting talk, but I don't see how general library code should know >>> what units the output has. >>> for example if units are some flows per unit of time and we average, sum >>> or integrate over time, then what are the new units? (e.g. pandas time >>> aggregation) >>> What are units of covariance or correlation between two variables with >>> the same units, and what are they between variables with different units? >>> >>> How do you concatenate and operate arrays with different units? >>> >>> interpolation or prediction would work with using the existing units. >>> >>> partially related: >>> statsmodels uses a wrapper for pandas Series and DataFrames and tries to >>> preserve the index when possible and make up a new DataFrame or Series if >>> the existing index doesn't apply. >>> E.g. predicted values and residuals are in terms of the original >>> provided index, and could also get original units assigned. That would also >>> be possible with prediction confidence intervals. But for the rest, see >>> above. >>> >> >> using pint >> >> >>> x >> >> >>> x / x >> >> >> >>> x / (1 + x) >> Traceback (most recent call last): >> File "", line 1, in >> File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", >> line 669, in __add__ >> raise DimensionalityError(self._units, 'dimensionless') >> return self._add_sub(other, operator.add) >> File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", >> line 580, in _add_sub >> pint.errors.DimensionalityError: Cannot convert from 'meter' to >> 'dimensionless' >> > > I'm not sure why you have a problem with that results. You tried to take a > number in meters and add a dimensionless value to that--that's not a > defined operation. That's like saying: "I have a distance of 12 meters and > added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert, > but I can do that operation. 1 second? That makes no sense. > > If you add units to the 1 then it's a defined operation: > > >>> reg = pint.UnitRegistry() > >>> x / (1 * ureg.meters + x) > 'dimensionless')> > > >> np.exp(x) >> raises >> pint.errors.DimensionalityError: Cannot convert from 'meter' ([length]) >> to 'dimensionless' (dimensionless) >> > > Well, the Taylor series for exp (around a=0) is: > > exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ... > > so for that to properly add up, x needs to be dimensionless. It should be > noted, though, that I've *never* seen a formula, theoretically derived or > empirically fit, require directly taking exp(x) where x is a physical > quantity with units. Instead, you have: > > f = a * exp(kx) > > Properly calculated values for a, k will have appropriate units attached > to them that allows the calculation to proceed without error > I was thinking of a simple logit model to predict whether it rains tomorrow The Logit transformation for the probability is exp(k x) / (1 + exp(k x) where k is a parameter to search for in the optimization. x is a matrix with all predictors or explanatory variables which could all have different units. 
So it sounds to me if we drop asarray, then we just get exceptions or possibly strange results, or we have to introduce a unit that matches everything (like a joker card) for any constants that we are using. Josef > > Ryan > > -- > Ryan May > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
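[One possible middle ground for the design-matrix problem raised above, sketched with astropy.units and made-up data; this is not an established statsmodels pattern, just an illustration of non-dimensionalizing each predictor against an explicit reference so the optimizer and linear algebra only ever see plain floats.]
```
import numpy as np
import astropy.units as u

temp = np.array([280., 290., 300.]) * u.K        # predictor 1
wind = np.array([3., 5., 8.]) * u.m / u.s        # predictor 2

# scale each column by a chosen reference quantity; the result is unitless
X = np.column_stack([
    np.ones(3),                                               # intercept column
    (temp / (300. * u.K)).to(u.dimensionless_unscaled).value,
    (wind / (10. * u.m / u.s)).to(u.dimensionless_unscaled).value,
])
# fitted coefficients are then interpreted relative to those reference values
```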
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Hi Josef, Indeed, for some applications one would like to have different units for different parts of an array. And that means that, at present, the quantity implementations that we have are no good at storing, say, a covariance matrix involving parameters with different units, where thus each element of the covariance matrix has a different unit. I fear at present it would have to be an object array instead; other cases may be a bit easier to solve, by, e.g., allowing structured arrays with similarly structured units. I do note that actually doing it would clarify, e.g., what the axes in Vandermonde (spelling?) matrices mean. That said, there is truly an enormous benefit for checking units on "regular" operations. Spacecraft have missed Mars because people didn't do it properly... All the best, Marten p.s. The scipy functions should indeed be included in the ufuncs covered; there is a fairly long-standing issue for that in astropy... ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 2:39 PM, Marten van Kerkwijk < m.h.vankerkw...@gmail.com> wrote: > Hi Josef, > > Indeed, for some applications one would like to have different units > for different parts of an array. And that means that, at present, the > quantity implementations that we have are no good at storing, say, a > covariance matrix involving parameters with different units, where > thus each element of the covariance matrix has a different unit. I > fear at present it would have to be an object array instead; other > cases may be a bit easier to solve, by, e.g., allowing structured > arrays with similarly structured units. I do note that actually doing > it would clarify, e.g., what the axes in Vandermonde (spelling?) > matrices mean. > (I have problems remembering the spelling of proper names.) np.vander and various polyvander functions/methods. One point I wanted to make is that the units are overhead and irrelevant in the computation. It's the outcome that might have units. E.g. polyfit could use various underlying polynomials, e.g. numpy.polynomial.chebyshev.chebvander(...), and various linear algebra and projection versions, and the output would still have the same units. Aside: I just found an interesting one, http://docs.astropy.org/en/latest/api/astropy.stats.biweight.biweight_midcovariance.html is pairwise, but uses asanyarray; e.g. using asarray (for robust scatter): https://github.com/statsmodels/statsmodels/pull/3230/files#diff-8fd46d3044db86ae7992f5d817eec6c7R473 I guess I would have problems replacing asarray by asanyarray. One last related one: what's the inverse of a covariance matrix? It's just sums, multiplications and divisions (which I wouldn't remember), but the computation is just np.linalg.inv or np.linalg.pinv, which is a simple shortcut. Josef > > That said, there is truly an enormous benefit for checking units on > "regular" operations. Spacecraft have missed Mars because people > didn't do it properly... > https://twitter.com/search?q=2%20unit%20tests.%200%20integration%20tests > > All the best, > > Marten > > p.s. The scipy functions should indeed be included in the ufuncs > covered; there is a fairly long-standing issue for that in astropy... > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 9:45 AM wrote: > similar, scipy.special has ufuncs > what units are those? > > Most code that I know (i.e. scipy.stats and statsmodels) does not use only > "normal mathematical operations with ufuncs" > I guess there are a lot of "abnormal" mathematical operations > where just simply propagating the units will not work. > > Aside: The problem is more general also for other datastructures. > E.g. statsmodels for most parts uses only numpy ndarrays inside the > algorithm and computations because that provides well defined > behavior. (e.g. pandas behaved too differently in many cases) > I don't have much idea yet about how to change the infrastructure to > allow the use of dask arrays, sparse matrices and similar and possibly > automatic differentiation. > This is the exact same reason why pandas and xarray do not support wrapping arbitrary ndarray subclasses or duck array types. The operations we use internally (on numpy.ndarray objects) may not be what you would expect externally, and may even be implementation details not considered part of the public API. For example, in xarray we use numpy.nanmean() or bottleneck.nanmean() instead of numpy.mean(). For NumPy and xarray, I think we could (and should) define an interface to support subclasses and duck types for generic operations for core use-cases. My main concern with subclasses / duck-arrays is undefined/untested behavior, especially where we might silently give the wrong answer or trigger some undesired operation (e.g., loading a lazily computed array into memory) rather than raising an informative error. Leaking implementation details is another concern: we have already had several cases in NumPy where a function only worked on a subclass if a particular method was called internally, and broke when that was changed. ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
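[A tiny sketch of the nanmean detail mentioned above, in plain numpy with illustrative values; it shows why a duck array that only implements the "obvious" reductions can behave differently from what the wrapping library actually calls internally.]
```
import numpy as np

a = np.array([1.0, np.nan, 3.0])

np.mean(a)      # nan  -- what a caller might expect the wrapper to use
np.nanmean(a)   # 2.0  -- what is actually called internally, per the note above
```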
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer wrote: > On Thu, Nov 2, 2017 at 9:45 AM wrote: > >> similar, scipy.special has ufuncs >> what units are those? >> >> Most code that I know (i.e. scipy.stats and statsmodels) does not use only >> "normal mathematical operations with ufuncs" >> I guess there are a lot of "abnormal" mathematical operations >> where just simply propagating the units will not work. >> > >> Aside: The problem is more general also for other datastructures. >> E.g. statsmodels for most parts uses only numpy ndarrays inside the >> algorithm and computations because that provides well defined >> behavior. (e.g. pandas behaved too differently in many cases) >> I don't have much idea yet about how to change the infrastructure to >> allow the use of dask arrays, sparse matrices and similar and possibly >> automatic differentiation. >> > > This is the exact same reason why pandas and xarray do not support > wrapping arbitrary ndarray subclasses or duck array types. The operations > we use internally (on numpy.ndarray objects) may not be what you would > expect externally, and may even be implementation details not considered > part of the public API. For example, in xarray we use numpy.nanmean() or > bottleneck.nanmean() instead of numpy.mean(). > > For NumPy and xarray, I think we could (and should) define an interface to > support subclasses and duck types for generic operations for core > use-cases. My main concern with subclasses / duck-arrays is > undefined/untested behavior, especially where we might silently give the > wrong answer or trigger some undesired operation (e.g., loading a lazily > computed into memory) rather than raising an informative error. Leaking > implementation details is another concern: we have already had several > cases in NumPy where a function only worked on a subclass if a particular > method was called internally, and broke when that was changed. > Would this issue be ameliorated given Nathaniel's proposal to try to move away from subclasses and towards storing data in dtypes? Or would that just mean that xarray would need to ban dtypes it doesn't know about? > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Python @ FOSDEM 2018
Dear SciPythonists and NumPythonists, FOSDEM is a free event for software developers to meet, share ideas and collaborate. Every year, 6,500+ developers of free and open source software from all over the world gather at the event in Brussels. For FOSDEM 2018, we will try the new concept of a virtual Python-devroom: there is no dedicated Python room; instead, we promote the presence of Python in all devrooms. We hope to have at least one Python talk in every devroom (yes, even in the Perl, Ada, Go and Rust devrooms ;-) ). How can you help to highlight the Python community at Python-FOSDEM 2018? Propose your talk in the closest related devroom: https://fosdem.org/2018/news/2017-10-04-accepted-developer-rooms/ Not all devrooms are language-specific, and a number of topics come to mind for data and science participants:
"Monitoring & Cloud devroom" https://lists.fosdem.org/pipermail/fosdem/2017-October/002631.html
"HPC, Big Data, and Data Science" https://lists.fosdem.org/pipermail/fosdem/2017-October/002615.html
"LLVM toolchain" https://lists.fosdem.org/pipermail/fosdem/2017-October/002624.html
Most calls for contributions end around the 24th of November. Send a copy of your proposal to python-devroom AT lists.fosdem DOT org. We will publish a dedicated schedule for Python on https://python-fosdem.org/ and at our stand. A dinner will also be organized; stay tuned. We are waiting for your talk proposals. The Python-FOSDEM committee ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Numpy already does support a specific unit, datetime64 and timedelta64, think through that very mechanism. Its also probably the most complicated unit since at least there is no such thing as leap meters. And it works well and is very useful IMHO On Thu, Nov 2, 2017 at 3:40 PM, Nathan Goldbaum wrote: > > > On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer wrote: > >> On Thu, Nov 2, 2017 at 9:45 AM wrote: >> >>> similar, scipy.special has ufuncs >>> what units are those? >>> >>> Most code that I know (i.e. scipy.stats and statsmodels) does not use >>> only >>> "normal mathematical operations with ufuncs" >>> I guess there are a lot of "abnormal" mathematical operations >>> where just simply propagating the units will not work. >>> >> >>> Aside: The problem is more general also for other datastructures. >>> E.g. statsmodels for most parts uses only numpy ndarrays inside the >>> algorithm and computations because that provides well defined >>> behavior. (e.g. pandas behaved too differently in many cases) >>> I don't have much idea yet about how to change the infrastructure to >>> allow the use of dask arrays, sparse matrices and similar and possibly >>> automatic differentiation. >>> >> >> This is the exact same reason why pandas and xarray do not support >> wrapping arbitrary ndarray subclasses or duck array types. The operations >> we use internally (on numpy.ndarray objects) may not be what you would >> expect externally, and may even be implementation details not considered >> part of the public API. For example, in xarray we use numpy.nanmean() or >> bottleneck.nanmean() instead of numpy.mean(). >> >> For NumPy and xarray, I think we could (and should) define an interface >> to support subclasses and duck types for generic operations for core >> use-cases. My main concern with subclasses / duck-arrays is >> undefined/untested behavior, especially where we might silently give the >> wrong answer or trigger some undesired operation (e.g., loading a lazily >> computed into memory) rather than raising an informative error. Leaking >> implementation details is another concern: we have already had several >> cases in NumPy where a function only worked on a subclass if a particular >> method was called internally, and broke when that was changed. >> > > Would this issue be ameliorated given Nathaniel's proposal to try to move > away from subclasses and towards storing data in dtypes? Or would that just > mean that xarray would need to ban dtypes it doesn't know about? > > >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
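[A small illustration of that built-in "unit" behaviour, in plain numpy with made-up dates: datetime64/timedelta64 arithmetic already enforces unit-like rules at the dtype level.]
```
import numpy as np

t0 = np.datetime64('2017-11-02T12:00')
dt = np.timedelta64(90, 'm')

t0 + dt                                  # datetime64 + timedelta64 -> datetime64
t0 - np.datetime64('2017-11-01T00:00')   # datetime64 - datetime64  -> timedelta64
# t0 + t0                                # raises TypeError: adding two datetimes is undefined
```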
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
My 2¢ here is that all code should feel free to assume certain type of input, as long as it is documented properly, but there is no reason to enforce that by, e.g., putting `asarray` everywhere. Then, for some pieces ducktypes and subclasses will just work like magic, and uses you might never have foreseen become possible. For others, whoever wants to use them has to do work (and up to a package maintainers to decide whether or not to accept PRs that implement hooks, etc.) I do see the argument that this way one becomes constrained in the internal implementation, as a change may break an outward-looking function, but while at times this may be inconvenient, in my experience at others it may just make one realize an even better implementation is possible. But then, I really like duck-typing... -- Marten ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Duck typing is great and all for classes that implement some or all of the ndarray interface but remember what the main reason for asarray() and asanyarray(): to automatically promote lists and tuples and other "array-likes" to ndarrays. Ignoring the use-case of lists of lists is problematic at best. Ben Root On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk < m.h.vankerkw...@gmail.com> wrote: > My 2¢ here is that all code should feel free to assume certain type of > input, as long as it is documented properly, but there is no reason to > enforce that by, e.g., putting `asarray` everywhere. Then, for some > pieces ducktypes and subclasses will just work like magic, and uses > you might never have foreseen become possible. For others, whoever > wants to use them has to do work (and up to a package maintainers to > decide whether or not to accept PRs that implement hooks, etc.) > > I do see the argument that this way one becomes constrained in the > internal implementation, as a change may break an outward-looking > function, but while at times this may be inconvenient, in my > experience at others it may just make one realize an even better > implementation is possible. But then, I really like duck-typing... > > -- Marten > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root wrote: > Duck typing is great and all for classes that implement some or all of the > ndarray interface but remember what the main reason for asarray() and > asanyarray(): to automatically promote lists and tuples and other > "array-likes" to ndarrays. Ignoring the use-case of lists of lists is > problematic at best. How I wish numpy had never gone there! Convenience for what, exactly? For the user not having to put `array()` around the list themselves? We slow down everything for that? And even now we're trying to remove some of the cases where both tuples and lists are allowed. Grr. Of course, we are well and truly stuck with it - now it is one of the main reasons to subclass rather than duck-type... Anyway, water under the bridge... -- Marten ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root wrote: > Duck typing is great and all for classes that implement some or all of the > ndarray interface but remember what the main reason for asarray() and > asanyarray(): to automatically promote lists and tuples and other > "array-likes" to ndarrays. Ignoring the use-case of lists of lists is > problematic at best. > > Ben Root > > > On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk < > m.h.vankerkw...@gmail.com> wrote: > >> My 2¢ here is that all code should feel free to assume certain type of >> input, as long as it is documented properly, but there is no reason to >> enforce that by, e.g., putting `asarray` everywhere. Then, for some >> pieces ducktypes and subclasses will just work like magic, and uses >> you might never have foreseen become possible. For others, whoever >> wants to use them has to do work (and up to a package maintainers to >> decide whether or not to accept PRs that implement hooks, etc.) >> >> I do see the argument that this way one becomes constrained in the >> internal implementation, as a change may break an outward-looking >> function, but while at times this may be inconvenient, in my >> experience at others it may just make one realize an even better >> implementation is possible. But then, I really like duck-typing... >> > One problem in general is that there is no protocol about which operations are implemented in a numpy-ndarray-equivalent way in those ducks, i.e. whether they quack in a compatible way. One small example: pandas' standard deviation, std, used ddof=1 by default and didn't have an option to override it to the ddof=0 that numpy uses. So even though we could call the std method of the ducks, the t-test results would be a bit different, and visibly different in small samples, depending on the type of the data. A possible alternative would be to compute std from scratch and forgo the available function or method. I tried once, in scipy.stats.zscore, to be agnostic about the type and not use asarray; it's a simple operation, but it still required special handling of numpy matrices because matrix preserves the dimension in reduce operations. After more than a few lines it is difficult to keep track of what type is now used. Another subclass that is often broken by default code is masked arrays, because asarray throws away the mask. But asanyarray wouldn't always work either, because the mask needs code for handling the masked values. For example, scipy.stats ended up with separate functions for masked arrays. Josef > >> -- Marten >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
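[To make the masked-array point above concrete, a small sketch in plain numpy with illustrative values: asarray silently drops the mask, while asanyarray preserves the subclass and its masked-value handling.]
```
import numpy as np
import numpy.ma as ma

data = ma.masked_invalid([1., 2., np.nan, 4.])

np.asarray(data).mean()     # nan      -- plain ndarray, mask (and its handling) is gone
np.asanyarray(data).mean()  # 2.333... -- still a MaskedArray, masked value is skipped
```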
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum wrote: > Would this issue be ameliorated given Nathaniel's proposal to try to move > away from subclasses and towards storing data in dtypes? Or would that just > mean that xarray would need to ban dtypes it doesn't know about? > Yes, I think custom dtypes would definitely help. Custom dtypes have a well contained interface, so lots of operations (e.g., concatenate, reshaping, indexing) are guaranteed to work in a dtype independent way. If you try to do an unsupported operation for such a dtype (e.g., np.datetime64), you will generally get a good error message about an invalid dtype. In contrast, you can overload a subclass with totally arbitrary semantics (e.g., np.matrix) and of course for duck-types as well. This makes a big difference for libraries like dask or xarray, which need a standard interface to guarantee they do the right thing. I'm pretty sure we can wrap a custom dtype ndarray with units, but there's no way we're going to support np.matrix without significant work. It's hard to know which is which without well defined interfaces. ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 5:21 PM, Stephan Hoyer wrote: > On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum > wrote: > >> Would this issue be ameliorated given Nathaniel's proposal to try to move >> away from subclasses and towards storing data in dtypes? Or would that just >> mean that xarray would need to ban dtypes it doesn't know about? >> > > Yes, I think custom dtypes would definitely help. Custom dtypes have a > well contained interface, so lots of operations (e.g., concatenate, > reshaping, indexing) are guaranteed to work in a dtype independent way. If > you try to do an unsupported operation for such a dtype (e.g., > np.datetime64), you will generally get a good error message about an > invalid dtype. > > In contrast, you can overload a subclass with totally arbitrary semantics > (e.g., np.matrix) and of course for duck-types as well. > > This makes a big difference for libraries like dask or xarray, which need > a standard interface to guarantee they do the right thing. I'm pretty sure > we can wrap a custom dtype ndarray with units, but there's no way we're > going to support np.matrix without significant work. It's hard to know > which is which without well defined interfaces. > Ah, but what if the dtype modifies the interface? That might sound evil, but it's something that's been proposed. For example, if I wanted to replace yt's YTArray in a backward compatibile way with a dtype and just use plain ndarrays everywhere, the dtype would need to *at least* modify ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute, and several other things. Of course if I don't care about backward compatibility I can just do all of these operations on the dtype object itself. However, I suspect whatever implementation of custom dtypes gets added to numpy, it will have the property that it can act like an arbitrary ndarray subclass otherwise libraries like yt, Pint, metpy, and astropy won't be able to switch to it. -Nathan > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
I guess my argument boils down to it being better to state that a function only accepts arrays and happily let it break on, e.g., matrix, than use `asarray` to make a matrix into an array even though it really isn't. I do like the dtype ideas, but think I'd agree they're likely to come with their own problems. But just making new numerical types possible is interesting. -- Marten ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Maybe the best of both worlds would require explicit opt-in for classes that shouldn't be coerced, e.g., xarray.register_data_type(MyArray) or maybe better yet ;) xarray.void_my_nonexistent_warranty_its_my_fault_if_my_buggy_duck_array_breaks_everything(MyArray) On Thu, Nov 2, 2017 at 3:39 PM Marten van Kerkwijk < m.h.vankerkw...@gmail.com> wrote: > I guess my argument boils down to it being better to state that a > function only accepts arrays and happily let it break on, e.g., > matrix, than use `asarray` to make a matrix into an array even though > it really isn't. > > I do like the dtype ideas, but think I'd agree they're likely to come > with their own problems. But just making new numerical types possible > is interesting. > > -- Marten > ___ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Thu, Nov 2, 2017 at 3:35 PM Nathan Goldbaum wrote: > Ah, but what if the dtype modifies the interface? That might sound evil, > but it's something that's been proposed. For example, if I wanted to > replace yt's YTArray in a backward compatibile way with a dtype and just > use plain ndarrays everywhere, the dtype would need to *at least* modify > ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute, and > several other things. > I suppose we'll need to sort this out. But adding new methods/properties feels pretty safe to me, as long as existing ones are guaranteed to work in the same way. ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion