Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
wrote:

> I think the biggest issues could be resolved if __array_concatenate__ were
> finished. Unfortunately I don't feel like I can take that on right now.
>
> See Ryan May's talk at scipy about using an ndarray subclass for units and
> the issues he's run into:
>
> https://www.youtube.com/watch?v=qCo9bkT9sow
>


Interesting talk, but I don't see how general library code should know what
units the output has.
For example, if units are some flows per unit of time and we average, sum, or
integrate over time, then what are the new units? (e.g. pandas time
aggregation)
What are the units of covariance or correlation between two variables with the
same units, and what are they between variables with different units?

How do you concatenate and operate on arrays with different units?

Interpolation or prediction would work with the existing units.

partially related:
statsmodels uses a wrapper for pandas Series and DataFrames and tries to
preserve the index when possible and make up a new DataFrame or Series if
the existing index doesn't apply.
E.g. predicted values and residuals are in terms of the original provided
index, and could also get original units assigned. That would also be
possible with prediction confidence intervals. But for the rest, see above.

Josef


>
>
> On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> From my experience with Quantity, routines that properly ducktype work
>> well, those that feel the need to accept list and blatantly do
>> `asarray` do not - even if in many cases they would have worked if
>> they used `asanyarray`...  But there are lots of nice surprises, with,
>> e.g., `np.fft.fftfreq` just working as one would hope.  Anyway, bottom
>> line, I think you should let this stop you from trying only if you
>> know something important does not work. -- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Thu, Nov 2, 2017 at 8:46 AM,  wrote:

>
>
> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at scipy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>
>
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> for example if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)
> What are units of covariance or correlation between two variables with the
> same units, and what are they between variables with different units?
>
> How do you concatenate and operate arrays with different units?
>
> interpolation or prediction would work with using the existing units.
>
> partially related:
> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
> preserve the index when possible and make up a new DataFrame or Series if
> the existing index doesn't apply.
> E.g. predicted values and residuals are in terms of the original provided
> index, and could also get original units assigned. That would also be
> possible with prediction confidence intervals. But for the rest, see above.
>

using pint

>>> x
<Quantity(..., 'meter')>
>>> x / x
<Quantity(..., 'dimensionless')>

>>> x / (1 + x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
    return self._add_sub(other, operator.add)
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
    raise DimensionalityError(self._units, 'dimensionless')
pint.errors.DimensionalityError: Cannot convert from 'meter' to 'dimensionless'

np.exp(x)
raises
pint.errors.DimensionalityError: Cannot convert from 'meter' ([length]) to
'dimensionless' (dimensionless)


Josef



>
> Josef
>
>
>>
>>
>> On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk <
>> m.h.vankerkw...@gmail.com> wrote:
>>
>>> From my experience with Quantity, routines that properly ducktype work
>>> well, those that feel the need to accept list and blatantly do
>>> `asarray` do not - even if in many cases they would have worked if
>>> they used `asanyarray`...  But there are lots of nice surprises, with,
>>> e.g., `np.fft.fftfreq` just working as one would hope.  Anyway, bottom
>>> line, I think you should let this stop you from trying only if you
>>> know something important does not work. -- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
Hi Josef,

astropy's Quantity is well developed and would give similar results to
pint; all those results make sense if one wants to have consistent
units. General library code will actually do the right thing as long
as it just uses normal mathematical operations with ufuncs - and as
long as it just duck types! - the unit code will then override and
properly propagate units to outputs, as can be seen in this example:
```
import numpy as np
import astropy.units as u
np.fft.fftfreq(8, 1*u.min)
# <Quantity [ 0.   , 0.125, 0.25 , 0.375, -0.5  , -0.375, -0.25 , -0.125] 1 / min>
np.fft.fftfreq(8, 1*u.min).var()
# <Quantity 0.08203125 1 / min2>
```

> for example if units are some flows per unit of time and we average, sum or 
> integrate over time, then what are the new units? (e.g. pandas time 
> aggregation)

The units module will force you to take into account `dt`! This is in
fact one reason why it is so powerful. So, your example might go
something like:
```
flow = [1., 1.5, 1.5] * u.g / u.s
dt = [0.5, 0.5, 1.] * u.hr
np.sum(flow * dt)
# <Quantity 2.75 g h / s>
np.sum(flow * dt).to(u.kg)
# <Quantity 9.9 kg>
```

> How do you concatenate and operate arrays with different units?

This is where Nathaniel's `__array_concatenate__` would come in. For
regular arrays it is fine to just concatenate, but for almost anything
else you need a different approach. For quantities, the most logical
one would be to first create an empty array of the right size with the
unit of, e.g., the first part to be concatenated, and then set
sections to the input quantities (where the setter does unit
conversion and will fail if that is not possible).
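
A rough sketch of that approach (purely illustrative; `concatenate_quantities`
is a made-up helper, and astropy.units just stands in for any quantity package):
```
import numpy as np
import astropy.units as u

def concatenate_quantities(parts):
    # Allocate the output with the unit of the first part; Quantity's
    # __setitem__ then converts each section, raising if it cannot.
    n = sum(len(p) for p in parts)
    out = np.empty(n) * parts[0].unit
    i = 0
    for p in parts:
        out[i:i + len(p)] = p  # unit conversion happens here
        i += len(p)
    return out

concatenate_quantities([[1., 2.] * u.m, [3.] * u.cm])
# -> values [1., 2., 0.03] with unit m; mixing in [3.] * u.s would raise
```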

All the best,

Marten

p.s. A fun subject is what to do with logarithmic units, such as the
magnitudes in astronomy... We have a module for that as well;
http://docs.astropy.org/en/latest/units/logarithmic_units.html
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Ryan May
On Thu, Nov 2, 2017 at 6:46 AM,  wrote:

>
>
> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at scipy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>
>
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> for example if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)
>

A general library doesn't have to do anything--just not do annoying things
like isinstance() checks and calling np.asarray() everywhere. Honestly one
of those is the core of most of the problems I run into. It's currently
more complicated when doing things in compiled land, but that's
implementation details, not any kind of fundamental incompatibility.

For basic mathematical operations, units have perfectly well defined
semantics that many of us encountered in an introductory physics or
chemistry class:
- Want to add or subtract two things? They need to have the same units; a
units library can handle conversion provided they have the same
dimensionality (e.g. length, time)
- Multiplication/Divison: combine and cancel units ( m/s * s -> m)

Everything else we do on a computer with data in some way boils down to:
add, subtract, multiply, divide.

Average keeps the same units -- it's just a sum and division by a unit-less
constant.
Integration (in 1-D) involves *two* variables: your data as well as the
time/space coordinates (or dx or dt); fundamentally it's a multiplication
by dx and a summation. The resulting units are then e.g. data.units *
dx.units. This works just like it does in Physics 101, where you integrate
velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m).
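
A quick sketch of that bookkeeping (astropy.units here is just one example of
a units package):
```
import numpy as np
import astropy.units as u

v = np.array([0., 1., 2.]) * u.m / u.s   # velocity samples
t = np.array([0., 10., 20.]) * u.s       # time coordinates
# trapezoidal rule: mean velocity per interval times dt, then summed
disp = np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t))
# disp -> 20.0 m: (m/s) * s cancels to m, exactly as in Physics 101
```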

> What are units of covariance or correlation between two variables with the
> same units, and what are they between variables with different units?
>

Well, covariance is subtracting the mean from each variable and multiplying
the residuals; therefore the units for cov(x, y):

(x.units - x.units) * (y.units - y.units) -> x.units * y.units

Correlation takes covariance and divides by the product of the standard
deviations, so that's:

(x.units * y.units) / (x.units * y.units) -> dimensionless

Which is what I'd expect for a correlation.
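
Done by hand on quantities, that unit algebra looks like this (a sketch with
astropy.units; np.cov itself may not accept such input):
```
import numpy as np
import astropy.units as u

x = np.array([1., 2., 3.]) * u.m
y = np.array([2., 4., 8.]) * u.s
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # carries m * s
corr = cov_xy / (x.std() * y.std())                # dimensionless
```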


> How do you concatenate and operate arrays with different units?
>

If all arrays have compatible dimensionality (say meters, inches, miles),
you convert to one (say the first) and concatenate like normal. If they're
not compatible, you error out.


> interpolation or prediction would work with using the existing units.
>

I'm sure you wrote that thinking units didn't play a role, but the math
behind those operations works perfectly fine with units, with things
cancelling out properly to give the same units out as in.

Ryan

-- 
Ryan May
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Thu, Nov 2, 2017 at 11:51 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Josef,
>
> astropy's Quantity is well developed and would give similar results to
> pint; all those results  make sense if one wants to have consistent
> units. A general library code will actually do the right thing as long
> as it just uses normal mathematical operations with ufuncs - and as
> long as it just duck types! - the unit code will then override and
> properly propagate units to outputs, as can be seen in this example:
> ```
> import numpy as np
> import astropy.units as u
> np.fft.fftfreq(8, 1*u.min)
> # <Quantity [ 0.   , 0.125, 0.25 , 0.375, -0.5  , -0.375, -0.25 , -0.125] 1 / min>
> np.fft.fftfreq(8, 1*u.min).var()
> # <Quantity 0.08203125 1 / min2>
> ```
>
> > for example if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)
>
> The units module will force you to take into account `dt`! This is in
> fact one reason why it is so powerful. So, your example might go
> something like:
> ```
> flow = [1., 1.5, 1.5] * u.g / u.s
> dt = [0.5, 0.5, 1.] * u.hr
> np.sum(flow * dt)
> # <Quantity 2.75 g h / s>
> np.sum(flow * dt).to(u.kg)
> # <Quantity 9.9 kg>
> ```
>
> > How do you concatenate and operate arrays with different units?
>
> This is where Nathaniel's `__array_concatenate__` would come in. For
> regular arrays it is fine to just concatenate, but for almost anything
> else you need a different approach. For quantities, the most logical
> one would be to first create an empty array of the right size with the
> unit of, e.g., the first part to be concatenated, and then set
> sections to the input quantities (where the setter does unit
> conversion and will fail if that is not possible).
>

For example,
"will fail if that is not possible"
rules out inhomogeneous arrays (analogous to structured dtypes).

How do you get a vander matrix for something simple like a polynomial fit?

x[:, None] ** np.arange(3)
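
For concreteness, a sketch of what happens today (astropy.units as an example;
the exact exception depends on the package):
```
import numpy as np
import astropy.units as u

x = np.array([1., 2.]) * u.m
try:
    x[:, None] ** np.arange(3)  # columns would need units 1, m, m**2
except Exception as err:        # exact exception type depends on the package
    print(type(err).__name__)   # no single unit fits the result, so it raises
```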


>
> All the best,
>
> Marten
>
> p.s. A fun subject is what to do with logarithmic units, such as the
> magnitudes in astronomy... We have a module for that as well;
> http://docs.astropy.org/en/latest/units/logarithmic_units.html


Similarly, scipy.special has ufuncs;
what units are those?

Most code that I know (e.g. scipy.stats and statsmodels) does not use only
"normal mathematical operations with ufuncs".
I guess there are a lot of "abnormal" mathematical operations
where simply propagating the units will not work.


Aside: the problem is more general, also for other data structures.
E.g. statsmodels for the most part uses only numpy ndarrays inside the
algorithms and computations because that provides well defined
behavior. (e.g. pandas behaved too differently in many cases)
I don't have much idea yet about how to change the infrastructure to
allow the use of dask arrays, sparse matrices and similar, and possibly
automatic differentiation.

Josef


___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Ryan May
On Thu, Nov 2, 2017 at 6:56 AM,  wrote:

> On Thu, Nov 2, 2017 at 8:46 AM,  wrote:
>
>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
>> wrote:
>>
>>> I think the biggest issues could be resolved if __array_concatenate__
>>> were finished. Unfortunately I don't feel like I can take that on right now.
>>>
>>> See Ryan May's talk at scipy about using an ndarray subclass for units
>>> and the issues he's run into:
>>>
>>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>>
>>
>>
>> Interesting talk, but I don't see how general library code should know
>> what units the output has.
>> for example if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>> What are units of covariance or correlation between two variables with
>> the same units, and what are they between variables with different units?
>>
>> How do you concatenate and operate arrays with different units?
>>
>> interpolation or prediction would work with using the existing units.
>>
>> partially related:
>> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
>> preserve the index when possible and make up a new DataFrame or Series if
>> the existing index doesn't apply.
>> E.g. predicted values and residuals are in terms of the original provided
>> index, and could also get original units assigned. That would also be
>> possible with prediction confidence intervals. But for the rest, see above.
>>
>
> using pint
>
> >>> x
> <Quantity(..., 'meter')>
> >>> x / x
> <Quantity(..., 'dimensionless')>
>
> >>> x / (1 + x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
>     return self._add_sub(other, operator.add)
>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
>     raise DimensionalityError(self._units, 'dimensionless')
> pint.errors.DimensionalityError: Cannot convert from 'meter' to 'dimensionless'
>

I'm not sure why you have a problem with that result. You tried to take a
number in meters and add a dimensionless value to it--that's not a
defined operation. That's like saying: "I have a distance of 12 meters and
added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert,
but I can do that operation. 1 second? That makes no sense.

If you add units to the 1 then it's a defined operation:

>>> ureg = pint.UnitRegistry()
>>> x / (1 * ureg.meters + x)
<Quantity(..., 'dimensionless')>


> np.exp(x)
> raises
> pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
> to 'dimensionless' (dimensionless)
>

Well, the Taylor series for exp (around a=0) is:

exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ...

so for that to properly add up, x needs to be dimensionless. It should be
noted, though, that I've *never* seen a formula, theoretically derived or
empirically fit, require directly taking exp(x) where x is a physical
quantity with units. Instead, you have:

f = a * exp(kx)

Properly calculated values for a, k will have appropriate units attached to
them that allow the calculation to proceed without error.
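
For instance (a sketch with astropy.units; any units package should behave
similarly):
```
import numpy as np
import astropy.units as u

a = 2.0 * u.kg   # carries the units of the output
k = 0.1 / u.m    # cancels the units of x
x = 5.0 * u.m
f = a * np.exp(k * x)  # k*x is dimensionless, so exp() works; f is in kg
```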

Ryan

-- 
Ryan May
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Thu, Nov 2, 2017 at 12:23 PM, Ryan May  wrote:

> On Thu, Nov 2, 2017 at 6:46 AM,  wrote:
>
>>
>>
>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
>> wrote:
>>
>>> I think the biggest issues could be resolved if __array_concatenate__
>>> were finished. Unfortunately I don't feel like I can take that on right now.
>>>
>>> See Ryan May's talk at scipy about using an ndarray subclass for units
>>> and the issues he's run into:
>>>
>>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>>
>>
>>
>> Interesting talk, but I don't see how general library code should know
>> what units the output has.
>> for example if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>>
>
> A general library doesn't have to do anything--just not do annoying things
> like isinstance() checks and calling np.asarray() everywhere. Honestly one
> of those is the core of most of the problems I run into. It's currently
> more complicated when doing things in compiled land, but that's
> implementation details, not any kind of fundamental incompatibility.
>
> For basic mathematical operations, units have perfectly well defined
> semantics that many of us encountered in an introductory physics or
> chemistry class:
> - Want to add or subtract two things? They need to have the same units; a
> units library can handle conversion provided they have the same
> dimensionality (e.g. length, time)
> - Multiplication/Divison: combine and cancel units ( m/s * s -> m)
>
> Everything else we do on a computer with data in some way boils down to:
> add, subtract, multiply, divide.
>
> Average keeps the same units -- it's just a sum and division by a
> unit-less constant
> Integration (in 1-D) involves *two* variables, your data as well as the
> time/space coordinates (or dx or dt); fundamentally it's a multiplication
> by dx and a summation. The units results then are e.g. data.units *
> dx.units. This works just like it does in Physics 101 where you integrate
> velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m)
>
>> What are units of covariance or correlation between two variables with the
>> same units, and what are they between variables with different units?
>>
>
> Well, covariance is subtracting the mean from each variable and
> multiplying the residuals; therefore the units for cov(x, y):
>
> (x.units - x.units) * (y.units - y.units) -> x.units * y.units
>
> Correlation takes covariance and divides by the product of the standard
> deviations, so that's:
>
> (x.units * y.units) / (x.units * y.units) -> dimensionless
>
> Which is what I'd expect for a correlation.
>
>
>> How do you concatenate and operate arrays with different units?
>>
>
> If all arrays have compatible dimensionality (say meters, inches, miles),
> you convert to one (say the first) and concatenate like normal. If they're
> not compatible, you error out.
>
>
>> interpolation or prediction would work with using the existing units.
>>
>
> I'm sure you wrote that thinking units didn't play a role, but the math
> behind those operations works perfectly fine with units, with things
> cancelling out properly to give the same units out as in.
>

Some of it is in my reply to Marten.

Regression and polyfit require an X matrix with different units and then
some linear algebra like solve, pinv or svd.

So, while the predicted values have well defined units, the computation
involves some messier operations, unless you want to forgo linear algebra
in all intermediate steps and reduce it to sums, divisions and inverses.

Josef


>
> Ryan
>
> --
> Ryan May
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Thu, Nov 2, 2017 at 12:46 PM, Ryan May  wrote:

> On Thu, Nov 2, 2017 at 6:56 AM,  wrote:
>
>> On Thu, Nov 2, 2017 at 8:46 AM,  wrote:
>>
>>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
>>> wrote:
>>>
>>>> I think the biggest issues could be resolved if __array_concatenate__
>>>> were finished. Unfortunately I don't feel like I can take that on right
>>>> now.
>>>>
>>>> See Ryan May's talk at scipy about using an ndarray subclass for units
>>>> and the issues he's run into:
>>>>
>>>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>>>
>>>
>>>
>>> Interesting talk, but I don't see how general library code should know
>>> what units the output has.
>>> for example if units are some flows per unit of time and we average, sum
>>> or integrate over time, then what are the new units? (e.g. pandas time
>>> aggregation)
>>> What are units of covariance or correlation between two variables with
>>> the same units, and what are they between variables with different units?
>>>
>>> How do you concatenate and operate arrays with different units?
>>>
>>> interpolation or prediction would work with using the existing units.
>>>
>>> partially related:
>>> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
>>> preserve the index when possible and make up a new DataFrame or Series if
>>> the existing index doesn't apply.
>>> E.g. predicted values and residuals are in terms of the original
>>> provided index, and could also get original units assigned. That would also
>>> be possible with prediction confidence intervals. But for the rest, see
>>> above.
>>>
>>
>> using pint
>>
>> >>> x
>> <Quantity(..., 'meter')>
>> >>> x / x
>> <Quantity(..., 'dimensionless')>
>>
>> >>> x / (1 + x)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
>>     return self._add_sub(other, operator.add)
>>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
>>     raise DimensionalityError(self._units, 'dimensionless')
>> pint.errors.DimensionalityError: Cannot convert from 'meter' to 'dimensionless'
>>
>
> I'm not sure why you have a problem with that result. You tried to take a
> number in meters and add a dimensionless value to it--that's not a
> defined operation. That's like saying: "I have a distance of 12 meters and
> added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert,
> but I can do that operation. 1 second? That makes no sense.
>
> If you add units to the 1 then it's a defined operation:
>
> >>> ureg = pint.UnitRegistry()
> >>> x / (1 * ureg.meters + x)
> <Quantity(..., 'dimensionless')>
>
>
>> np.exp(x)
>> raises
>> pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
>> to 'dimensionless' (dimensionless)
>>
>
> Well, the Taylor series for exp (around a=0) is:
>
> exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ...
>
> so for that to properly add up, x needs to be dimensionless. It should be
> noted, though, that I've *never* seen a formula, theoretically derived or
> empirically fit, require directly taking exp(x) where x is a physical
> quantity with units. Instead, you have:
>
> f = a * exp(kx)
>
> Properly calculated values for a, k will have appropriate units attached
> to them that allow the calculation to proceed without error.
>

I was thinking of a simple logit model to predict whether it rains tomorrow.
The logit transformation for the probability is exp(k x) / (1 + exp(k x)),
where k is a parameter to search for in the optimization.

x is a matrix with all predictors or explanatory variables, which could all
have different units.

So it sounds to me like, if we drop asarray, then we just get exceptions or
possibly strange results, or we have to introduce a unit that matches
everything (like a joker card) for any constants that we are using.
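
To make the mixed-units problem concrete, a hypothetical sketch (per-column
quantities and hand-picked coefficient units; none of this is existing
statsmodels API):
```
import numpy as np
import astropy.units as u

# each column of x has its own unit, so a single Quantity cannot hold x;
# a per-column list is one workaround
cols = [np.array([20., 25.]) * u.K,         # temperature
        np.array([5., 3.]) * u.m / u.s]     # wind speed
beta = [0.1 / u.K, 0.2 * u.s / u.m]         # units hand-picked to cancel
eta = sum(b * c for b, c in zip(beta, cols))
prob = 1 / (1 + np.exp(-eta))               # only works because eta is dimensionless
```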

Josef



>
> Ryan
>
> --
> Ryan May
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
Hi Josef,

Indeed, for some applications one would like to have different units
for different parts of an array. And that means that, at present, the
quantity implementations that we have are no good at storing, say, a
covariance matrix involving parameters with different units, where
thus each element of the covariance matrix has a different unit. I
fear at present it would have to be an object array instead; other
cases may be a bit easier to solve, by, e.g., allowing structured
arrays with similarly structured units. I do note that actually doing
it would clarify, e.g., what the axes in Vandermonde (spelling?)
matrices mean.

That said, there is truly an enormous benefit for checking units on
"regular" operations. Spacecraft have missed Mars because people
didn't do it properly...

All the best,

Marten

p.s. The scipy functions should indeed be included in the ufuncs
covered; there is a fairly long-standing issue for that in astropy...
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Thu, Nov 2, 2017 at 2:39 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Josef,
>
> Indeed, for some applications one would like to have different units
> for different parts of an array. And that means that, at present, the
> quantity implementations that we have are no good at storing, say, a
> covariance matrix involving parameters with different units, where
> thus each element of the covariance matrix has a different unit. I
> fear at present it would have to be an object array instead; other
> cases may be a bit easier to solve, by, e.g., allowing structured
> arrays with similarly structured units. I do note that actually doing
> it would clarify, e.g., what the axes in Vandermonde (spelling?)
> matrices mean.
>

(I have problems remembering the spelling of proper names)
np.vander and various polyvander functions/methods

One point I wanted to make is that the units are overhead and irrelevant in
the computation. It's the outcome that might have units.
E.g. polyfit could use various underlying polynomials,
e.g. numpy.polynomial.chebyshev.chebvander(...) and various linear algebra
and projection versions, and the output would still have the same units.
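
A sketch of that pattern (illustrative only, with astropy.units): strip the
units, do the linear algebra on plain floats, and attach units to the result:
```
import numpy as np
import astropy.units as u

x = np.array([0., 1., 2., 3.]) * u.s
y = np.array([1., 3., 5., 7.]) * u.m
coef = np.polynomial.polynomial.polyfit(x.value, y.value, 1)     # unitless internals
yhat = np.polynomial.polynomial.polyval(x.value, coef) * y.unit  # meters again
```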

Aside: I just found an interesting one:
http://docs.astropy.org/en/latest/api/astropy.stats.biweight.biweight_midcovariance.html
It is pairwise, but uses asanyarray.

E.g. this uses asarray (for robust scatter):
https://github.com/statsmodels/statsmodels/pull/3230/files#diff-8fd46d3044db86ae7992f5d817eec6c7R473
I guess I would have problems replacing asarray with asanyarray.

One last related one:
what's the inverse of a covariance matrix? It's just sums, multiplications
and divisions (which I wouldn't remember), but the computation is just
np.linalg.inv or np.linalg.pinv, which is a simple shortcut.


Josef


>
> That said, there is truly an enormous benefit for checking units on
> "regular" operations. Spacecraft have missed Mars because people
> didn't do it properly...
>

https://twitter.com/search?q=2%20unit%20tests.%200%20integration%20tests



>
> All the best,
>
> Marten
>
> p.s. The scipy functions should indeed be included in the ufuncs
> covered; there is a fairly long-standing issue for that in astropy...
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Stephan Hoyer
On Thu, Nov 2, 2017 at 9:45 AM  wrote:

> similar, scipy.special has ufuncs
> what units are those?
>
> Most code that I know (i.e. scipy.stats and statsmodels) does not use only
> "normal mathematical operations with ufuncs"
> I guess there are a lot of "abnormal" mathematical operations
> where just simply propagating the units will not work.
>

> Aside: The problem is more general also for other datastructures.
> E.g. statsmodels for most parts uses only numpy ndarrays inside the
> algorithm and computations because that provides well defined
> behavior. (e.g. pandas behaved too differently in many cases)
> I don't have much idea yet about how to change the infrastructure to
> allow the use of dask arrays, sparse matrices and similar and possibly
> automatic differentiation.
>

This is the exact same reason why pandas and xarray do not support wrapping
arbitrary ndarray subclasses or duck array types. The operations we use
internally (on numpy.ndarray objects) may not be what you would expect
externally, and may even be implementation details not considered part of
the public API. For example, in xarray we use numpy.nanmean() or
bottleneck.nanmean() instead of numpy.mean().

For NumPy and xarray, I think we could (and should) define an interface to
support subclasses and duck types for generic operations for core
use-cases. My main concern with subclasses / duck-arrays is
undefined/untested behavior, especially where we might silently give the
wrong answer or trigger some undesired operation (e.g., loading a lazily
computed array into memory) rather than raising an informative error. Leaking
implementation details is another concern: we have already had several
cases in NumPy where a function only worked on a subclass if a particular
method was called internally, and broke when that was changed.
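
For example, with plain NumPy:
```
import numpy as np

a = np.array([1.0, np.nan, 3.0])
np.mean(a)     # nan -- propagates the missing value
np.nanmean(a)  # 2.0 -- silently skips it
```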
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Nathan Goldbaum
On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer  wrote:

> On Thu, Nov 2, 2017 at 9:45 AM  wrote:
>
>> similar, scipy.special has ufuncs
>> what units are those?
>>
>> Most code that I know (i.e. scipy.stats and statsmodels) does not use only
>> "normal mathematical operations with ufuncs"
>> I guess there are a lot of "abnormal" mathematical operations
>> where just simply propagating the units will not work.
>>
>
>> Aside: The problem is more general also for other datastructures.
>> E.g. statsmodels for most parts uses only numpy ndarrays inside the
>> algorithm and computations because that provides well defined
>> behavior. (e.g. pandas behaved too differently in many cases)
>> I don't have much idea yet about how to change the infrastructure to
>> allow the use of dask arrays, sparse matrices and similar and possibly
>> automatic differentiation.
>>
>
> This is the exact same reason why pandas and xarray do not support
> wrapping arbitrary ndarray subclasses or duck array types. The operations
> we use internally (on numpy.ndarray objects) may not be what you would
> expect externally, and may even be implementation details not considered
> part of the public API. For example, in xarray we use numpy.nanmean() or
> bottleneck.nanmean() instead of numpy.mean().
>
> For NumPy and xarray, I think we could (and should) define an interface to
> support subclasses and duck types for generic operations for core
> use-cases. My main concern with subclasses / duck-arrays is
> undefined/untested behavior, especially where we might silently give the
> wrong answer or trigger some undesired operation (e.g., loading a lazily
> computed array into memory) rather than raising an informative error. Leaking
> implementation details is another concern: we have already had several
> cases in NumPy where a function only worked on a subclass if a particular
> method was called internally, and broke when that was changed.
>

Would this issue be ameliorated given Nathaniel's proposal to try to move
away from subclasses and towards storing data in dtypes? Or would that just
mean that xarray would need to ban dtypes it doesn't know about?


___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Python @ FOSDEM 2018

2017-11-02 Thread Pierre de Buyl
Dear SciPythonists and NumPythonists,

FOSDEM is a free event for software developers to meet, share ideas and
collaborate. Every year, 6500+ developers of free and open source software
from all over the world gather at the event in Brussels.


For FOSDEM 2018, we will try the new concept of a virtual Python devroom: there
is no dedicated Python room; instead, we promote the presence of Python in
all devrooms. We hope to have at least one Python talk in every devroom (yes,
even in the Perl, Ada, Go and Rust devrooms ;-) ).

How can you help to highlight the Python community at Python-FOSDEM 2018?
Propose your talk in the closest related devroom:
https://fosdem.org/2018/news/2017-10-04-accepted-developer-rooms/

Not all devrooms are language-specific and a number of topics come to mind for
data and science participants:
"Monitoring & Cloud devroom" 
https://lists.fosdem.org/pipermail/fosdem/2017-October/002631.html
"HPC, Big Data, and Data Science"
https://lists.fosdem.org/pipermail/fosdem/2017-October/002615.html
"LLVM toolchain"
https://lists.fosdem.org/pipermail/fosdem/2017-October/002624.html

Most calls for contributions end around the 24th of November.

Send a copy of your proposal to python-devroom AT lists.fosdem DOT org. We
will publish a dedicated schedule for Python on https://python-fosdem.org/
and at our stand.

A dinner will also be organized; stay tuned.

We are waiting for your talk proposals.

The Python-FOSDEM committee

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Matthew Harrigan
NumPy already does support a specific unit, datetime64 and timedelta64, I
think through that very mechanism. It's also probably the most complicated
unit, since at least there is no such thing as leap meters. And it works
well and is very useful IMHO.
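
For example:
```
import numpy as np

t = np.array(['2017-11-02T12:00'], dtype='datetime64[m]')
t + np.timedelta64(90, 'm')            # 2017-11-02T13:30, unit tracked by the dtype
t - np.datetime64('2017-11-01T12:00')  # timedelta64 of 1440 minutes
```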

On Thu, Nov 2, 2017 at 3:40 PM, Nathan Goldbaum 
wrote:

>
>
> On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer  wrote:
>
>> On Thu, Nov 2, 2017 at 9:45 AM  wrote:
>>
>>> similar, scipy.special has ufuncs
>>> what units are those?
>>>
>>> Most code that I know (i.e. scipy.stats and statsmodels) does not use
>>> only
>>> "normal mathematical operations with ufuncs"
>>> I guess there are a lot of "abnormal" mathematical operations
>>> where just simply propagating the units will not work.
>>>
>>
>>> Aside: The problem is more general also for other datastructures.
>>> E.g. statsmodels for most parts uses only numpy ndarrays inside the
>>> algorithm and computations because that provides well defined
>>> behavior. (e.g. pandas behaved too differently in many cases)
>>> I don't have much idea yet about how to change the infrastructure to
>>> allow the use of dask arrays, sparse matrices and similar and possibly
>>> automatic differentiation.
>>>
>>
>> This is the exact same reason why pandas and xarray do not support
>> wrapping arbitrary ndarray subclasses or duck array types. The operations
>> we use internally (on numpy.ndarray objects) may not be what you would
>> expect externally, and may even be implementation details not considered
>> part of the public API. For example, in xarray we use numpy.nanmean() or
>> bottleneck.nanmean() instead of numpy.mean().
>>
>> For NumPy and xarray, I think we could (and should) define an interface
>> to support subclasses and duck types for generic operations for core
>> use-cases. My main concern with subclasses / duck-arrays is
>> undefined/untested behavior, especially where we might silently give the
>> wrong answer or trigger some undesired operation (e.g., loading a lazily
>> computed array into memory) rather than raising an informative error. Leaking
>> implementation details is another concern: we have already had several
>> cases in NumPy where a function only worked on a subclass if a particular
>> method was called internally, and broke when that was changed.
>>
>
> Would this issue be ameliorated given Nathaniel's proposal to try to move
> away from subclasses and towards storing data in dtypes? Or would that just
> mean that xarray would need to ban dtypes it doesn't know about?
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
My 2¢ here is that all code should feel free to assume a certain type of
input, as long as it is documented properly, but there is no reason to
enforce that by, e.g., putting `asarray` everywhere. Then, for some
pieces, ducktypes and subclasses will just work like magic, and uses
you might never have foreseen become possible. For others, whoever
wants to use them has to do the work (and it's up to package maintainers to
decide whether or not to accept PRs that implement hooks, etc.)

I do see the argument that this way one becomes constrained in the
internal implementation, as a change may break an outward-looking
function, but while at times this may be inconvenient, in my
experience at others it may just make one realize an even better
implementation is possible. But then, I really like duck-typing...

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Benjamin Root
Duck typing is great and all for classes that implement some or all of the
ndarray interface, but remember what the main reason for asarray() and
asanyarray() is: to automatically promote lists and tuples and other
"array-likes" to ndarrays. Ignoring the use-case of lists of lists is
problematic at best.

Ben Root


On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> My 2¢ here is that all code should feel free to assume certain type of
> input, as long as it is documented properly, but there is no reason to
> enforce that by, e.g., putting `asarray` everywhere. Then, for some
> pieces ducktypes and subclasses will just work like magic, and uses
> you might never have foreseen become possible. For others, whoever
> wants to use them has to do work (and up to a package maintainers to
> decide whether or not to accept PRs that implement hooks, etc.)
>
> I do see the argument that this way one becomes constrained in the
> internal implementation, as a change may break an outward-looking
> function, but while at times this may be inconvenient, in my
> experience at others it may just make one realize an even better
> implementation is possible. But then, I really like duck-typing...
>
> -- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root  wrote:
> Duck typing is great and all for classes that implement some or all of the
> ndarray interface, but remember what the main reason for asarray() and
> asanyarray() is: to automatically promote lists and tuples and other
> "array-likes" to ndarrays. Ignoring the use-case of lists of lists is
> problematic at best.

How I wish numpy had never gone there! Convenience for what, exactly?
For the user not having to put `array()` around the list themselves?
We slow down everything for that? And even now we're trying to remove
some of the cases where both tuples and lists are allowed. Grr. Of
course, we are well and truly stuck with it - now it is one of the
main reasons to subclass rather than duck-type... Anyway, water under
the bridge...  -- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root  wrote:

> Duck typing is great and all for classes that implement some or all of the
> ndarray interface, but remember what the main reason for asarray() and
> asanyarray() is: to automatically promote lists and tuples and other
> "array-likes" to ndarrays. Ignoring the use-case of lists of lists is
> problematic at best.
>
> Ben Root
>
>
> On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> My 2¢ here is that all code should feel free to assume certain type of
>> input, as long as it is documented properly, but there is no reason to
>> enforce that by, e.g., putting `asarray` everywhere. Then, for some
>> pieces ducktypes and subclasses will just work like magic, and uses
>> you might never have foreseen become possible. For others, whoever
>> wants to use them has to do work (and up to a package maintainers to
>> decide whether or not to accept PRs that implement hooks, etc.)
>>
>> I do see the argument that this way one becomes constrained in the
>> internal implementation, as a change may break an outward-looking
>> function, but while at times this may be inconvenient, in my
>> experience at others it may just make one realize an even better
>> implementation is possible. But then, I really like duck-typing...
>>
>
One problem in general is that there is no protocol about what operations
are implemented in a numpy ndarray equivalent way in those ducks, i.e. if
they quack in a compatible way.

One small example: pandas' standard deviation, std, uses ddof=1 by default
and didn't have an option to override it with the ddof=0 that numpy
uses. So even though we could call the std method of the ducks, the t-test
results would be a bit different, and visibly different in small samples,
depending on the type of the data. A possible alternative would be to
compute std from scratch and forgo the available function or method.
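
A small illustration of the discrepancy (assuming the current pandas defaults):
```
import numpy as np
import pandas as pd

s = pd.Series([1., 2., 3., 4.])
np.asarray(s).std()  # 1.118... (ddof=0, the numpy default)
s.std()              # 1.290... (ddof=1, the pandas default)
```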

I tried once, in the scipy zscore function, to be agnostic about the type
and not use asarray; it's a simple operation, but it still required special
handling of numpy matrices because they preserve the dimension in reduce
operations. After more than a few lines it is difficult to keep track of
what type is now used.

Another subclass that is often broken in default code is masked arrays,
because asarray throws away the mask.
But asanyarray wouldn't always work either, because the mask needs code for
handling the masked values. For example, scipy.stats ended up with separate
functions for masked arrays.
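
For example:
```
import numpy as np

m = np.ma.masked_array([1., 2., 100.], mask=[False, False, True])
np.asanyarray(m).mean()  # 1.5    -- still a MaskedArray, the mask is respected
np.asarray(m).mean()     # 34.33  -- the mask is silently dropped
```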

Josef



>
>> -- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Stephan Hoyer
On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum 
wrote:

> Would this issue be ameliorated given Nathaniel's proposal to try to move
> away from subclasses and towards storing data in dtypes? Or would that just
> mean that xarray would need to ban dtypes it doesn't know about?
>

Yes, I think custom dtypes would definitely help. Custom dtypes have a well
contained interface, so lots of operations (e.g., concatenate, reshaping,
indexing) are guaranteed to work in a dtype independent way. If you try to
do an unsupported operation for such a dtype (e.g., np.datetime64), you
will generally get a good error message about an invalid dtype.

In contrast, you can overload a subclass with totally arbitrary semantics
(e.g., np.matrix) and of course for duck-types as well.

This makes a big difference for libraries like dask or xarray, which need a
standard interface to guarantee they do the right thing. I'm pretty sure we
can wrap a custom dtype ndarray with units, but there's no way we're going
to support np.matrix without significant work. It's hard to know which is
which without well defined interfaces.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Nathan Goldbaum
On Thu, Nov 2, 2017 at 5:21 PM, Stephan Hoyer  wrote:

> On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum 
> wrote:
>
>> Would this issue be ameliorated given Nathaniel's proposal to try to move
>> away from subclasses and towards storing data in dtypes? Or would that just
>> mean that xarray would need to ban dtypes it doesn't know about?
>>
>
> Yes, I think custom dtypes would definitely help. Custom dtypes have a
> well contained interface, so lots of operations (e.g., concatenate,
> reshaping, indexing) are guaranteed to work in a dtype independent way. If
> you try to do an unsupported operation for such a dtype (e.g.,
> np.datetime64), you will generally get a good error message about an
> invalid dtype.
>
> In contrast, you can overload a subclass with totally arbitrary semantics
> (e.g., np.matrix) and of course for duck-types as well.
>
> This makes a big difference for libraries like dask or xarray, which need
> a standard interface to guarantee they do the right thing. I'm pretty sure
> we can wrap a custom dtype ndarray with units, but there's no way we're
> going to support np.matrix without significant work. It's hard to know
> which is which without well defined interfaces.
>

Ah, but what if the dtype modifies the interface? That might sound evil,
but it's something that's been proposed. For example, if I wanted to
replace yt's YTArray in a backward compatible way with a dtype and just
use plain ndarrays everywhere, the dtype would need to *at least* modify
ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute, and
several other things.

Of course if I don't care about backward compatibility I can just do all of
these operations on the dtype object itself. However, I suspect whatever
implementation of custom dtypes gets added to numpy, it will have the
property that it can act like an arbitrary ndarray subclass otherwise
libraries like yt, Pint, metpy, and astropy won't be able to switch to it.

-Nathan


___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
I guess my argument boils down to it being better to state that a
function only accepts arrays and happily let it break on, e.g.,
matrix, than use `asarray` to make a matrix into an array even though
it really isn't.

I do like the dtype ideas, but think I'd agree they're likely to come
with their own problems. But just making new numerical types possible
is interesting.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Stephan Hoyer
Maybe the best of both worlds would require explicit opt-in for classes
that shouldn't be coerced, e.g.,
xarray.register_data_type(MyArray)

or maybe better yet ;)
xarray.void_my_nonexistent_warranty_its_my_fault_if_my_buggy_duck_array_breaks_everything(MyArray)

On Thu, Nov 2, 2017 at 3:39 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> I guess my argument boils down to it being better to state that a
> function only accepts arrays and happily let it break on, e.g.,
> matrix, than use `asarray` to make a matrix into an array even though
> it really isn't.
>
> I do like the dtype ideas, but think I'd agree they're likely to come
> with their own problems. But just making new numerical types possible
> is interesting.
>
> -- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Stephan Hoyer
On Thu, Nov 2, 2017 at 3:35 PM Nathan Goldbaum 
wrote:

> Ah, but what if the dtype modifies the interface? That might sound evil,
> but it's something that's been proposed. For example, if I wanted to
> replace yt's YTArray in a backward compatible way with a dtype and just
> use plain ndarrays everywhere, the dtype would need to *at least* modify
> ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute, and
> several other things.
>

I suppose we'll need to sort this out. But adding new methods/properties
feels pretty safe to me, as long as existing ones are guaranteed to work in
the same way.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion