Re: [Numpy-discussion] NA/Missing Data Conference Call Summary

Bruce Southey Wed, 06 Jul 2011 14:29:08 -0700

On 07/06/2011 03:37 PM, Pierre GM wrote:
> On Jul 6, 2011, at 10:11 PM, Bruce Southey wrote:
>
>> On 07/06/2011 02:38 PM, Christopher Jordan-Squire wrote:
>>>
>>> On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker <[email protected]> wrote:
>>> Christopher Jordan-Squire wrote:
>>>> If we follow those rules for IGNORE for all computations, we sometimes
>>>> get some weird output. For example:
>>>> [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix
>>>> multiply and not * with broadcasting.) Or should that sort of operation
>>>> throw an error?
>>> That should throw an error -- matrix computation is heavily influenced
>>> by the shape and size of matrices, so I think IGNORES really don't make
>>> sense there.
>>>
>>>
>>>
>>> If the IGNORES don't make sense in basic numpy computations then I'm kinda 
>>> confused why they'd be included at the numpy core level.
>>>
>>>
>>> Nathaniel Smith wrote:
>>>> It's exactly this transparency that worries Matthew and me -- we feel
>>>> that the alterNEP preserves it, and the NEP attempts to erase it. In
>>>> the NEP, there are two totally different underlying data structures,
>>>> but this difference is blurred at the Python level. The idea is that
>>>> you shouldn't have to think about which you have, but if you work with
>>>> C/Fortran, then of course you do have to be constantly aware of the
>>>> underlying implementation anyway.
>>> I don't think this bothers me -- I think it's analogous to things in
>>> numpy like Fortran order and non-contiguous arrays -- you can ignore all
>>> that when working in pure python when performance isn't critical, but
>>> you need a deeper understanding if you want to work with the data in C
>>> or Fortran or to tune performance in python.
>>>
>>> So as long as there is an API to query and control how things work, I
>>> like that it's hidden from simple python code.
>>>
>>> -Chris
>>>
>>>
>>>
>>> I'm similarly not too concerned about it. Performance seems finicky when 
>>> you're dealing with missing data, since a lot of arrays will likely have to 
>>> be copied over to other arrays containing only complete data before being 
>>> handed over to BLAS. My primary concern is that the np.NA stuff 'just 
>>> works'. Especially since I've never run into use cases in statistics where 
>>> the difference between IGNORE and NA mattered.
>>>
>>>
>> Exactly!
>> I have not been able to think of a real example where that difference 
>> matters, as the calculations are only on the 'valid' (i.e. non-missing and 
>> non-masked) values.
> In practice, they could be treated the same way (i.e. skipped). However, they 
> are conceptually different, and one may wish to keep this difference of 
> information around (between NAs you didn't have and IGNOREs you just dropped 
> temporarily).
>
>
I have yet to see these as *conceptually different* in any of the 
arguments given.


Separate NAs or IGNORES, or any number of missing value codes, just 
require us to avoid 'unmasking' those missing value codes in the 
array since, I presume, as with masked arrays, some placeholder value 
is still needed underneath.
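
As a rough sketch of that point, using the existing numpy.ma module as a 
stand-in (this is not the API proposed in the NEP or the alterNEP, and the 
names na_mask and ignore_mask are made up purely for illustration): the 
two kinds of missing value can be tracked as separate masks, yet the 
calculation only ever sees the valid values, and the placeholders are 
still sitting in the underlying data.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0, 4.0])

# Hypothetical bookkeeping: one mask per kind of missing value.
na_mask = np.array([False, True, False, False])      # never observed
ignore_mask = np.array([False, False, True, False])  # dropped temporarily

# For the calculation itself the two kinds are treated identically:
# anything flagged in either mask is skipped.
x = ma.masked_array(data, mask=na_mask | ignore_mask)
valid_sum = x.sum()    # only 1.0 and 4.0 contribute
n_valid = x.count()    # number of unmasked elements

# Restoring the IGNOREd element only needs the separate NA mask.
x_na_only = ma.masked_array(data, mask=na_mask)
na_only_sum = x_na_only.sum()   # 1.0 + 3.0 + 4.0

# The placeholder values are still stored, so going around the mask
# silently pulls them back into the result.
unmasked_sum = x.data.sum()     # all four stored values contribute
```

Whether the second mask is worth carrying around is exactly the question; 
the arithmetic on the valid values is the same either way.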

Bruce



_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
