On Fri, Jun 24, 2011 at 11:13 AM, Christopher Barker <[email protected]> wrote:
> Nathaniel Smith wrote:
>
>> The 'dtype factory' idea builds on the way I've structured datetime as a
>> parameterized type,
>
> ...
>
> Another disadvantage is that we get further from Gael Varoquaux's point:
>
>> Right now, the numpy array can be seen as an extension of the C
>> array, basically a pointer, a data type, and a shape (and strides).
>> This enables easy sharing with libraries that have not been
>> written with numpy in mind.
>
> and also PEP 3118 support
>
> It is very useful that a numpy array has a pointer to a regular old C
> array -- if we introduce this special dtype, that will break (well, not
> really, but the C array would be of this particular struct).
> Granted, any other C code would probably have to do something with the
> mask anyway, but I still think it'd be better to keep that raw data
> array standard.
>

It's not actually a pointer to a C array; there is already a lot of
checking, and possibly a copy/buffer, required before you can treat it
as such. The data may be misaligned, have noncontiguous strides, have a
non-C multidimensional memory layout, or have a different byte order.
Dealing with all these special cases in a uniform way is one of the
things the 1.6 nditer provides a lot of help with.

>
> This applies to switching between masked and not-masked numpy arrays
> also -- I don't think I'd want the performance hit of that requiring a
> data copy.
>

When performance is important, it is still possible to avoid that copy
by adding the mask to a view of the original array. The mask= parameter
to ufuncs, which is independent of arrays with masks, also provides a
way to do masked operations without ever touching masked arrays.

> Also the idea was posted here that you could use views to have the same
> data set with different masks -- that would break as well.
>

I'm not sure how this would break? I think that should work just fine.
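
To make the points above concrete, here is a minimal sketch. It rests on
a few assumptions that are not part of the proposal under discussion:
it uses the `where=` keyword that released NumPy versions accept on
ufuncs as a stand-in for the mask= parameter mentioned above, it uses
the existing numpy.ma module rather than the masked-array design being
debated, and it uses np.ascontiguousarray/astype instead of nditer to
keep things short. All variable names are illustrative.

```python
import numpy as np

# 1. An ndarray is not always usable as a plain C array.
base = np.arange(12, dtype=np.float64).reshape(3, 4)

# A strided, transposed view shares memory with `base` but is not
# C-contiguous, so C code expecting a flat double* cannot use it directly.
view = base.T[::2]

# Making it C-contiguous requires a copy in this case.
contiguous = np.ascontiguousarray(view)

# Same values stored with the opposite byte order: also not directly
# usable by C code that assumes native doubles.
swapped = base.astype(base.dtype.newbyteorder('S'))

# 2. Masked operations on plain arrays, with no masked-array object.
# Note: in `where=`, True means "operate on this element".
data = np.array([1.0, 2.0, 3.0, 4.0])
keep = np.array([True, False, True, True])

out = np.zeros_like(data)
np.add(data, 10.0, out=out, where=keep)   # skipped slots keep their 0.0
total = np.add.reduce(data, where=keep)   # sums only the kept elements

# 3. Two masked views over the same data, each with its own mask.
# Note: in numpy.ma, True means "hide this element" (the opposite sense).
m1 = np.ma.array(data, mask=[False, True, False, False])
m2 = np.ma.array(data, mask=[True, False, False, False])

# Neither masked array copied the underlying buffer.
shared = np.shares_memory(m1.data, data) and np.shares_memory(m2.data, data)
```

In this sketch `m1.sum()` and `m2.sum()` give different results over the
same buffer, which is the "same data set with different masks" case.
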
>
> Nathaniel Smith wrote:
>
> > If we think that the memory overhead for floating point types is too
> > high, it would be easy to add a special case where maybe(float) used a
> > distinguished NaN instead of a separate boolean.
>
> That would be pretty cool, though in the past folks have made a good
> argument that even for floats, masks have significant advantages over
> "just using NaN". One might be that you can mask and unmask a value for
> different operations, without losing the value.
>

Especially with the ability to do the "hardmask" feature, this aspect of
it might end up being useful.

-Mark

>
> -Chris
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> [email protected]
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
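
As a footnote to the NaN-versus-mask point above, here is a rough sketch
of the difference using the existing numpy.ma module. This is an
assumption for illustration only; the design discussed in this thread
may behave differently, and the variable names are made up.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0])

# NaN approach: marking an element as missing overwrites its value.
as_nan = values.copy()
as_nan[1] = np.nan            # the original 2.0 is no longer recoverable

# Mask approach: the value stays in the data buffer underneath the mask.
masked = np.ma.array(values, mask=[False, True, False])
masked_sum = masked.sum()     # element 1 is ignored
masked.mask = False           # soft mask: unmask everything
unmasked_sum = masked.sum()   # element 1 contributes again

# Hard mask: assignment to a masked element does not unmask it.
hard = np.ma.array(values.copy(), mask=[False, True, False], hard_mask=True)
hard[1] = 99.0                # ignored, because element 1 is hard-masked
```
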
