On Fri, Jun 24, 2011 at 3:38 PM, Lluís <[email protected]> wrote:

> Mark Wiebe writes:
>
> > It should also be possible to accomplish a general
> > solution at the dtype level. We could have a 'dtype
> > factory' used like: np.zeros(10, dtype=np.maybe(float))
> > where np.maybe(x) returns a new dtype whose storage size
> > is x.itemsize + 1, where the extra byte is used to store
> > missingness information. (There might be some annoying
> > alignment issues to deal with.) Then for each ufunc we
> > define a handler for the maybe dtype (or add a
> > special case to the ufunc dispatch machinery) that checks
> > the missingness value and then dispatches to the ordinary
> > ufunc handler for the wrapped dtype.
> >
> > The 'dtype factory' idea builds on the way I've structured
> > datetime as a parameterized type, but the thing that kills it
> > for me is the alignment problems of 'x.itemsize + 1'. Having
> > the mask in a separate memory block is a lot better than
> > having to store 16 bytes for an 8-byte int to preserve the
> > alignment.
> >
> > Yes, but that assumes it is appended to the existing types in the
> > dtype individually instead of the dtype as a whole. The dtype with
> > mask could just indicate a shadow array, an alpha channel if you
> > will, that is essentially what you are already doing but just
> > provides a different place to track it.
> >
> > This would seem to change the definition of a dtype - currently it
> > represents a contiguous block of memory. It doesn't need to use all of
> > that memory, but the dtype conceptually owns it. I kind of like it
> > that way, with the whole strides idea, where data can be all over
> > memory space, belonging to ndarray, not dtype.
>
> I don't have any knowledge of the numpy or ma internals, so this might
> well be nonsense.
>
> Increasing the dtype item size would certainly decrease performance when
> using big structures, as it will require higher memory bandwidth.
>
> Why not use structured arrays? (assuming each struct element has indeed
> its own buffer, otherwise it's the same as having a "bigger" dtype) Then
> you can have some "blessed" struct elements, like the mask, which
> influence how the array is printed or how other struct elements must be
> operated on.
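
The 'x.itemsize + 1' trade-off quoted above can be checked with a small
sketch. np.maybe does not exist in NumPy; the two structured dtypes below
are only a hypothetical stand-in for what np.maybe(np.int64) might store
(a value plus a one-byte missingness flag), and the field names are
made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the proposed np.maybe(np.int64): the wrapped
# value plus one extra byte of missingness information.
fields = [("value", np.int64), ("missing", np.uint8)]

packed = np.dtype(fields)               # no padding: x.itemsize + 1
aligned = np.dtype(fields, align=True)  # padded so every item stays aligned

print(packed.itemsize)    # 9 bytes per element, values end up unaligned
print(aligned.itemsize)   # typically 16 on 64-bit platforms

# The alternative discussed in the thread: keep the mask in its own block.
n = 1000
values = np.zeros(n, dtype=np.int64)
mask = np.zeros(n, dtype=np.bool_)

print(values.nbytes + mask.nbytes)         # 9 bytes per element, both aligned
print(np.zeros(n, dtype=aligned).nbytes)   # padded per-element storage
```

The exact padded size depends on the platform's alignment for int64, so
the 16-byte figure is the common case rather than a guarantee.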
Structured arrays do put their fields next to each other in memory, so
this is basically like having a bigger dtype.

-Mark

> Besides, using "blessed" struct elements falls in line with the recent
> "_ufunc_wrapper_" proposal.
>
> Lluis
>
> --
> "And it's much the same thing with knowledge, for whenever you learn
> something new, the whole world becomes that much richer."
> -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
> Tollbooth
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
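
As a quick check of the point about structured arrays, a minimal sketch
(the dtype and its field names "value" and "mask" are made up for
illustration) shows that the fields share one interleaved buffer instead
of each having its own:

```python
import numpy as np

# Illustrative record type: a float64 value followed by a one-byte mask.
rec = np.dtype([("value", np.float64), ("mask", np.uint8)])
a = np.zeros(4, dtype=rec)

field = a["value"]  # a view into a's buffer, not a separate array

print(rec.itemsize)                 # 9: both fields live in one element
print(rec.fields["mask"][1])        # 8: byte offset of "mask" in each element
print(field.strides)                # (9,): steps over the interleaved mask bytes
print(np.shares_memory(field, a))   # True: same underlying memory block
```

Because the stride of the field view equals the full record size, reading
only the values still walks over the mask bytes, which is why this layout
behaves like a bigger dtype.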
