On Thu, Jun 23, 2011 at 7:28 PM, Pierre GM <[email protected]> wrote:
> Sorry y'all, I'm just commenting bit by bit:
>
> "One key problem is a lack of orthogonality with other features, for
> instance creating a masked array with physical quantities can't be done
> because both are separate subclasses of ndarray. The only reasonable way to
> deal with this is to move the mask into the core ndarray."
>
> Meh. I did try to make it easy to use masked arrays on top of subclasses.
> There are even some tests in the suite to that effect (test_subclassing). I'm
> not buying the argument.
> About moving mask in the core ndarray: I had suggested back in the days to
> have a mask flag/property built into ndarrays (which would *really* have
> simplified the game), but this suggestion was dismissed very quickly as
> adding too much overload. I had to agree. I'm just a tad surprised the wind
> has changed on that matter.

Ok, I'll have to change that section then. :) I don't remember seeing mention of this ability in the documentation, but I may not have been reading closely enough for that part.

> "In the current masked array, calculations are done for the whole array,
> then masks are patched up afterwards. This means that invalid calculations
> sitting in masked elements can raise warnings or exceptions even though they
> shouldn't, so the ufunc error handling mechanism can't be relied on."
>
> Well, there's a reason for that. Initially, I tried to guess what the mask
> of the output should be from the mask of the inputs, the objective being to
> avoid getting NaNs in the C array. That was easy in most cases, but it
> turned out it wasn't always possible (the `power` one caused me a lot of
> issues, if I recall correctly). So, for performance issues (to avoid a lot
> of expensive tests), I fell back on the old concept of "compute them all,
> they'll be sorted afterwards".
> Of course, that's rather clumsy an approach. But it works not too badly
> when in pure Python. No doubt that a proper C implementation would work
> faster.
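The "compute them all, they'll be sorted afterwards" strategy can be sketched in a few lines of plain NumPy. This is a minimal illustration under stated assumptions, not numpy.ma's actual code: `masked_binary_op` is a hypothetical helper, masks follow numpy.ma's convention (True = masked), and any non-finite result is folded into the output mask. The masked element is still evaluated, so its warning has to be silenced with `np.errstate`; silencing it also hides genuine errors in unmasked elements, which is why the quoted passage says the ufunc error handling mechanism can't be relied on.

```python
import numpy as np


def masked_binary_op(op, a, a_mask, b, b_mask):
    """Hypothetical helper: evaluate op on every element, then patch the mask.

    Masks follow numpy.ma's convention: True marks an element as masked.
    """
    # Every element is computed, masked or not, so masked garbage (e.g. 0/0)
    # would warn; ignoring errors here also hides errors in unmasked data.
    with np.errstate(all="ignore"):
        result = op(a, b)
    # An output element is masked if either input element was masked...
    mask = np.logical_or(a_mask, b_mask)
    # ...or if the computation itself produced NaN or inf.
    mask |= ~np.isfinite(result)
    return result, mask


a = np.array([1.0, 0.0, 4.0])
a_mask = np.array([False, True, False])  # 0.0 is masked, yet still divided
b = np.array([2.0, 0.0, 0.0])
b_mask = np.array([False, False, False])

result, mask = masked_binary_op(np.divide, a, a_mask, b, b_mask)
print(result)  # the raw buffer still holds nan and inf
print(mask)
```

Guessing the output mask up front would avoid touching masked elements, but as noted above that is not always possible (e.g. for `power`) and costs extra tests per operation.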
> Oh, about using NaNs for invalid data? Well, can't work with integers.
>

In my proposal, NaNs stay as unmasked NaN values, instead of turning into masked values. This is necessary for uniform treatment of all dtypes, but a subclass could override this behavior with an extra mask modification after arithmetic operations.

> `mask` property:
> Nothing to add to it. It's basically what we have now (except for the
> opposite convention).
>
> Working with masked values:
> I recall some strong points back in the days for not using None to
> represent missing values...
> Adding a maskedstr argument to array2string? Mmh... I prefer a global flag
> like we have now.
>

I'm not really a fan of all the global state that NumPy keeps; I guess I'm trying to stamp that out bit by bit as well where I can...

> Design questions:
> Adding `masked` or whatever we call it to a number/array should result in
> masked/a fully masked array, period. That way, we can have an idea that
> something was wrong with the initial dataset.
>

I'm not sure I understand what you mean; in the design, adding a mask means setting "a.mask = True", "a.mask = False", or "a.mask = <boolean array>" in general.

> hardmask: I never used the feature myself. I wonder if anyone did. Still,
> it's a nice idea...
>

Ok, I'll leave that out of the initial design unless someone comes up with some strong use cases.

-Mark

> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
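Pierre's remark that NaN can't mark invalid integers, and Mark's reply that NaN stays an ordinary unmasked value which a subclass could re-mask after arithmetic, can be illustrated with the existing numpy.ma module. This is a minimal sketch, assuming numpy.ma's convention (True = masked), which Pierre notes is the opposite of the proposal's; `remask_invalid` is a hypothetical hook standing in for the "extra mask modification", not part of numpy.ma or of the proposal.

```python
import numpy as np

# NaN is a floating-point value, so an integer array has no slot for it.
ints = np.array([1, 2, 3])
try:
    ints[1] = np.nan
    nan_fits_in_int = True
except ValueError:  # NumPy refuses to convert NaN to an integer
    nan_fits_in_int = False

# A separate boolean mask works for any dtype (numpy.ma: True = masked).
masked_ints = np.ma.array([1, 2, 3], mask=[False, True, False])
total = masked_ints.sum()  # the masked element is skipped: 1 + 3

# A NaN already present in the data is an ordinary, unmasked value.
floats = np.ma.array([1.0, np.nan, 3.0])


def remask_invalid(result):
    """Hypothetical post-arithmetic hook a subclass might apply:
    fold NaN/inf into the mask, keeping any existing mask bits."""
    data = np.ma.getdata(result)
    mask = np.ma.getmaskarray(result) | ~np.isfinite(data)
    return np.ma.array(data, mask=mask)


remasked = remask_invalid(floats)
print(nan_fits_in_int, total)
print(np.ma.getmaskarray(floats), remasked.mask)
```

Keeping the NaN-to-mask step as an opt-in hook leaves the core behavior identical for integer and floating-point dtypes, which is the uniformity argument made above.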
