On Sat, May 19, 2012 at 5:45 PM, Charles R Harris
<charlesr.har...@gmail.com> wrote:
>
> On Sat, May 19, 2012 at 10:02 AM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
>>
>> On Sat, May 19, 2012 at 9:21 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
>>>
>>> On Sat, May 19, 2012 at 10:00 AM, David Cournapeau <courn...@gmail.com>
>>> wrote:
>>>>
>>>> On Sat, May 19, 2012 at 3:17 PM, Charles R Harris
>>>> <charlesr.har...@gmail.com> wrote:
>>>>>
>>>>> On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant <tra...@continuum.io>
>>>>> wrote:
>>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> After reading all the discussion around masked arrays and getting
>>>>>> input from as many people as possible, it is clear that there is
>>>>>> still disagreement about what to do, but there have been some
>>>>>> fruitful discussions that ensued.
>>>>>>
>>>>>> This isn't really new as there was significant disagreement about
>>>>>> what to do when the masked array code was initially checked in to
>>>>>> master. So, in order to move forward, Mark and I are going to work
>>>>>> together with whoever else is willing to help with an effort that
>>>>>> is in the spirit of my third proposal but has a few adjustments.
>>>>>>
>>>>>> The idea will be fleshed out in more detail as it progresses, but
>>>>>> the basic concept is to create an (experimental) ndmasked object in
>>>>>> NumPy 1.7 and leave the actual ndarray object unchanged. While the
>>>>>> details need to be worked out here, a goal is to have the C-API
>>>>>> work with both ndmasked arrays and arrayobjects (possibly by
>>>>>> defining a base-class C-level structure that both ndarrays inherit
>>>>>> from). This might also be a good way for Dag to experiment with his
>>>>>> ideas as well but that is not an explicit goal.
>>>>>>
>>>>>> One way this could work, for example, is to have PyArrayObject * be
>>>>>> the base-class array (essentially the same C-structure we have now
>>>>>> with a HASMASK flag).
>>>>>> Then, the ndmasked object could inherit from PyArrayObject * as
>>>>>> well but add more members to the C-structure. I think this is the
>>>>>> easiest thing to do and requires the least amount of code-change.
>>>>>> It is also possible to define an abstract base-class
>>>>>> PyArrayObject * that both ndarray and ndmasked inherit from. That
>>>>>> way ndarray and ndmasked are siblings even though the ndarray would
>>>>>> essentially *be* the PyArrayObject * --- just with a different
>>>>>> type-hierarchy on the python side.
>>>>>>
>>>>>> This work will take some time and, therefore, I don't expect 1.7 to
>>>>>> be released prior to SciPy Austin with an end of June target date.
>>>>>> The timing will largely depend on what time is available from
>>>>>> people interested in resolving the situation. Mark and I will have
>>>>>> some availability for this work in June but not a great deal (about
>>>>>> 2 man-weeks total between us). If there are others who can step in
>>>>>> and help, it will help accelerate the process.
>>>>>>
>>>>>
>>>>> This will be a difficult thing for others to help with since the
>>>>> concept is vague, the design decisions seem to be in your and Mark's
>>>>> hands, and you say you don't have much time. It looks to me like 1.7
>>>>> will keep slipping and I don't think that is a good thing. Why not go
>>>>> for option 2, which will get 1.7 out there and push the new masked
>>>>> array work into 1.8? Breaking the flow of development and release has
>>>>> consequences, few of them good.
>>>>
>>>>
>>>> Agreed. 1.6.0 was released one year ago already, let's focus on
>>>> polishing what's in there *now*. I have not followed closely what the
>>>> decision was for a LTS release, but if 1.7 is supposed to be it, that's
>>>> another argument about changing anything there for 1.7.
>>>
>>>
>>> The motivation behind splitting the mask out into a separate ndmasked is
>>> primarily so that pre-existing code will not silently function on
>>> NA-masked arrays and produce incorrect results. This centres around using
>>> PyArray_DATA to get at the data after manually checking flags, instead of
>>> calling PyArray_FromAny. Maybe a reasonable solution is to tweak the
>>> behavior of PyArray_DATA? It could work as follows:
>>>
>>> - If an ndarray has no mask, PyArray_DATA returns the data pointer as it
>>>   does currently.
>>> - If the ndarray has an NA-mask, PyArray_DATA sets an exception and
>>>   returns NULL.
>>> - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which
>>>   returns the array data under all circumstances.
>>>
>>> This way, code which currently uses the data pointer through PyArray_DATA
>>> will fail instead of silently working with the wrong interpretation of
>>> the data. What do people feel about this idea?
>>>
>>
>> Code working with the wrong interpretation of the data doesn't bother me
>> much at this point in development. Long term it matters, but in the short
>> term we can't expect code not explicitly written to work with masked
>> arrays to do the right thing. I think we are looking at a period of
>> several years before things settle out and get accepted. First, the
>> implementation and its interface need to get close to final form, and then
>> the long slow process of adoption into things like matplotlib needs to
>> take place. I'd guess three to five years for that process.
>>
>> That said, my main concern is to move forward and not spend the next year
>> waiting. I see splitting the masked code out as rather like the python
>> types having pointers to sequence/numerical/etc methods, i.e., ndarray
>> then looks something like an abstract class. I don't have a problem with
>> that and it does avoid base object bloat.
>> As to having PyArray_DATA fail for masked arrays and provide new
>> functions for unrestricted access, I'd be tempted to have PyArray_DATA
>> continue to behave as it does and let the new functions return the error
>> for masked arrays. Making third party applications fail for masked arrays
>> is going to make masked arrays very unpopular. Most likely no one would
>> use them and third party applications would feel no pressure to support
>> them. Another possibility might be to have a compile flag that determines
>> whether or not PyArray_DATA returns an error for masked arrays, something
>> like we do now for deprecating old macros.
>>
>
> My own plan for the near term would be as follows:
>
> 1) Put in the experimental option and get the 1.7 release out. This gets
> us through the next couple of months and keeps things moving.

+1 on not blocking the release while we invent+implement yet another
experimental API.

> 2) Look at what hooks/low level functions would let us reimplement np.ma.
> Because there are so many different mask uses out there, this would be a
> good way to discover what low level support is likely to provide a good
> basis for others to build on.
>
> 3) Revisit the idea of making all ndarrays masked by default, but do so
> with the experience and feedback from current mask users.

I like this plan.

-- Nathaniel
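
For illustration only, the accessor behaviour debated in this thread (a checked PyArray_DATA that fails on NA-masked arrays, an unrestricted raw accessor, and a compile flag choosing between them) could be sketched roughly as below. This is a standalone model, not NumPy code: sketch_array, sketch_masked, SKETCH_HASMASK, SKETCH_STRICT_DATA, sketch_data, sketch_data_strict, sketch_rawdata and sketch_error are all hypothetical names, and the error is recorded in a plain string rather than as a Python exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for the HASMASK flag from the thread. */
#define SKETCH_HASMASK 0x1

/* Hypothetical base structure: data pointer plus flags. */
typedef struct {
    char *data;
    int flags;
} sketch_array;

/* Hypothetical masked structure. The base is the first member, so a
 * pointer to sketch_masked can be used where a sketch_array is expected. */
typedef struct {
    sketch_array base;
    unsigned char *mask;
} sketch_masked;

/* Stand-in for "sets an exception": remembers the last error message. */
const char *sketch_error = NULL;

/* Unrestricted accessor: returns the data under all circumstances. */
char *sketch_rawdata(const sketch_array *arr)
{
    assert(arr != NULL);
    return arr->data;
}

/* Checked accessor: records an error and returns NULL if a mask is present. */
char *sketch_data_strict(const sketch_array *arr)
{
    assert(arr != NULL);
    if (arr->flags & SKETCH_HASMASK) {
        sketch_error = "array has an NA-mask";
        return NULL;
    }
    return arr->data;
}

/* Default accessor. Building with -DSKETCH_STRICT_DATA (an assumed,
 * illustrative flag) selects the failing behaviour; otherwise existing
 * callers keep getting the raw pointer. */
char *sketch_data(const sketch_array *arr)
{
#ifdef SKETCH_STRICT_DATA
    return sketch_data_strict(arr);
#else
    return sketch_rawdata(arr);
#endif
}
```

Without the flag, code calling sketch_data keeps working on masked arrays (and may misinterpret masked values); with it, the same code gets NULL and has to opt in to sketch_rawdata explicitly, which is the trade-off the replies above are weighing.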