Hi all, Since Mark's original missingdata branch made so many changes, I figured it would be a useful exercise to figure out what code in master is actually related to masked arrays, and which isn't. The easiest way seemed to be to delete the new fields, then keep removing any code that depended on them until everything built again, which gives this: https://github.com/njsmith/numpy/commits/separate-maskna
Possible uses: - Use the diff as a checklist for going through to change the API - Use the diff as a checklist for adding an experimental-API-only flag - Merge into master, then use as a reference to cherrypick the pieces that we want to save (there is some questionable stuff in here -- e.g. copy/paste code, hand-wavy half-implementation of "multi-NA" support, and PyArray_ReduceWrapper, see below) - Merge into master and then immediately 'git revert' the changes on a branch, which would effectively 'move it aside' so we can release 1.7 while Mark and Travis to continue hacking on it at their own pace This is a huge patch, but I was pretty careful not to cause any accidental non-maskna-related regressions. The whole PyArray_Diagonal thread actually happened because I noticed that test_maskna had the only tests for np.diagonal, so I wanted to write proper ones that would be independent of the maskna code. Also I ran the following tests: - numpy.test("full") - scipy v0.10.1, test("full") - matplotlib current master - pandas v0.7.3 and everything looks good. The other complicated thing to handle was the new PyArray_ReduceWrapper function that was added to the public multiarray API. Conceptually, this function has only a glancing relationship to masked arrays per se. But, it has its own problems, and I don't think it should be exposed in 1.7. Partly because its signature necessarily changes depending on whether maskna support exists. Partly because it's just kind of ugly (it has 15 arguments, after I removed some[1]). But mostly because it gives us two independent generic implementations of functions that do array->scalar operations, which seems like something we absolutely don't want to commit to supporting indefinitely. And the "generalized ufunc" alternative seems a lot more promising. So, that branch also has a followup patch that does the necessary hacking to get it out of the public API. Unfortunately, this patch is dependent on the previous one -- I'm not sure how to untangle PyArray_ReduceWrapper while keeping the maskna support in, which makes the "global experimental flag" idea for 1.7 hard to implement (assuming that others agree about PyArray_ReduceWrapper being unready for public exposure). At this point, it might easiest to just merge this branch to master, immediately revert it on a branch for Mark and Travis to work on, and then release 1.7. Ralf, IIUC merging this and my other outstanding PRs would leave the datetime issues on python3/win32 as the only outstanding blocker? - N [1] http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html , number 11 ------------ Commit messages follow for reference/discussion The main change is commit 4c16813c23b20: Remove maskna API from ndarray, and all (and only) the code supporting it The original masked-NA-NEP branch contained a large number of changes in addition to the core NA support. For example: - ufunc.__call__ support for where= argument - nditer support for arbitrary masks (in support of where=) - ufunc.reduce support for simultaneous reduction over multiple axes - a new "array assignment API" - ndarray.diagonal() returning a view in all cases - bug-fixes in __array_priority__ handling - datetime test changes etc. There's no consensus yet on what should be done with the maskna-related part of this branch, but the rest is generally useful and uncontroversial, so the goal of this branch is to identify exactly which code changes are involved in maskna support. The basic strategy used to create this patch was: - Remove the new masking-related fields from ndarray, so no arrays are masked - Go through and remove all the code that this makes dead/inaccessible/irrelevant, in a largely mechanical fashion. So for example, if I saw 'if (PyArray_HASMASK(a)) { ... }' then that whole block was obviously just dead code if no arrays have masks, and I removed it. Likewise for function arguments like skipna that are useless if there aren't any NAs to skip. This changed the signature of a number of functions that were newly exposed in the numpy public API. I've removed all such functions from the public API, since releasing them with the NA-less signature in 1.7 would create pointless compatibility hassles later if and when we add back the NA-related functionality. Most such functions are removed by this commit; the exception is PyArray_ReduceWrapper, which requires more extensive surgery, and will be handled in followup commits. I also removed the new ndarray.setasflat method. Reason: a comment noted that the only reason this was added was to allow easier testing of one branch of PyArray_CopyAsFlat. That branch is now the main branch, so that isn't an issue. Nonetheless this function is arguably useful, so perhaps it should have remained, but I judged that since numpy's API is already hairier than we would like, it's not a good idea to add extra hair "just in case". (Also AFAICT the test for this method in test_maskna was actually incorrect, as noted here: https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py so I'm not confident that it ever worked in master, though I haven't had a chance to follow-up on this.) I also removed numpy.count_reduce_items, since without skipna it became trivial. I believe that these are the only exceptions to the "remove dead code" strategy. The ReduceWrapper untangling is a001fb29c9: Remove PyArray_ReduceWrapper from public API There are two reasons to want to keep PyArray_ReduceWrapper out of the public multiarray API: - Its signature is likely to change if/when masked arrays are added - It is essentially a wrapper for array->scalar transformations (*not* just reductions as its name implies -- the whole reason it is in multiarray.so in the first place is to support count_nonzero, which is not actually a reduction!). It provides some nice conveniences (like making it easy to apply such functions to multiple axes simultaneously), but, we already have a general mechanism for writing array->scalar transformations -- generalized ufuncs. We do not want to have two independent, redundant implementations of this functionality, one in multiarray and one in umath! So in the long run we should add these nice features to the generalized ufunc machinery. And in the short run, we shouldn't add it to the public API and commit ourselves to supporting it. However, simply removing it from numpy_api.py is not easy, because this code was used in both multiarray and umath. This commit: - Moves ReduceWrapper and supporting code to umath/, and makes appropriate changes (e.g. renaming it to PyUFunc_ReduceWrapper and cleaning up the header files). - Reverts numpy.count_nonzero to its previous implementation, so that it loses the new axis= and keepdims= arguments. This is unfortunate, but this change isn't so urgent that it's worth tying our APIs in knots forever. (Perhaps in the future it can become a generalized ufunc.) _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion