On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe <[email protected]> wrote:
> On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <[email protected]> wrote:
>>
>> On 02/17/2012 05:39 AM, Charles R Harris wrote:
>> >
>> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <[email protected]> wrote:
>> >
>> > Hi Travis,
>> >
>> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant <[email protected]> wrote:
>> > > Mark Wiebe and I have been discussing off and on (as well as
>> > > talking with Charles) a good way forward to balance two competing
>> > > desires:
>> > >
>> > > * addition of new features that are needed in NumPy
>> > > * improving the code-base generally and moving towards a more
>> > >   maintainable NumPy
>> > >
>> > > I know there are loud voices for just focusing on the second of
>> > > these and avoiding the first until we have finished that. I
>> > > recognize the need to improve the code base, but I will also be
>> > > pushing for improvements to the feature-set and user experience in
>> > > the process.
>> > >
>> > > As a result, I am proposing a rough outline for releases over the
>> > > next year:
>> > >
>> > > * NumPy 1.7 to come out as soon as the serious bugs can be
>> > >   eliminated. Bryan, Francesc, Mark, and I are able to help triage
>> > >   some of those.
>> > >
>> > > * NumPy 1.8 to come out in July which will have as many
>> > >   ABI-compatible feature enhancements as we can add while improving
>> > >   test coverage and code cleanup. I will post to this list more
>> > >   details of what we plan to address with it later.
>> > >   Included for possible inclusion are:
>> > >     * resolving the NA/missing-data issues
>> > >     * finishing group-by
>> > >     * incorporating the start of label arrays
>> > >     * incorporating a meta-object
>> > >     * a few new dtypes (variable-length string, variable-length
>> > >       unicode and an enum type)
>> > >     * adding ufunc support for flexible dtypes and possibly
>> > >       structured arrays
>> > >     * allowing generalized ufuncs to work on more kinds of arrays
>> > >       besides just contiguous
>> > >     * improving the ability for NumPy to receive JIT-generated
>> > >       function pointers for ufuncs and other calculation
>> > >       opportunities
>> > >     * adding "filters" to Input and Output
>> > >     * simple computed fields for dtypes
>> > >     * accepting a Data-Type specification as a class or JSON file
>> > >     * work towards improving the dtype-addition mechanism
>> > >     * re-factoring of code so that it can compile with a C++
>> > >       compiler and be minimally dependent on Python
>> > >       data-structures.
>> >
>> > This is a pretty exciting list of features. What is the rationale for
>> > code being compiled as C++? IMO, it will be difficult to do so
>> > without preventing useful C constructs, and without removing some of
>> > the existing features (like our use of C99 complex). The subset that
>> > is both C and C++ compatible is quite constraining.
>> >
>> >
>> > I'm in favor of this myself. C++ would allow a lot of code cleanup and
>> > make it easier to provide an extensible base; I think it would be a
>> > natural fit with numpy. Of course, some C++ projects become tangled
>> > messes of inheritance, but I'd be very interested in seeing what a good
>> > C++ designer like Mark, intimately familiar with the numpy code base,
>> > could do. This opportunity might not come by again anytime soon and I
>> > think we should grab onto it. The initial step would be a release whose
>> > code would compile in both C/C++, which mostly comes down to removing
>> > C++ keywords like 'new'.
>> >
>> > I did suggest running it by you for build issues, so please raise any
>> > you can think of. Note that MatPlotLib is in C++, so I don't think the
>> > problems are insurmountable. And choosing a set of compilers to support
>> > is something that will need to be done.
>>
>> It's true that matplotlib relies heavily on C++, both via the Agg
>> library and in its own extension code. Personally, I don't like this; I
>> think it raises the barrier to contributing. C++ is an order of
>> magnitude more complicated than C--harder to read, and much harder to
>> write, unless one is a true expert. In mpl it brings reliance on the CXX
>> library, which Mike D. has had to help maintain. And if it does
>> increase compiler specificity, that's bad.
>
>
> This gets to the recruitment issue, which is one of the most important
> problems I see numpy facing. I personally have contributed a lot of code to
> NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was
> the biggest negative point when I considered whether it was worth
> contributing to the project. I suspect there are many programmers out there
> who are skilled in low-level, high-performance C++, who would be willing to
> contribute, but don't want to code in C.
>
> I believe NumPy should be trying to find people who want to make high
> performance, close to the metal, libraries. This is a very different type of
> programmer than one who wants to program in Python, but is willing to dabble
> in a lower level language to make something run faster. High performance
> library development is one of the things the C++ developer community does
> very well, and that community is where we have a good chance of finding the
> programmers NumPy needs.
>
>> I would much rather see development in the direction of sticking with C
>> where direct low-level control and speed are needed, and using cython to
>> gain higher level language benefits where appropriate.
>> Of course, that
>> brings in the danger of reliance on another complex tool, cython. If
>> that danger is considered excessive, then just stick with C.
>
>
> There are many small benefits C++ can offer, even if numpy chooses only to
> use a tiny subset of the C++ language. For example, RAII can be used to
> reliably eliminate PyObject reference leaks.
>
> Consider a regression like this:
> http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
>
> Fixing this in C would require switching all the relevant usages of
> NPY_MAXARGS to use a dynamic memory allocation. This brings with it the
> potential of easily introducing a memory leak, and is a lot of work to do.
> In C++, this functionality could be placed inside a class, where the
> deterministic construction/destruction semantics eliminate the risk of
> memory leaks and make the code easier to read at the same time. There are
> other examples like this where the C language has forced a suboptimal design
> choice because of how hard it would be to do it better.
>
> Cheers,
> Mark
>
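
To make the RAII point above concrete, here is a minimal sketch of the idea. It is not NumPy code: FakeObject, fake_incref, fake_decref, ScopedRef and use_object are all hypothetical names, standing in for PyObject, Py_INCREF and Py_DECREF so that the example builds without the Python headers. The guard releases its reference in the destructor, so the early error return cannot leak it.

```cpp
// Sketch only. FakeObject, fake_incref, fake_decref, ScopedRef and
// use_object are hypothetical stand-ins, not part of NumPy or CPython.
#include <cassert>

struct FakeObject {
    int refcount;
};

inline void fake_incref(FakeObject *obj) { ++obj->refcount; }

inline void fake_decref(FakeObject *obj) {
    assert(obj->refcount > 0);  // catch an over-release in debug builds
    --obj->refcount;
}

// Owns exactly one reference and drops it when the guard leaves scope.
class ScopedRef {
public:
    explicit ScopedRef(FakeObject *obj) : obj_(obj) {}
    ~ScopedRef() {
        if (obj_) {
            fake_decref(obj_);
        }
    }
    // Copying would release the same reference twice, so forbid it.
    ScopedRef(const ScopedRef &) = delete;
    ScopedRef &operator=(const ScopedRef &) = delete;

    FakeObject *get() const { return obj_; }

private:
    FakeObject *obj_;
};

// Takes a temporary reference to obj. Both return paths release it,
// because the destructor of guard runs on every exit from the function.
inline int use_object(FakeObject *obj, bool fail_early) {
    fake_incref(obj);
    ScopedRef guard(obj);
    if (fail_early) {
        return -1;  // no manual decref needed on the error path
    }
    return guard.get()->refcount;  // refcount while the guard is alive
}
```

In plain C, each early return would need its own matching decref (or a goto to a shared cleanup label), which is where leaks tend to creep in.
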
In a similar vein, could incorporating C++ lead to a simpler low-level
API for numpy? I know Mark has talked before about implementing the
great ideas in numpy as a layered C++ library (in the long term, as a
dream project to scratch his own itch, and something the BDF12 doesn't
necessarily agree with). That would have the added benefit of making
numpy more of a general array library that could be exposed to any
language which can call C++ libraries.

I don't imagine that's on the table for anything near-term, but I wonder
if making more of the low-level stuff C++ would make it easier for
performance nuts to write their own code in C/C++ interfacing with
numpy, and then expose it to python. After playing around with ufuncs at
the C level for a little while last summer, I quickly realized any
simplifications would be greatly appreciated.

-Chris

>>
>> Eric
>>
>> >
>> > Chuck
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
