Dear all,

We've been having discussions on the pystatsmodels mailing list recently about data structures and other tools for statistics and related data analysis applications. I believe we're trying to answer a number of different but related questions:
1. What functionality (and which use cases) would be desirable for the scientific (or statistical) Python programmer? Things like groupby (http://projects.scipy.org/numpy/browser/trunk/doc/neps/groupby_additions.rst) fall into this category (a rough NumPy-only sketch of this kind of operation is appended below my signature).

2. Do we really need to build custom data structures (larry, pandas, tabular, etc.), or are structured ndarrays enough? (My conclusion is that we do need to, but others might disagree.) If so, how much performance are we willing to trade for functionality?

3. What needs to happen for Python / NumPy / SciPy to really "break in" to the statistical computing field? In other words, could a Python-based stack one day be a competitive alternative to R?

These are just some ideas for collecting community input. Of course, since we're all working in different problem domains, users' needs will vary quite a bit across the board.

We've started to collect some thoughts, links, etc. on the scipy.org wiki: http://scipy.org/StatisticalDataStructures

A lot of what's there already is commentary on and comparison of the functionality provided by pandas and la / larry (since Keith and I wrote most of it). But I think we're trying to identify more generally the things that are lacking in NumPy/SciPy and related libraries for particular applications. At minimum it should be good fodder for the SciPy conferences this year and afterward (I am submitting a paper on this subject based on my experiences).

- Wes
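P.S. For concreteness, here is a minimal sketch of the kind of group-by operation mentioned in point 1, using nothing but a NumPy structured ndarray. The field names and values are made up for illustration, and this is just one way to spell it today, not what the NEP draft or any particular library proposes:

import numpy as np

# A small structured ndarray standing in for a data set; the
# "key" / "value" fields and their contents are invented.
data = np.array(
    [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)],
    dtype=[("key", "U1"), ("value", "f8")],
)

# Group-by-mean using only NumPy: sort by key, find the start of
# each group, then reduce each contiguous slice.
order = np.argsort(data["key"])
srt = data[order]
keys, starts = np.unique(srt["key"], return_index=True)
bounds = np.append(starts[1:], len(srt))
means = [srt["value"][lo:hi].mean() for lo, hi in zip(starts, bounds)]

for k, m in zip(keys, means):
    print(k, m)

Workable, but once you want labeled axes, missing-data handling, joins, and so on, this is roughly where the case for higher-level structures like pandas or larry starts, in my view (point 2).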