Based on some previous discussion on the numpy list [1] and in now-cancelled PRs [2,3], I'd like to solicit opinions on adding an interface for numpy memory allocation event tracking, as implemented in this PR:
https://github.com/numpy/numpy/pull/309

A brief summary of the changes:

- PyDataMem_NEW/FREE/RENEW become functions in the numpy API.
  (They used to be macros for malloc/free/realloc.)
  These are the functions used to manage allocations for an array's
  internal data. Most other numpy data is allocated through Python's
  allocator.
- PyDataMem_NEW/RENEW return void* instead of char*.
- Adds PyDataMem_SetEventHook() to the API, with this description:

 * Sets the allocation event hook for numpy array data.
 * Takes a PyDataMem_EventHookFunc *, which has the signature:
 *     void hook(void *old, void *new, size_t size, void *user_data).
 * Also takes a void *user_data, and void **old_data.
 *
 * Returns a pointer to the previous hook or NULL. If old_data is
 * non-NULL, the previous user_data pointer will be copied to it.
 *
 * If not NULL, hook will be called at the end of each
 * PyDataMem_NEW/FREE/RENEW:
 *   result = PyDataMem_NEW(size)        -> (*hook)(NULL, result, size, user_data)
 *   PyDataMem_FREE(ptr)                 -> (*hook)(ptr, NULL, 0, user_data)
 *   result = PyDataMem_RENEW(ptr, size) -> (*hook)(ptr, result, size, user_data)
 *
 * When the hook is called, the GIL will be held by the calling
 * thread. The hook should be written to be reentrant if it performs
 * operations that might cause new allocation events (such as the
 * creation/destruction of numpy objects, or creating/destroying
 * Python objects which might cause a gc).

The PR also includes an example using the hook functions to track
allocation via Python callback functions (in tools/allocation_tracking).

Why I think this is worth adding to numpy, even though other tools may
be able to provide similar functionality:

- numpy arrays use orders of magnitude more memory than most python
  objects, and this is often a limiting factor in algorithms.
- numpy can behave in complicated ways with regard to memory
  management (e.g., views, OWNDATA, temporaries), which sometimes makes
  it difficult to know where memory usage problems are happening and
  why.
- numpy attracts a large number of programmers with limited low-level
  programming expertise who don't have the skills to use external
  tools (or the time/motivation to acquire those skills), but who
  still need to be able to diagnose these sorts of problems.
- Other tools are not well integrated with Python, and vary a great
  deal between OS and compiler setups.

I appreciate any feedback.

Ray Jones

[1] http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062373.html
[2] (python callbacks) https://github.com/numpy/numpy/pull/284
[3] (C-level logging) https://github.com/numpy/numpy/pull/301
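
To make the hook semantics described above concrete, here is a minimal sketch in plain C. It is not the code from the PR and does not link against numpy. The hook signature and the three event shapes follow the description quoted in this message. Everything prefixed with demo_ is a hypothetical stand-in introduced only for illustration: demo_set_event_hook() imitates the described behaviour of PyDataMem_SetEventHook() (return the previous hook, copy the previous user_data to old_data), and demo_emit_event() plays the role of the allocator calling the hook. The AllocStats struct and the dummy addresses are likewise assumptions.

```c
#include <stddef.h>

/* Hook signature as given in the description. The second parameter is
 * named new_ptr here only because "new" is a reserved word in C++. */
typedef void (DemoEventHookFunc)(void *old, void *new_ptr, size_t size,
                                 void *user_data);

/* Hypothetical tracker state, handed to the hook through user_data. */
typedef struct {
    size_t n_new, n_renew, n_free;
    size_t bytes_requested;
} AllocStats;

/* Example hook: classify each event by the shape of its arguments.
 * Edge cases (failed allocations, freeing NULL) are not distinguished. */
static void count_events(void *old, void *new_ptr, size_t size,
                         void *user_data)
{
    AllocStats *stats = (AllocStats *)user_data;
    if (old == NULL) {                          /* NEW:   (NULL, result, size) */
        stats->n_new++;
        stats->bytes_requested += size;
    } else if (new_ptr == NULL && size == 0) {  /* FREE:  (ptr, NULL, 0) */
        stats->n_free++;
    } else {                                    /* RENEW: (ptr, result, size) */
        stats->n_renew++;
        stats->bytes_requested += size;
    }
}

/* Stand-in for the hook slot that the allocator would consult. */
static DemoEventHookFunc *demo_hook = NULL;
static void *demo_user_data = NULL;

/* Stand-in for PyDataMem_SetEventHook(): installs newhook, returns the
 * previous hook, and copies the previous user_data out if requested. */
static DemoEventHookFunc *demo_set_event_hook(DemoEventHookFunc *newhook,
                                              void *user_data,
                                              void **old_data)
{
    DemoEventHookFunc *previous = demo_hook;
    if (old_data != NULL) {
        *old_data = demo_user_data;
    }
    demo_hook = newhook;
    demo_user_data = user_data;
    return previous;
}

/* Stand-in for the allocator calling the hook after an operation. */
static void demo_emit_event(void *old, void *new_ptr, size_t size)
{
    if (demo_hook != NULL) {
        (*demo_hook)(old, new_ptr, size, demo_user_data);
    }
}

/* Install the hook, replay one NEW, one RENEW and one FREE event using
 * dummy addresses, then restore whatever hook was installed before. */
AllocStats demo_run(void)
{
    AllocStats stats = {0, 0, 0, 0};
    char block_a, block_b;   /* dummy addresses standing in for buffers */
    void *prev_data = NULL;
    DemoEventHookFunc *prev_hook =
        demo_set_event_hook(count_events, &stats, &prev_data);

    demo_emit_event(NULL, &block_a, 64);        /* like PyDataMem_NEW(64) */
    demo_emit_event(&block_a, &block_b, 128);   /* like PyDataMem_RENEW(ptr, 128) */
    demo_emit_event(&block_b, NULL, 0);         /* like PyDataMem_FREE(ptr) */

    demo_set_event_hook(prev_hook, prev_data, NULL);
    return stats;
}
```

Saving the previous hook and user_data and restoring them afterwards is what lets several trackers nest. A real hook that touches Python objects would additionally need the reentrancy care mentioned in the description, which this sketch does not attempt.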