Nathaniel Smith added the comment:
It's not terribly difficult to write a crude-but-effective aligned allocator on
top of raw malloc:
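#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
void *aligned_malloc(size_t size, size_t alignment) {
    assert(0 < alignment && alignment < 255);  /* shift must fit in one byte */
    uint8_t *raw_pointer = malloc(size + alignment);
    if (raw_pointer == NULL)
        return NULL;
    size_t shift = alignment - ((uintptr_t)raw_pointer % alignment);
    assert(0 < shift && shift <= alignment);
    uint8_t *aligned_pointer = raw_pointer + shift;
    aligned_pointer[-1] = (uint8_t)shift;  /* stash the offset just below the block */
    return aligned_pointer;
}
void aligned_free(void *pointer) {
    uint8_t *p = pointer;
    if (p != NULL)
        free(p - p[-1]);  /* step back by the stored shift to the malloc'd pointer */
}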
But this fallback and the official Win32 API both disallow releasing the memory
with plain free() (as Victor points out in msg196834), so we can't just add an
aligned_malloc slot to the PyMemAllocator struct: its single free slot would
then have to handle two kinds of pointer that need different deallocators. This
kind of aligned allocation is effectively its own memory domain.
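For comparison, here is a minimal sketch of a portable wrapper over the two platform APIs, assuming C11 and either a POSIX system or the Microsoft CRT. On POSIX, memory from posix_memalign() may legally be released with plain free(), but on Windows memory from _aligned_malloc() must go back through _aligned_free(), so any wrapper has to ship its own matching free function. The names portable_aligned_alloc and portable_aligned_free are hypothetical.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>  /* _aligned_malloc, _aligned_free */
#endif

/* alignment must be a power of two (and, on POSIX, a multiple of
   sizeof(void *)). Returns NULL on failure. */
void *portable_aligned_alloc(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef _WIN32
    return _aligned_malloc(size, alignment);
#else
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
#endif
}

/* The only correct way to release memory from portable_aligned_alloc(). */
void portable_aligned_free(void *p)
{
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);  /* POSIX permits free() on posix_memalign() memory */
#endif
}
```

Because the Windows branch cannot use free(), the alloc/free pair has to travel together, which is the sense in which aligned allocation forms its own domain.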
If native aligned allocation support were added to PyMalloc then it could
potentially do better (e.g. by noticing that it already has a block on its
freelist with the requested alignment and just returning that instead of
overallocating). This might be the ideal solution for Raymond's use case, but I
have no idea how much work it would be to mess around with PyMalloc innards.
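To make the freelist idea concrete, here is a minimal sketch of looking for an already-aligned free block before falling back to overallocation. It uses a hypothetical singly linked freelist (struct free_block, freelist_take_aligned) and does not model PyMalloc's real pools and arenas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical freelist node; PyMalloc's real bookkeeping differs. */
struct free_block {
    struct free_block *next;
};

/* Nonzero if p is a multiple of alignment. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Unlink and return the first free block whose address already satisfies
   the requested alignment, or NULL if there is none (the caller would then
   fall back to overallocating, as in aligned_malloc). */
static void *freelist_take_aligned(struct free_block **head, size_t alignment)
{
    assert(alignment != 0);
    for (struct free_block **link = head; *link != NULL; link = &(*link)->next) {
        struct free_block *block = *link;
        if (is_aligned(block, alignment)) {
            *link = block->next;  /* splice the block out of the list */
            return block;
        }
    }
    return NULL;
}
```

When a suitable block exists this costs a list walk but no extra bytes, whereas the shift-byte scheme always pays up to `alignment` bytes of padding per allocation.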
Numpy doesn't currently use aligned allocation for anything, but we'd like to
keep our options open. If we do end up using it in the future then there's a
reasonable chance we might want to use it *without* the GIL held (e.g. for
allocating temporary buffers inside C loops). OTOH we are also happy to
implement the aligned allocation ourselves (either on top of the system APIs or
directly) -- we just don't want to lose tracemalloc support when we do.
For numpy's purposes, I think the best approach would be to add a tracemalloc
"escape valve", with an interface like:
    PyMem_RecordAlloc(const char* domain, void* tag, size_t quantity)
    PyMem_RecordRealloc(const char* domain, void* old_tag, void* new_tag, size_t new_quantity)
    PyMem_RecordFree(const char* domain, void* tag)
where the idea is that after someone allocates memory (or potentially other
discrete resources) directly, without going through PyMem_*, they could call
these functions to tell tracemalloc what they just did.
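The bookkeeping these three calls imply can be sketched with a toy, in-process registry. This is only an illustration under stated assumptions: record_alloc, record_realloc, record_free and domain_total are hypothetical stand-ins that mirror the proposed PyMem_Record* parameter lists (they are not CPython functions), the int return codes are an addition of this sketch, and a fixed-size linear table replaces whatever tracemalloc would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy, fixed-capacity stand-in for a tracemalloc-style registry. */
#define MAX_RECORDS 256

struct record {
    const char *domain;  /* e.g. "aligned", "cuda", "mmap" */
    void *tag;           /* identifies the resource, typically its address */
    size_t quantity;     /* bytes, or whatever unit the domain counts */
    int live;            /* nonzero while the resource is outstanding */
};

static struct record records[MAX_RECORDS];

/* Find the live record for (domain, tag), or NULL. */
static struct record *find_record(const char *domain, void *tag)
{
    for (size_t i = 0; i < MAX_RECORDS; i++)
        if (records[i].live && records[i].tag == tag &&
            strcmp(records[i].domain, domain) == 0)
            return &records[i];
    return NULL;
}

/* Mirrors the proposed PyMem_RecordAlloc; returns -1 if the table is full. */
int record_alloc(const char *domain, void *tag, size_t quantity)
{
    assert(domain != NULL);
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        if (!records[i].live) {
            records[i] = (struct record){domain, tag, quantity, 1};
            return 0;
        }
    }
    return -1;
}

/* Mirrors the proposed PyMem_RecordRealloc; -1 if old_tag is unknown. */
int record_realloc(const char *domain, void *old_tag, void *new_tag,
                   size_t new_quantity)
{
    struct record *r = find_record(domain, old_tag);
    if (r == NULL)
        return -1;
    r->tag = new_tag;
    r->quantity = new_quantity;
    return 0;
}

/* Mirrors the proposed PyMem_RecordFree; -1 if tag is unknown. */
int record_free(const char *domain, void *tag)
{
    struct record *r = find_record(domain, tag);
    if (r == NULL)
        return -1;
    r->live = 0;
    return 0;
}

/* The kind of per-domain summary a tracemalloc-like tool could report. */
size_t domain_total(const char *domain)
{
    size_t total = 0;
    for (size_t i = 0; i < MAX_RECORDS; i++)
        if (records[i].live && strcmp(records[i].domain, domain) == 0)
            total += records[i].quantity;
    return total;
}
```

A caller that obtains memory from its own aligned_malloc would call record_alloc("aligned", ptr, size) right after the allocation and record_free("aligned", ptr) right before aligned_free; a real implementation would presumably use a hash table and attach tracebacks, which this sketch omits.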
This would be useful in a number of cases: in addition to tracking aligned
allocations, it would make it possible to re-use the tracemalloc infrastructure
to track GPU buffers allocated by CUDA/GPGPU-type code, mmap usage, hugetlbfs
usage, etc. Potentially even open file descriptors if one wants to go there
(seems pretty useful, actually).
----------
_______________________________________
Python tracker
<http://bugs.python.org/issue18835>
_______________________________________