Hello,

In https://github.com/numpy/numpy/issues/5312 there's a request for an aligned allocator in Numpy, i.e. one that guarantees a larger alignment than the default of the platform's memory allocator. The reason is that modern vector instruction sets need a certain alignment for optimal performance. Unaligned data still works, but performance is degraded, and by how much depends on the CPU micro-architecture. For example, Intel recommends 32-byte alignment for AVX loads and stores.

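To see what alignment the default allocator happens to give on a particular machine, one can look at an array's data address. This is only a minimal sketch: it relies on the public `ndarray.ctypes.data` attribute, and the helper name `alignment_of` is made up for illustration.

```python
import numpy as np

def alignment_of(arr):
    """Largest power of two (capped at 4096) that divides the array's data address."""
    addr = arr.ctypes.data  # address of the first element, as a Python int
    align = 1
    while align < 4096 and addr % (align * 2) == 0:
        align *= 2
    return align

a = np.empty(1000, dtype=np.float64)
# The result depends on the platform allocator, so no particular value is assumed here.
print("alignment of data start:", alignment_of(a))
print("32-byte aligned:", a.ctypes.data % 32 == 0)
```

The output varies between platforms and runs, which is the situation the issue describes.
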
In https://github.com/numpy/numpy/pull/5457 I have proposed a patch to wrap the system allocator in an aligned allocator. The proposed scheme makes the alignment configurable at runtime (through a Python API), because different platforms may have different desirable alignments, and it is not reasonable for Numpy to know about them all, nor for users to recompile Numpy each time they have a different CPU.

Always using an aligned allocator has some overhead:

- all arrays occupy a bit more memory (probably 16 bytes on average on a 64-bit machine, for a guaranteed 16-byte alignment)
- array resizes can be more expensive in CPU time, when the physical start changes and its alignment changes too

There is also a limitation: while the physical start of an array will always be aligned, this can be defeated when taking a view starting at a non-zero index.

(Note that to take advantage of certain instruction set features such as AVX, Numpy may need to be compiled with specific compiler flags. But Numpy's allocations also affect other packages such as Numba, which is able to generate code at runtime.)

I would like to know if people are interested in this feature, and if the proposed approach is acceptable.

Regards

Antoine.
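
For illustration only, the over-allocation idea and the view limitation can be mimicked in pure Python. This is not the code in the pull request, which wraps the C-level allocator; `aligned_empty` is a hypothetical helper and the 32-byte default is just the AVX figure mentioned above.

```python
import numpy as np

def aligned_empty(n, dtype=np.float64, align=32):
    """Hypothetical helper: 1-D uninitialized array whose data start is align-byte aligned."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # Over-allocate by `align` bytes so that an aligned start always fits in the buffer.
    raw = np.empty(nbytes + align, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address (0 if already aligned).
    offset = -raw.ctypes.data % align
    return raw[offset:offset + nbytes].view(dtype)

a = aligned_empty(1000)
print(a.ctypes.data % 32)      # 0: the physical start is aligned
print(a[1:].ctypes.data % 32)  # 8: a view starting at index 1 begins one float64 later
```

The extra `align` bytes per array in this sketch are the kind of memory overhead described above, and the second print shows how a view at a non-zero index loses the guarantee.
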
