Note that alignment larger than 16 bytes is unnecessary for SIMD
purposes on current hardware (Haswell and later). 16 bytes is the
default malloc alignment on amd64.
Even on older hardware (Sandy Bridge) the penalty is pretty minor.
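
If you want to check what alignment a buffer actually has, or still want
a stricter alignment, here is a minimal sketch using only Python's ctypes.
The helper names (address_of, is_aligned, aligned_view) are made up for
illustration, and the over-allocate-and-offset trick is a common generic
approach, not necessarily the one from the SO answer. For a NumPy array
the address is available directly as `arr.ctypes.data`.

```python
import ctypes


def address_of(buf):
    """Return the memory address of the first byte of a writable, non-empty buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def is_aligned(address, alignment=16):
    """True if `address` is a multiple of `alignment` (a power of two)."""
    return address & (alignment - 1) == 0


def aligned_view(nbytes, alignment=32):
    """Return a writable memoryview of `nbytes` bytes starting on an
    `alignment`-byte boundary (hypothetical helper, illustration only)."""
    if nbytes < 1:
        raise ValueError("nbytes must be at least 1")
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Over-allocate by alignment - 1 bytes so that an aligned start
    # address is guaranteed to exist somewhere inside the buffer.
    raw = bytearray(nbytes + alignment - 1)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-address_of(raw)) % alignment
    # The memoryview keeps `raw` alive for as long as the view exists.
    return memoryview(raw)[offset:offset + nbytes]


if __name__ == "__main__":
    view = aligned_view(1024, alignment=64)
    print(is_aligned(address_of(view), 64))
```

The extra cost is at most alignment - 1 bytes per allocation; whether the
stricter alignment buys anything is a separate question, per the above.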
On 05.05.2016 22:32, Charles R Harris wrote:
On Thu, May 5, 2016 at 2:10 PM, Øystein Schønning-Johansen
<oyste...@gmail.com> wrote:
Thanks for your answer, Francesc. Knowing that there is no numpy
solution saves the work of searching for this. I've not tried the
solution described at SO, but it looks like a real performance
killer. I'd rather try to override malloc with glibc's malloc hooks
or LD_PRELOAD tricks. Do you think that will do it? I'll try it and
report back.
Thanks,
-Øystein
Might take a look at how numpy handles this in
`numpy/core/src/umath/simd.inc.src`.
<snip>
Chuck
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion