Note that alignment stricter than 16 bytes is unnecessary for SIMD purposes on current hardware (>= Haswell). 16 bytes is the default malloc alignment on amd64.
Even on older hardware (Sandy Bridge) the penalty for weaker alignment is pretty minor.
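
If you want to check what alignment you actually get, a minimal sketch along these lines should do. The helper names `alignment_offset` and `aligned_empty` are made up for illustration and are not part of numpy. The second helper uses the usual over-allocate-and-slice trick, which is only an assumption about what a pure-Python workaround might look like, not something numpy provides:

```python
import numpy as np

def alignment_offset(arr, boundary=16):
    """Return the array's data address modulo `boundary` (0 means aligned)."""
    return arr.ctypes.data % boundary

def aligned_empty(n, dtype=np.float64, boundary=16):
    """Hypothetical helper: over-allocate raw bytes, then slice so that
    the returned array starts on a `boundary`-byte address."""
    itemsize = np.dtype(dtype).itemsize
    # Reserve `boundary` spare bytes so an aligned start always fits.
    buf = np.empty(n * itemsize + boundary, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    start = -buf.ctypes.data % boundary
    return buf[start:start + n * itemsize].view(dtype)

a = np.empty(1000, dtype=np.float64)
print(alignment_offset(a))        # depends on the platform allocator

b = aligned_empty(1000, boundary=64)
print(alignment_offset(b, 64))    # 0 by construction
```

The returned array is a view into the larger byte buffer, so the buffer stays alive as long as the view does.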

On 05.05.2016 22:32, Charles R Harris wrote:


On Thu, May 5, 2016 at 2:10 PM, Øystein Schønning-Johansen
<oyste...@gmail.com> wrote:

    Thanks for your answer, Francesc. Knowing that there is no numpy
    solution saves the work of searching for this. I've not tried the
    solution described at SO, but it looks like a real performance
    killer. I'd rather try to override malloc with glibc's malloc hooks
    or LD_PRELOAD tricks. Do you think that will do it? I'll try it and
    report back.

    Thanks,
    -Øystein


Might take a look at how numpy handles this in
`numpy/core/src/umath/simd.inc.src`.

<snip>

Chuck


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


