Note that any alignment larger than 16 bytes is unnecessary for
SIMD purposes on current hardware (>= Haswell). 16 bytes is the default
malloc alignment on amd64.
Even on older hardware (Sandy Bridge) the penalty is pretty minor.
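
If you want to see what alignment a buffer actually ends up with, a rough sketch like the following should do. The helper names (address_alignment, is_aligned) are made up for illustration, and the ctypes buffer is only a stand-in so the snippet runs without NumPy:

```python
import ctypes


def address_alignment(addr: int) -> int:
    """Return the largest power of two that divides addr.

    Hypothetical helper, not part of NumPy or any library.
    """
    if addr <= 0:
        raise ValueError("expected a positive address")
    # In two's complement, addr & -addr isolates the lowest set bit.
    return addr & -addr


def is_aligned(addr: int, boundary: int = 16) -> bool:
    """True if addr is a multiple of boundary (16 bytes by default)."""
    return addr % boundary == 0


# Stand-in buffer; its real alignment depends on the platform allocator.
buf = ctypes.create_string_buffer(1024)
addr = ctypes.addressof(buf)
print("alignment:", address_alignment(addr), "16-byte aligned:", is_aligned(addr))
```

For a NumPy array the address is available as arr.ctypes.data, so is_aligned(arr.ctypes.data) gives the same check. What it reports depends on the allocator and platform, so treat the result as an observation rather than a guarantee.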
On 05.05.2016 22:32, Charles R Harris wrote:
On Thu, May 5, 20
2016-05-05 22:10 GMT+02:00 Øystein Schønning-Johansen :
> Thanks for your answer, Francesc. Knowing that there is no numpy solution
> saves the work of searching for this. I've not tried the solution described
> at SO, but it looks like a real performance killer. I'd rather try to
> override mall