Re: [Numpy-discussion] Calling C code that assumes SIMD aligned data.

2016-05-06 Thread Francesc Alted
2016-05-05 22:10 GMT+02:00 Øystein Schønning-Johansen :

> Thanks for your answer, Francesc. Knowing that there is no numpy solution
> saves the work of searching for this. I've not tried the solution described
> at SO, but it looks like a real performance killer. I'll rather try to
> override malloc with glibs malloc_hooks or LD_PRELOAD tricks. Do you think
> that will do it? I'll try it and report back.
>

I don't think you need that much weaponry.  Just create an array with some
spare space for alignment.  Say you want a 64-byte aligned double precision
array.  Create your desired array plus 64 additional bytes (8 doubles):

In [92]: a = np.zeros(int(1e6) + 8)

In [93]: a.ctypes.data % 64
Out[93]: 16

and compute the elements to shift this:

In [94]: shift = (64 // a.itemsize) - (a.ctypes.data % 64) // a.itemsize

In [95]: shift
Out[95]: 6

now, create a view with the required elements less:

In [98]: b = a[shift:-((64 // a.itemsize) - shift)]

In [99]: len(b)
Out[99]: 1000000

In [100]: b.ctypes.data % 64
Out[100]: 0

and voilà, b is now aligned to 64 bytes.  As taking the view is a copy-free
operation, this is fast, and you only waste 64 bytes.  Pretty cheap indeed.
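
Wrapped into a reusable helper, the trick looks roughly like this (a
sketch; `aligned_zeros` is just an illustrative name, not an existing
NumPy function):

import numpy as np

def aligned_zeros(n, alignment=64, dtype=np.float64):
    # Over-allocate by `alignment` bytes worth of elements, then return a
    # copy-free view starting at the first properly aligned element.
    itemsize = np.dtype(dtype).itemsize
    buf = np.zeros(n + alignment // itemsize, dtype=dtype)
    shift = (-buf.ctypes.data % alignment) // itemsize
    view = buf[shift:shift + n]
    assert view.ctypes.data % alignment == 0
    return view

The view keeps a reference to the over-allocated buffer, so the memory
stays alive for as long as you hold on to the returned array.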

Francesc


>
> Thanks,
> -Øystein
>
> On Thu, May 5, 2016 at 1:55 PM, Francesc Alted  wrote:
>
>> 2016-05-05 11:38 GMT+02:00 Øystein Schønning-Johansen :
>>
>>> Hi!
>>>
>>> I've written a little bit of numpy code that does a neural network
>>> feedforward calculation:
>>>
>>> def feedforward(self, x):
>>>     for activation, w, b in zip(self.activations, self.weights,
>>>                                 self.biases):
>>>         x = activation(np.dot(w, x) + b)
>>>     return x
>>>
>>> This works fine when my activation functions are in Python; however, I've
>>> wrapped the activation functions from a C implementation that requires the
>>> array to be memory aligned (due to SIMD instructions in the C
>>> implementation). So I need the operation np.dot(w, x) + b to return an
>>> ndarray whose data pointer is aligned. How can I do that? Is it
>>> possible at all?
>>>
>>
>> Yes.  np.dot() accepts an `out` parameter where you can pass your
>> pre-allocated aligned array.  Testing whether NumPy has returned an
>> aligned array is easy:
>>
>> In [15]: x = np.arange(6).reshape(2,3)
>>
>> In [16]: x.ctypes.data % 16
>> Out[16]: 0
>>
>> but:
>>
>> In [17]: x.ctypes.data % 32
>> Out[17]: 16
>>
>> so, in this case NumPy returned a 16-byte aligned array, which is enough
>> for 128-bit SIMD (the SSE family).  This kind of alignment is pretty
>> common on modern computers.  If you need 256-bit (32-byte) alignment, then
>> you will need to build your container manually.  See here for an example:
>> http://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrrays
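>>
>> For instance, a minimal sketch combining the over-allocation trick with
>> np.dot's `out` (the shapes here are just for illustration; `out` must be
>> C-contiguous and have the exact shape and dtype of the result):
>>
>> import numpy as np
>>
>> w = np.random.rand(64, 64)               # stand-in weight matrix
>> x = np.random.rand(64)                   # stand-in input vector
>>
>> # build a 32-byte aligned output buffer by over-allocating and slicing
>> buf = np.zeros(64 + 4)                   # 4 spare doubles = 32 spare bytes
>> shift = (-buf.ctypes.data % 32) // buf.itemsize
>> out = buf[shift:shift + 64]
>> assert out.ctypes.data % 32 == 0
>>
>> np.dot(w, x, out=out)                    # result lands in the aligned buffer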
>>
>> Francesc
>>
>>
>>>
>>> (BTW: the function works correctly about 20% of the time I run it, and
>>> otherwise it segfaults on the SIMD instruction in the C function.)
>>>
>>> Thanks,
>>> -Øystein
>>>
>>
>>
>> --
>> Francesc Alted
>>


-- 
Francesc Alted


Re: [Numpy-discussion] Calling C code that assumes SIMD aligned data.

2016-05-06 Thread Julian Taylor
Note that anything larger than 16-byte alignment is unnecessary for
SIMD purposes on current hardware (>= Haswell); 16 bytes is the default
malloc alignment on amd64.

And even on older CPUs (Sandy Bridge), the penalty for unaligned
accesses is pretty minor.
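
A quick empirical check of that default (a sketch; the exact result
depends on your platform's allocator):

import numpy as np

# On a typical amd64/glibc build, freshly allocated array data pointers
# come back at least 16-byte aligned, which is all SSE needs; AVX loads
# on Haswell and later tolerate misalignment with little penalty.
offsets = {np.empty(1000).ctypes.data % 16 for _ in range(100)}
print(offsets)   # typically {0}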

On 05.05.2016 22:32, Charles R Harris wrote:



On Thu, May 5, 2016 at 2:10 PM, Øystein Schønning-Johansen
<oyste...@gmail.com> wrote:

Thanks for your answer, Francesc. Knowing that there is no numpy
solution saves the work of searching for this. I've not tried the
solution described at SO, but it looks like a real performance
killer. I'll rather try to override malloc with glibs malloc_hooks
or LD_PRELOAD tricks. Do you think that will do it? I'll try it and
report back.

Thanks,
-Øystein


You might take a look at how numpy handles this in
`numpy/core/src/umath/simd.inc.src`.



Chuck

