On Sun, Apr 22, 2012 at 1:20 PM, mark florisson <markflorisso...@gmail.com> wrote:
> On 21 April 2012 20:17, Dimitri Tcaciuc <dtcac...@gmail.com> wrote:
>> Hey everyone,
>>
>> Congratulations on shipping 0.16! I think I found a problem which
>> seems pretty straightforward. Say I want to factor out the inner part
>> of some N^2 loops over a float array, I write something like
>>
>>     cdef inline float _inner(size_t i, size_t j, float[:] x):
>>         cdef float d = x[i] - x[j]
>>         return sqrtf(d * d)
>>
>> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and
>> the function is declared as inline, which is great. However, the
>> memoryview structure is passed by value:
>>
>>     static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
>>         size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
>>       ...
>>
>> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
>> perform efficient inlining (although the function does in fact get
>> inlined). If I manually inline that distance calculation, I get a 3x
>> speedup (in my case 0.324020147324 vs 1.43209195137 seconds for 10k
>> elements). When I manually modified the generated .c file to pass the
>> memoryview slice by pointer, the slowdown was eliminated completely.
>>
>> On a somewhat related note, have you considered enabling the Issues
>> page on GitHub?
>>
>>
>> Thanks!
>>
>>
>> Dimitri.
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel@python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> Although it is neither documented nor tested, it works if you just
> take the address of the memoryview. You can then index it using
> memoryview_pointer[0][i]. One should be careful, as taking the pointer
> and passing that around means that the pointer is not acquisition
> counted, and will point to invalid memory if the memoryview goes out of
> scope (e.g. if it's a local variable, when you return).
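
To make the by-value versus by-pointer difference concrete, here is a minimal C sketch of the two calling conventions discussed above. It is not Cython's generated code: `memslice` is a hypothetical, simplified stand-in for `__Pyx_memviewslice`, and `item`, `make_slice`, `inner_by_value` and `inner_by_pointer` are names invented for illustration. The absolute difference is computed with a conditional instead of `sqrtf(d * d)` so the sketch links without libm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Cython's memoryview slice struct.
   Field names and the 8-dimension limit are assumptions for illustration. */
#define MAX_DIMS 8

typedef struct {
    void *memview;                  /* owning memoryview object (opaque here) */
    char *data;                     /* pointer to the first element */
    ptrdiff_t shape[MAX_DIMS];
    ptrdiff_t strides[MAX_DIMS];
    ptrdiff_t suboffsets[MAX_DIMS];
} memslice;

/* Read element i of a 1-D slice using the byte stride of dimension 0. */
static inline float item(const memslice *x, size_t i) {
    return *(const float *)(x->data + (ptrdiff_t)i * x->strides[0]);
}

/* By value: the whole struct (roughly 200 bytes on a typical 64-bit
   target) is copied for every call unless the optimizer removes the copy. */
static inline float inner_by_value(size_t i, size_t j, memslice x) {
    float d = item(&x, i) - item(&x, j);
    return d < 0.0f ? -d : d;
}

/* By pointer: only an address is passed. The caller must keep *x alive
   for the duration of the call, since nothing here tracks ownership. */
static inline float inner_by_pointer(size_t i, size_t j, const memslice *x) {
    assert(x != NULL);
    float d = item(x, i) - item(x, j);
    return d < 0.0f ? -d : d;
}

/* Hypothetical helper: wrap a contiguous float buffer in a 1-D slice. */
static memslice make_slice(float *buf, size_t n) {
    memslice s = {0};
    s.data = (char *)buf;
    s.shape[0] = (ptrdiff_t)n;
    s.strides[0] = (ptrdiff_t)sizeof(float);
    return s;
}
```

Both functions return the same result for the same slice; the only difference is how much data crosses the call boundary. The lifetime caveat from the reply above applies to the pointer version: a pointer to a local slice must not outlive that local.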
Nice, passing by pointer did the trick!

As an observation, I tried using `cython.operator.dereference(x)` and in
this case it's way less efficient than `x[0]`. Dereferencing actually
allocates an empty memory view slice and copies the contents of `x`, even
if the `dereference(x)` result is never assigned anywhere and is only a
temporary value in the expression.

Dimitri.

> Cython could manually inline functions though, which could greatly
> reduce argument passing and unpacking overhead in some situations
> (like buffers).
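
As a rough illustration of what manual inlining means for the N^2 loop from the original post, the following C sketch compares a loop that calls a helper for every pair with the same loop after the helper body has been written out by hand. It is a hand-written analogue, not output from Cython; it uses a plain `float` pointer rather than a memoryview slice, and the names `abs_diff`, `pair_sum_call` and `pair_sum_inlined` are made up for this example.

```c
#include <stddef.h>

/* Helper kept as a separate function, mirroring _inner from the post. */
static float abs_diff(const float *x, size_t i, size_t j) {
    float d = x[i] - x[j];
    return d < 0.0f ? -d : d;
}

/* N^2 loop that calls the helper for every (i, j) pair. */
static float pair_sum_call(const float *x, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += abs_diff(x, i, j);
    return total;
}

/* Same loop with the helper body inlined by hand: no arguments are
   passed, and x[i] is loaded once per outer iteration. */
static float pair_sum_inlined(const float *x, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float xi = x[i];
        for (size_t j = 0; j < n; j++) {
            float d = xi - x[j];
            total += d < 0.0f ? -d : d;
        }
    }
    return total;
}
```

The two functions compute the same sum in the same order. Whether the hand-inlined form is actually faster depends on the compiler and on how expensive the argument passing is, which is the point raised in the quoted paragraph.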