[Cython] Can't assign memview cast to memview slice

2013-02-26 Thread Dave Hirschfeld
The following works:

```
%%cython
cimport cython
import numpy as np
cimport numpy as np

def f(double[:,:] arr):
    cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64)
    cdef double[:] tmp
    tmp = <double[:arr.size]> &arr[0,0]
    res[:arr.size] = tmp
    return res
```

whereas the following:

```
%%cython
cimport cython
import numpy as np
cimport numpy as np

def f(double[:,:] arr):
    cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64)
    res[:arr.size] = <double[:arr.size]> &arr[0,0]
    return res
```

...gives the below error:

Error compiling Cython file:

...
import numpy as np
cimport numpy as np

def f(double[:,:] arr):
    cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64)
    res[:arr.size] = <double[:arr.size]> &arr[0,0]
                     ^


d3ce.pyx:7:21: Cannot assign type 'double[::1]' to 'double'


It would be nice if cython could take care of the temporary
itself, though the workaround is certainly simple enough
that it's not a big issue at all.
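For reference, the effect of the working version can be sketched in plain NumPy, independent of Cython (function name hypothetical, just for illustration):

```python
import numpy as np

def f_equivalent(arr):
    # Pure-NumPy sketch of what the Cython function computes: the
    # flattened 2-D input fills the first half of a zero-initialised
    # result twice its size.
    res = np.zeros(2 * arr.size, dtype=np.float64)
    res[:arr.size] = arr.ravel()
    return res
```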

Thanks,
Dave



___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] [cython-users] freelist benchmarks

2013-02-26 Thread Robert Bradshaw
On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote:
> Hi,
>
> thanks for looking through it.
>
> David Roe, 25.02.2013 00:00:
>> I changed the current type pointer check to look at tp_basicsize instead.
>>
>>> That made it work for almost all classes in lxml's own Element hierarchy,
>>> with only a couple of exceptions in lxml.objectify that have one additional
>>> object field. So, just extending the freelist support to use two different
>>> lists for different struct sizes instead of just one would make it work for
>>> all of lxml already. Taking a look at Sage to see how the situation appears
>>> over there would be interesting, I guess.
>>
>> I found some chains of length 5.  This could be shortened to 4 by putting
>> the freelist at the level of Element (which is where you most care about
>> speed of object creation).
>
> It's substantially easier to keep it in the top-level base class, though.
> Otherwise, we'd need a new protocol between inheriting types as I
> previously described. That adds a *lot* of complexity.
>
>
>> SageObject
>> -> Element (_parent attribute and cdef methods)
>> -> Vector (_degree)
>> -> FreeModuleElement (_is_mutable)
>> -> FreeModuleElement_generic_dense (_entries)
>>
>> SageObject
>> -> Element (_parent attribute and cdef methods)
>> ->sage.structure.element.Matrix (_nrows)
>> -> sage.matrix.matrix.Matrix (_base_ring)
>> -> Matrix_integer_dense (_entries)

I don't know that (expensive) matrices are the best example, and often
the chains are larger for elements one really cares about.

sage: def base_tree(x): return [] if x is None else [x] + base_tree(x.__base__)
   ...:

sage: base_tree(Integer)
[<type 'sage.rings.integer.Integer'>, <type 'sage.structure.element.EuclideanDomainElement'>, <type 'sage.structure.element.PrincipalIdealDomainElement'>, <type 'sage.structure.element.DedekindDomainElement'>, <type 'sage.structure.element.IntegralDomainElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]

sage: base_tree(RealDoubleElement)
[<type 'sage.rings.real_double.RealDoubleElement'>, <type 'sage.structure.element.FieldElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]

sage: base_tree(type(mod(1, 10)))
[<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>, <type 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, <type 'sage.rings.finite_rings.element_base.FiniteRingElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]
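The helper above needs nothing Sage-specific; a plain-Python equivalent works on any type:

```python
def base_tree(x):
    # Walk the __base__ chain of a type, as in the Sage session above;
    # object.__base__ is None, which terminates the recursion.
    return [] if x is None else [x] + base_tree(x.__base__)

# bool inherits from int, which inherits from object:
print(base_tree(bool))  # -> [<class 'bool'>, <class 'int'>, <class 'object'>]
```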

> Ok, so even for something as large as Sage, we'd apparently end up with
> just a couple of freelists for a given base type. That really makes it
> appear reasonable to make that number a compile time constant as well. I
> mean, even if you *really* oversize it, all you lose is the static memory
> for a couple of pointers. On a 64 bit system, if you use a freelist size of
> 8 objects and provision freelists for 8 differently sized subtypes, that's
> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred
> times that size shouldn't hurt anyone. Unused subtype freelists really take
> almost no space and won't hurt performance either.
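The quoted sizing estimate is easy to check (the numbers are exactly the assumptions stated above: 64-bit pointers, 8 entries per freelist, 8 freelists):

```python
pointer_size = 8    # bytes, on a 64-bit system
entries = 8         # objects per freelist
freelists = 8       # differently sized subtypes provisioned
total = pointer_size * entries * freelists
assert total == 512  # half a KiB of statically allocated pointers
```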

Elements in Sage are typically larger than 8 bytes, and our
experiments for Integer showed that the benefit (for this class)
extended well beyond 8 items. On the other hand lots of elements are
so expensive that they don't merit this at all.

I think one thing to keep in mind is that Python's heap is essentially
a "freelist" of objects of every size up to 128(?) bytes, so what are
we trying to save by putting it at the base type and going up and down
the __cinit__/__dealloc__ chain? I suppose we save the zeroing out of
memory and a function call or two, but that's not the expensive part.
For our Integer free list, we save going up the __cinit__/__dealloc__
call, initializing a couple of members, setting the vtable pointers,
which turns out to be the bulk of the cost. I'd love to see something
like this work, if just for final classes. It may require the
introduction of new functions to determine exactly how much
cleanup/setup we want to do when inserting/removing stuff from the
pool rather than giving up the memory completely.
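As a rough illustration of the idea (names hypothetical, not Cython's actual implementation): a capped pool that hands back freed instances instead of allocating, which is where the per-type setup/cleanup hooks mentioned above would go:

```python
class Freelist:
    """Toy per-type freelist: reuse freed instances up to a fixed cap."""

    def __init__(self, factory, capacity=8):
        self.factory = factory      # called only when the pool is empty
        self.capacity = capacity    # a compile-time constant in the real thing
        self.pool = []

    def acquire(self):
        # Reuse a pooled instance if available; in C this skips the
        # allocation, re-zeroing, and vtable setup discussed above.
        return self.pool.pop() if self.pool else self.factory()

    def release(self, obj):
        # Keep the instance for later reuse instead of freeing it.
        if len(self.pool) < self.capacity:
            self.pool.append(obj)

freelist = Freelist(dict)
a = freelist.acquire()
freelist.release(a)
assert freelist.acquire() is a   # the freed instance is reused
```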

>> This does look cool to have though.
>
> It definitely is.

Yes!

- Robert


Re: [Cython] [cython-users] freelist benchmarks

2013-02-26 Thread Stefan Behnel
Robert Bradshaw, 26.02.2013 21:16:
> On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote:
>> David Roe, 25.02.2013 00:00:
>>> I changed the current type pointer check to look at tp_basicsize instead.
>>>
>>>> That made it work for almost all classes in lxml's own Element hierarchy,
>>>> with only a couple of exceptions in lxml.objectify that have one additional
>>>> object field. So, just extending the freelist support to use two different
>>>> lists for different struct sizes instead of just one would make it work for
>>>> all of lxml already. Taking a look at Sage to see how the situation appears
>>>> over there would be interesting, I guess.
>>>
>>> I found some chains of length 5.  This could be shortened to 4 by putting
>>> the freelist at the level of Element (which is where you most care about
>>> speed of object creation).
>>
>> It's substantially easier to keep it in the top-level base class, though.
>> Otherwise, we'd need a new protocol between inheriting types as I
>> previously described. That adds a *lot* of complexity.
>>
>>
>>> SageObject
>>> -> Element (_parent attribute and cdef methods)
>>> -> Vector (_degree)
>>> -> FreeModuleElement (_is_mutable)
>>> -> FreeModuleElement_generic_dense (_entries)
>>>
>>> SageObject
>>> -> Element (_parent attribute and cdef methods)
>>> ->sage.structure.element.Matrix (_nrows)
>>> -> sage.matrix.matrix.Matrix (_base_ring)
>>> -> Matrix_integer_dense (_entries)
> 
> I don't know that (expensive) matrices are the best example, and often
> the chains are larger for elements one really cares about.
> 
> sage: def base_tree(x): return [] if x is None else [x] + base_tree(x.__base__)
>    ...:
> 
> sage: base_tree(Integer)
> [<type 'sage.rings.integer.Integer'>, <type 'sage.structure.element.EuclideanDomainElement'>, <type 'sage.structure.element.PrincipalIdealDomainElement'>, <type 'sage.structure.element.DedekindDomainElement'>, <type 'sage.structure.element.IntegralDomainElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]
> 
> sage: base_tree(RealDoubleElement)
> [<type 'sage.rings.real_double.RealDoubleElement'>, <type 'sage.structure.element.FieldElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]
> 
> sage: base_tree(type(mod(1, 10)))
> [<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>, <type 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, <type 'sage.rings.finite_rings.element_base.FiniteRingElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]

My original question was if they have differently sized object structs or
not. Those that don't would currently go into the same freelist.


>> Ok, so even for something as large as Sage, we'd apparently end up with
>> just a couple of freelists for a given base type. That really makes it
>> appear reasonable to make that number a compile time constant as well. I
>> mean, even if you *really* oversize it, all you lose is the static memory
>> for a couple of pointers. On a 64 bit system, if you use a freelist size of
>> 8 objects and provision freelists for 8 differently sized subtypes, that's
>> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred
>> times that size shouldn't hurt anyone. Unused subtype freelists really take
>> almost no space and won't hurt performance either.
> 
> Elements in Sage are typically larger than 8 bytes

I wasn't adding up the size of the objects, only of the pointers in the
freelists. If the objects end up in the freelist, they'll also be used on
the next instantiation, so their size doesn't really matter.


> and our
> experiments for Integer showed that the benefit (for this class)
> extended well beyond 8 items. On the other hand lots of elements are
> so expensive that they don't merit this at all.
>
> I think one thing to keep in mind is that Python's heap is essentially
> a "freelist" of objects of every size up to 128(?) bytes, so what are
> we trying to save by putting it at the base type and going up and down
> the __cinit__/__dealloc__ chain?

Allocation still seems to take its time.


> I suppose we save the zero-ing out of
> memory and a function call or two, but that's not the expensive part.

And I noticed now that we still have to do the zeroing out in order to
initialise C typed attributes. And that it's actually not trivial to figure
out in what cases we can safely put a subtype into the freelist. There are
a couple of special cases in CPython's object allocation, e.g. for heap
types. Their instances own a reference to the type, which is not the case
for static types.
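That distinction is visible from Python via `tp_flags`, exposed as `__flags__`; `Py_TPFLAGS_HEAPTYPE` is bit 9 in CPython:

```python
Py_TPFLAGS_HEAPTYPE = 1 << 9  # CPython's heap-type flag

class HeapType:  # defined at runtime, so it is a heap type
    pass

# Instances of heap types own a reference to their type; instances of
# static (C-defined) types like `int` do not.
assert HeapType.__flags__ & Py_TPFLAGS_HEAPTYPE
assert not (int.__flags__ & Py_TPFLAGS_HEAPTYPE)
```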


> For our Integer free list, we save going up the __cinit__/__dealloc__
> call, ini