[Cython] Fused Types
Hey,

I've been working a bit on fused types (http://wiki.cython.org/enhancements/fusedtypes), and I've got it to generate code for every permutation of specific types. Currently it only works for cdef functions.

Now in SimpleCallNode I'm trying to use PyrexTypes.best_match() to find the function with the best signature, but it doesn't seem to take care of coercions. E.g. it calls 'assignable_from' on the dst_type (e.g. char *), but for BuiltinObjectType (a str) this returns false. Why is this the case? Don't calls to overloaded C++ methods need to dispatch properly here as well?

Other issues are public and api declarations. Should we support those at all? We could define a macro that calls a function that does the dispatch. So e.g. for this

    ctypedef cython.fused_type(typeA, typeB) dtype

    cdef func(dtype x):
        ...

we would get two generated functions, say, __pyx_typeA_func and __pyx_typeB_func. So we could have a macro get_func(dtype) or something that then substitutes __pyx_get_func(#dtype), where __pyx_get_func returns the pointer to the right function based on the type names. I'm not sure we should support it; right now I just put the mangled names in the header. At least the cdef functions will be sharable between Cython implementation files.

I also noticed that a cdef function with optional arguments gets a struct as argument, but this struct doesn't seem to end up in the header file when the function is declared public. I believe the typedefs for ctypedef-ed things in the .pyx file also don't make it there when they are used to type a cdef function's arguments. Should that be fixed?

Cheers,

Mark
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
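As a rough illustration of the scheme described above (one generated function per permutation of specific types, plus a name-based lookup in the spirit of __pyx_get_func), here is a minimal Python sketch. The helpers mangle, specializations and get_func are hypothetical and not part of Cython; the mangling format only mirrors the __pyx_typeA_func example from the message.

```python
from itertools import product


def mangle(func_name, specific_types):
    # Hypothetical mangling, modelled on "__pyx_typeA_func" from the text.
    return "__pyx_" + "_".join(specific_types) + "_" + func_name


def specializations(func_name, fused_types):
    """Map each permutation of specific types to a mangled name.

    fused_types holds one tuple of specific type names per fused type
    that appears in the function signature.
    """
    return {combo: mangle(func_name, combo) for combo in product(*fused_types)}


def get_func(table, *type_names):
    # Stand-in for the proposed __pyx_get_func: look up by type names.
    return table[tuple(type_names)]


# One fused type with two specific types gives two specializations.
table = specializations("func", [("typeA", "typeB")])
print(get_func(table, "typeA"))

# Two fused types with two specific types each give four specializations.
two = specializations("func", [("int", "float"), ("char", "long")])
print(sorted(two.values()))
```

The number of generated functions is the product of the sizes of the fused types involved, which is one reason a lookup by type name may be more practical for callers than remembering mangled names.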
Re: [Cython] prange CEP updated
On 21 April 2011 20:13, Dag Sverre Seljebotn wrote:
> On 04/21/2011 10:37 AM, Robert Bradshaw wrote:
>>
>> On Mon, Apr 18, 2011 at 7:51 AM, mark florisson wrote:
>>>
>>> On 18 April 2011 16:41, Dag Sverre Seljebotn wrote:
>>>>
>>>> Excellent! Sounds great! (as I won't have my laptop for some days I
>>>> can't have a look yet but I will later)
>>>>
>>>> You're right about (the current) buffers and the gil. A testcase
>>>> explicitly for them would be good.
>>>>
>>>> Firstprivate etc: I think it'd be nice myself, but it is probably
>>>> better to take a break from it at this point so that we can think
>>>> more about that and not do anything rash; perhaps open up a specific
>>>> thread on them and ask for more general input. Perhaps you want to
>>>> take a break or task-switch to something else (fused types?) until I
>>>> can get around to review and merge what you have so far? You'll know
>>>> best what works for you though. If you decide to implement explicit
>>>> threadprivate variables because you've got the flow I certainly
>>>> won't object myself.
>>>
>>> Ok, cool, I'll move on :) I already included a test with a prange and
>>> a numpy buffer with indexing.
>>
>> Wow, you're just plowing away at this. Very cool.
>>
>> +1 to disallowing nested prange, that seems to get really messy with
>> little benefit.
>>
>> In terms of the CEP, I'm still unconvinced that firstprivate is not
>> safe to infer, but let's leave the initial values undefined rather than
>> specifying them to be NaNs (we can do that as an implementation if you
>> want), which will give us flexibility to change later once we've had a
>> chance to play around with it.
>
> I don't see any technical issues with inferring firstprivate, the question
> is whether we want to. I suggest not inferring it in order to make this
> safer: One should be able to just try to change a loop from "range" to
> "prange", and either a) have things fail very hard, or b) just work
> correctly and be able to trust the results.
>
> Note that when I suggest using NaN, it is as initial values for EACH
> ITERATION, not per-thread initialization. It is not about "firstprivate" or
> not, but about disabling thread-private variables entirely in favor of
> "per-iteration" variables.
>
> I believe that by talking about "readonly" and "per-iteration" variables,
> rather than "thread-shared" and "thread-private" variables, this can be used
> much more safely and with virtually no knowledge of the details of
> threading. Again, what's in my mind are scientific programmers with (too)
> little training.
>
> In the end it's a matter of taste and what is most convenient to more users.
> But I believe the case of needing real thread-private variables that
> preserve per-thread values across iterations (and thus also can possibly
> benefit from firstprivate) is used seldom enough that an explicit
> declaration is OK, in particular when it buys us so much in safety in the
> common case.
>
> To be very precise,
>
> cdef double x, z
> for i in prange(n):
>     x = f(x)
>     z = f(i)
>     ...
>
> goes to
>
> cdef double x, z
> for i in prange(n):
>     x = z = nan
>     x = f(x)
>     z = f(i)
>     ...
>
> and we leave it to the C compiler to (trivially) optimize away "z = nan".
> And, yes, it is a stopgap solution until we've got control flow analysis so
> that we can outright disallow such uses of x (without threadprivate
> declaration, which also gives firstprivate behaviour).
>

Ah, I see, sure, that sounds sensible. I'm currently working on fused types, so when I finish that up I'll return to that.

>
>>
>> The "cdef threadlocal(int) foo" declaration syntax feels odd to me...
>> We also probably want some way of explicitly marking a variable as
>> shared and still be able to assign to/flush/sync it. Perhaps the
>> parallel context could be used for these declarations, i.e.
>>
>> with parallel(threadlocal=a, shared=(b,c)):
>>     ...
>>
>> which would be considered an "expert" usecase.
>
> I'm not set on the syntax for threadlocal variables; although your proposal
> feels funny/very unpythonic, almost like a C macro. For some inspiration,
> here's the Python solution (with no obvious place to put the type):
>
> import threading
> mydata = threading.local()
> mydata.myvar = ... # value is threadprivate
>
>> For all the discussion of threadsavailable/threadid, the most common
>> usecase I see is for allocating a large shared buffer and partitioning
>> it. This seems better handled by allocating separate thread-local
>> buffers, no? I still like the context idea, but everything in a
>> parallel block before and after the loop(s) also seems like a natural
>> place to put any setup/teardown code (though the context has the
>> advantage that __exit__ is always called, even if exceptions are
>> raised, which makes cleanup a lot easier to handle).
>
> I'd *really* like to have try/finally available in cython.parallel block for
> this, although I re
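To make the per-iteration NaN proposal quoted in this message concrete, here is a small plain-Python simulation. It is not Cython and uses no real threads; it only shows why resetting variables to NaN at the top of every iteration makes a read-before-assign stand out instead of silently depending on iteration order. The function f and the helper run_iterations are made up for illustration.

```python
import math


def f(v):
    # Hypothetical per-element computation.
    return v + 1.0


def run_iterations(n):
    x = z = 0.0
    results = []
    for i in range(n):
        # Simulates the proposed implicit "x = z = nan" at the start of
        # every prange iteration.
        x = z = math.nan
        x = f(x)  # reads x before assigning it, so x stays NaN
        z = f(i)  # assigned before use, so z is well defined
        results.append((x, z))
    return results


for x, z in run_iterations(3):
    print(math.isnan(x), z)
```

Every iteration reports NaN for x, which is the "fail very hard" outcome the proposal aims for, while z behaves exactly as it would in a sequential loop.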
Re: [Cython] Fused Types
mark florisson, 26.04.2011 16:23:
> I've been working a bit on fused types
> (http://wiki.cython.org/enhancements/fusedtypes), and I've got it to
> generate code for every permutation of specific types. Currently it
> only works for cdef functions. Now in SimpleCallNode I'm trying to use
> PyrexTypes.best_match() to find the function with the best signature,
> but it doesn't seem to take care of coercions, e.g. it calls
> 'assignable_from' on the dst_type (e.g. char *), but for
> BuiltinObjectType (a str) this returns false.

Which is correct. "char*" cannot coerce from/to "str". It can coerce to "bytes", though.

http://wiki.cython.org/enhancements/stringliterals

> Why is this the case, don't calls to overloaded C++ methods need to
> dispatch properly here also?

If this doesn't work, I assume it just isn't implemented.

> Other issues are public and api declarations. So should we support
> that at all? We could define a macro that calls a function that does
> the dispatch. So e.g. for this
>
> ctypedef cython.fused_type(typeA, typeB) dtype
>
> cdef func(dtype x):
>     ...
>
> we would get two generated functions, say, __pyx_typeA_func and
> __pyx_typeB_func. So we could have a macro get_func(dtype) or
> something that then substitutes __pyx_get_func(#dtype), where
> __pyx_get_func returns the pointer to the right function based on the
> type names. I'm not sure we should support it, right now I just put
> the mangled names in the header. At least the cdef functions will be
> sharable between Cython implementation files.

I'm fine with explicitly forbidding this for now. It may eventually work for Python object types where we can properly dispatch, but it won't easily work for C types. It may work in C++, though.

> I also noticed that for cdef functions with optional argument it gets
> a struct as argument, but this struct doesn't seem to end up in the
> header file when the function is declared public. I believe that also
> the typedefs for ctypedef-ed things in the .pyx file don't make it
> there when used to type a cdef function's arguments. Should that be
> fixed?

No, I think this should also become a compiler error. These functions are not meant to be called from C code. It's a Cython convenience feature. As long as it's not correctly supported on both ends of the publicly exported C-API, it's best to keep users from using it at all.

Stefan
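The str/bytes distinction mentioned in this reply has a close analogue in the Python 3 standard library: ctypes.c_char_p accepts bytes but rejects str. The snippet below is only an analogy in plain Python, not Cython's coercion code, and the helper name accepts_as_char_p is made up.

```python
import ctypes


def accepts_as_char_p(value):
    """Return True if ctypes will take `value` as a C char pointer."""
    try:
        ctypes.c_char_p(value)
    except TypeError:
        # ctypes refuses text strings here, much like "char*" cannot
        # coerce from "str" in the discussion above.
        return False
    return True


print(accepts_as_char_p(b"foo"))
print(accepts_as_char_p("foo"))
```

In both settings the reasoning is the same: a C char pointer refers to raw bytes, so a text string first needs an explicit encoding step.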
Re: [Cython] speed.pypy.org
Stefan Behnel, 15.04.2011 22:20:
> Stefan Behnel, 11.04.2011 15:08:
>> I'm currently discussing with Maciej Fijalkowski (PyPy) how to get Cython
>> running on speed.pypy.org (that's what I wrote "cythonrun" for). If it
>> works out well, we may have it up in a couple of days.
>
> ... or maybe not. It may take a little longer due to lack of time on his
> side.
>
>> I would expect that Cython won't be a big winner in this game, given that
>> it will only compile plain untyped Python code. It's also going to fail
>> entirely in some of the benchmarks. But I think it's worth having it up
>> there, simply as a way for us to see where we are performance-wise and to
>> get quick (nightly) feed-back about optimisations we try. The benchmark
>> suite is also a nice set of real-world Python code that will allow us to
>> find compliance issues.
>
> Ok, here's what I have so far. I fixed a couple of bugs in Cython and got
> at least some of the benchmarks running. Note that they are actually simple
> ones, only a single module. Basically all complex benchmarks fail due to
> known bugs, such as Cython def functions not accepting attribute
> assignments (e.g. on wrapping). There's also a problem with code that uses
> platform specific names conditionally, such as WindowsError when running on
> Windows. Cython complains about non-builtin names here. I'm considering
> turning that into a visible warning instead of an error, so that the name
> would instead be looked up dynamically to let the code fail at runtime
> *iff* it reaches the name lookup.
>
> Anyway, here are the numbers. I got them with "auto_cpdef" enabled,
> although that doesn't even seem to make that big a difference. The baseline
> is a self-compiled Python 2.7.1+ (about a month old).

[numbers stripped]

And here's the shiny graph:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-benchmarks-py27/lastSuccessfulBuild/artifact/chart.html

It gets automatically rebuilt by this Hudson job:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-benchmarks-py27/

Stefan
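The behaviour that the benchmarks rely on, and that the proposed warning would preserve, is ordinary Python name lookup: a global name is resolved only when the code that uses it actually runs. A minimal sketch in plain Python follows; the function names are made up for illustration.

```python
import sys


def is_windows_error(exc):
    # WindowsError is only looked up if this branch runs, so the function
    # can be defined and called on platforms where the name does not exist.
    if sys.platform == "win32":
        return isinstance(exc, WindowsError)
    return False


def undefined_name_is_lazy():
    # Defining a function that mentions a missing global is fine; the
    # NameError only appears once the lookup is reached at call time.
    def broken():
        return SomeNameThatDoesNotExist

    try:
        broken()
    except NameError:
        return True
    return False


print(is_windows_error(OSError()))
print(undefined_name_is_lazy())
```

A compiler that rejects unknown names at compile time is stricter than this, which is why code of the first kind fails to compile even though it runs correctly under CPython.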
Re: [Cython] Fused Types
On 26 April 2011 16:43, Stefan Behnel wrote:
> mark florisson, 26.04.2011 16:23:
>>
>> I've been working a bit on fused types
>> (http://wiki.cython.org/enhancements/fusedtypes), and I've got it to
>> generate code for every permutation of specific types. Currently it
>> only works for cdef functions. Now in SimpleCallNode I'm trying to use
>> PyrexTypes.best_match() to find the function with the best signature,
>> but it doesn't seem to take care of coercions, e.g. it calls
>> 'assignable_from' on the dst_type (e.g. char *), but for
>> BuiltinObjectType (a str) this returns false.
>
> Which is correct. "char*" cannot coerce from/to "str". It can coerce to
> "bytes", though.
>
> http://wiki.cython.org/enhancements/stringliterals

Right, I see. The thing is that I was using string literals, which should be inferred as the bytes type when they are assigned to a char *. Because the type is fused, the argument may be anything, so I was hoping best_match would figure it out for me. Apparently it doesn't, and indeed, this example doesn't work:

    cdef extern from "Foo.h":
        cdef cppclass Foo:
            Foo(char *)
            Foo(int)

    cdef char *s = "foo"
    cdef Foo* foo = new Foo("foo")  # <- this doesn't work ("no suitable method found")
    cdef Foo* bar = new Foo(s)      # <- this works

So that's pretty lame, I think I should fix that.

>
>> Why is this the case,
>> don't calls to overloaded C++ methods need to dispatch properly here
>> also?
>
> If this doesn't work, I assume it just isn't implemented.
>
>
>> Other issues are public and api declarations. So should we support
>> that at all? We could define a macro that calls a function that does
>> the dispatch. So e.g. for this
>>
>> ctypedef cython.fused_type(typeA, typeB) dtype
>>
>> cdef func(dtype x):
>>     ...
>>
>> we would get two generated functions, say, __pyx_typeA_func and
>> __pyx_typeB_func. So we could have a macro get_func(dtype) or
>> something that then substitutes __pyx_get_func(#dtype), where
>> __pyx_get_func returns the pointer to the right function based on the
>> type names. I'm not sure we should support it, right now I just put
>> the mangled names in the header. At least the cdef functions will be
>> sharable between Cython implementation files.
>
> I'm fine with explicitly forbidding this for now. It may eventually work for
> Python object types where we can properly dispatch, but it won't easily work
> for C types. It may work in C++, though.
>

Ok, will do.

>> I also noticed that for cdef functions with optional argument it gets
>> a struct as argument, but this struct doesn't seem to end up in the
>> header file when the function is declared public. I believe that also
>> the typedefs for ctypedef-ed things in the .pyx file don't make it
>> there when used to type a cdef function's arguments. Should that be
>> fixed?
>
> No, I think this should also become a compiler error. These functions are
> not meant to be called from C code. It's a Cython convenience feature. As
> long as it's not correctly supported on both ends of the publicly exported
> C-API, it's best to keep users from using it at all.

Ok, I'll try to make Cython issue an error for these cases.

> Stefan
Re: [Cython] speed.pypy.org
On Tue, Apr 26, 2011 at 7:50 AM, Stefan Behnel wrote:
> Stefan Behnel, 15.04.2011 22:20:
>>
>> Stefan Behnel, 11.04.2011 15:08:
>>>
>>> I'm currently discussing with Maciej Fijalkowski (PyPy) how to get Cython
>>> running on speed.pypy.org (that's what I wrote "cythonrun" for). If it
>>> works out well, we may have it up in a couple of days.
>>
>> ... or maybe not. It may take a little longer due to lack of time on his
>> side.
>>
>>
>>> I would expect that Cython won't be a big winner in this game, given that
>>> it will only compile plain untyped Python code. It's also going to fail
>>> entirely in some of the benchmarks. But I think it's worth having it up
>>> there, simply as a way for us to see where we are performance-wise and to
>>> get quick (nightly) feed-back about optimisations we try. The benchmark
>>> suite is also a nice set of real-world Python code that will allow us to
>>> find compliance issues.
>>
>> Ok, here's what I have so far. I fixed a couple of bugs in Cython and got
>> at least some of the benchmarks running. Note that they are actually
>> simple ones, only a single module. Basically all complex benchmarks fail
>> due to known bugs, such as Cython def functions not accepting attribute
>> assignments (e.g. on wrapping). There's also a problem with code that uses
>> platform specific names conditionally, such as WindowsError when running
>> on Windows. Cython complains about non-builtin names here. I'm considering
>> turning that into a visible warning instead of an error, so that the name
>> would instead be looked up dynamically to let the code fail at runtime
>> *iff* it reaches the name lookup.
>>
>> Anyway, here are the numbers. I got them with "auto_cpdef" enabled,
>> although that doesn't even seem to make that big a difference. The
>> baseline is a self-compiled Python 2.7.1+ (about a month old).
>
> [numbers stripped]
>
> And here's the shiny graph:
>
> https://sage.math.washington.edu:8091/hudson/job/cython-devel-benchmarks-py27/lastSuccessfulBuild/artifact/chart.html
>
> It gets automatically rebuilt by this Hudson job:
>
> https://sage.math.washington.edu:8091/hudson/job/cython-devel-benchmarks-py27/

Cool. Any history stored/displayed?

- Robert
Re: [Cython] Fused Types
On Tue, Apr 26, 2011 at 8:18 AM, mark florisson wrote:
> On 26 April 2011 16:43, Stefan Behnel wrote:
>> mark florisson, 26.04.2011 16:23:
>>>
>>> I've been working a bit on fused types
>>> (http://wiki.cython.org/enhancements/fusedtypes), and I've got it to
>>> generate code for every permutation of specific types. Currently it
>>> only works for cdef functions. Now in SimpleCallNode I'm trying to use
>>> PyrexTypes.best_match() to find the function with the best signature,
>>> but it doesn't seem to take care of coercions, e.g. it calls
>>> 'assignable_from' on the dst_type (e.g. char *), but for
>>> BuiltinObjectType (a str) this returns false.
>>
>> Which is correct. "char*" cannot coerce from/to "str". It can coerce to
>> "bytes", though.
>>
>> http://wiki.cython.org/enhancements/stringliterals
>
> Right, I see, so the thing is that I was using string literals which
> should be inferred as the bytes type when they are assigned to a char
> *. The thing is that because the type is fused, the argument may be
> anything, so I was hoping best_match would figure it out for me.
> Apparently it doesn't, and indeed, this example doesn't work:
>
> cdef extern from "Foo.h":
>     cdef cppclass Foo:
>         Foo(char *)
>         Foo(int)
>
> cdef char *s = "foo"
> cdef Foo* foo = new Foo("foo")  # <- this doesn't work ("no suitable method found")
> cdef Foo* bar = new Foo(s)      # <- this works
>
> So that's pretty lame, I think I should fix that.

Agreed.

>>> Why is this the case,
>>> don't calls to overloaded C++ methods need to dispatch properly here
>>> also?
>>
>> If this doesn't work, I assume it just isn't implemented.
>>
>>
>>> Other issues are public and api declarations. So should we support
>>> that at all? We could define a macro that calls a function that does
>>> the dispatch. So e.g. for this
>>>
>>> ctypedef cython.fused_type(typeA, typeB) dtype
>>>
>>> cdef func(dtype x):
>>>     ...
>>>
>>> we would get two generated functions, say, __pyx_typeA_func and
>>> __pyx_typeB_func. So we could have a macro get_func(dtype) or
>>> something that then substitutes __pyx_get_func(#dtype), where
>>> __pyx_get_func returns the pointer to the right function based on the
>>> type names. I'm not sure we should support it, right now I just put
>>> the mangled names in the header. At least the cdef functions will be
>>> sharable between Cython implementation files.
>>
>> I'm fine with explicitly forbidding this for now. It may eventually work for
>> Python object types where we can properly dispatch, but it won't easily work
>> for C types. It may work in C++, though.
>>
>
> Ok, will do.

For the moment, putting mangled names in the header should be fine. A macro might make sense in the long term. Somewhat orthogonally, it could make sense to do some dispatching on type for cpdef functions.

>>> I also noticed that for cdef functions with optional argument it gets
>>> a struct as argument, but this struct doesn't seem to end up in the
>>> header file when the function is declared public. I believe that also
>>> the typedefs for ctypedef-ed things in the .pyx file don't make it
>>> there when used to type a cdef function's arguments. Should that be
>>> fixed?
>>
>> No, I think this should also become a compiler error. These functions are
>> not meant to be called from C code. It's a Cython convenience feature. As
>> long as it's not correctly supported on both ends of the publicly exported
>> C-API, it's best to keep users from using it at all.
>
> Ok, I'll try to make Cython issue an error for these cases.

+1

- Robert
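As a sketch of what "dispatching on type" can look like at the Python level, the standard library's functools.singledispatch picks an implementation from the runtime type of the first argument. This is an analogy only and says nothing about how Cython would implement dispatch for cpdef functions; the function names and return strings are made up.

```python
from functools import singledispatch


@singledispatch
def func(x):
    # Fallback when no specialization is registered for type(x).
    raise TypeError("no specialization for %s" % type(x).__name__)


@func.register(int)
def _func_long(x):
    # Stand-in for a specialization generated for a C integer type.
    return "long specialization"


@func.register(float)
def _func_double(x):
    # Stand-in for a specialization generated for a C floating type.
    return "double specialization"


print(func(3))
print(func(2.5))
```

Dispatch of this kind works for Python objects because they carry their type at runtime, which matches the earlier remark in the thread that dispatching is feasible for Python object types but not easily for plain C types.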
Re: [Cython] Fused Types
mark florisson, 26.04.2011 17:18:
> On 26 April 2011 16:43, Stefan Behnel wrote:
>> mark florisson, 26.04.2011 16:23:
>>> I've been working a bit on fused types
>>> (http://wiki.cython.org/enhancements/fusedtypes), and I've got it to
>>> generate code for every permutation of specific types. Currently it
>>> only works for cdef functions. Now in SimpleCallNode I'm trying to use
>>> PyrexTypes.best_match() to find the function with the best signature,
>>> but it doesn't seem to take care of coercions, e.g. it calls
>>> 'assignable_from' on the dst_type (e.g. char *), but for
>>> BuiltinObjectType (a str) this returns false.
>>
>> Which is correct. "char*" cannot coerce from/to "str". It can coerce to
>> "bytes", though.
>>
>> http://wiki.cython.org/enhancements/stringliterals
>
> Right, I see, so the thing is that I was using string literals which
> should be inferred as the bytes type when they are assigned to a char
> *. The thing is that because the type is fused, the argument may be
> anything, so I was hoping best_match would figure it out for me.
> Apparently it doesn't, and indeed, this example doesn't work:
>
> cdef extern from "Foo.h":
>     cdef cppclass Foo:
>         Foo(char *)
>         Foo(int)
>
> cdef char *s = "foo"
> cdef Foo* foo = new Foo("foo")  # <- this doesn't work ("no suitable method found")
> cdef Foo* bar = new Foo(s)      # <- this works
>
> So that's pretty lame, I think I should fix that.

Well, I assume it's less lame when you use b"foo". But given the current auto-coerce semantics for unprefixed string literals in Cython, I agree that it should also auto-coerce here.

Stefan
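The behaviour being asked for, overload resolution that also considers coercions, can be sketched in a few lines of Python. This is a toy model and not the code of PyrexTypes.best_match(); the COERCIONS table, the score function and the type names are assumptions chosen to mirror the Foo(char *) / Foo(int) example.

```python
# Hypothetical coercion table: source types that may be converted to a
# given destination type. A string literal is modelled as "bytes" here,
# following the auto-coercion to char* that the thread refers to.
COERCIONS = {"char*": {"bytes"}, "int": {"long"}}


def score(src, dst):
    if src == dst:
        return 0  # exact match
    if src in COERCIONS.get(dst, ()):
        return 1  # match after one coercion
    return None  # this parameter cannot accept the argument


def best_match(arg_types, signatures):
    """Return the signature with the lowest total score, or None."""
    best, best_total = None, None
    for sig in signatures:
        if len(sig) != len(arg_types):
            continue
        scores = [score(a, d) for a, d in zip(arg_types, sig)]
        if None in scores:
            continue
        total = sum(scores)
        if best_total is None or total < best_total:
            best, best_total = sig, total
    return best


constructors = [("char*",), ("int",)]
print(best_match(("bytes",), constructors))
print(best_match(("str",), constructors))
```

Scoring exact matches below coerced ones keeps existing exact-match behaviour unchanged, while still letting a bytes argument select the char* overload instead of failing with "no suitable method found".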