[Cython] CEP: prange for parallel loops
CEP up at http://wiki.cython.org/enhancements/prange

"""
This spec is the result of a number of discussions at Cython workshop 1. Quite a few different ways of expressing parallelism were looked at, and finally we decided to split the problem in two:

 * A simple and friendly solution that covers, perhaps, 80% of the cases, based on simply replacing range with prange.
 * Less friendly solutions for the remaining cases. These cases may well not even require language support in Cython, or only in indirect ways (e.g., cdef closures if normal closures are too expensive).

This document focuses exclusively on the former solution and does not intend to cover all use-cases for parallel programming, only the most common ones.
"""

Note that Mark and I talked some more on the way to the airport, and I also got a couple more ideas afterwards, so everybody interested should probably take a read even if you were there for the discussions.

Main post-workshop changes:

 * cython.parallel.firstiteration()/lastiteration # for in-loop if-test for thread setup/teardown blocks
 * An idea for how to implement numthreads(), so that we can drop the rather complex Context idea.
 * More thoughts on firstprivate/lastprivate

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
Re: [Cython] CEP: prange for parallel loops
On 04/04/2011 11:43 AM, Dag Sverre Seljebotn wrote:
> CEP up at http://wiki.cython.org/enhancements/prange
>
> """
> This spec is the result of a number of discussions at Cython workshop 1. Quite a few different ways of expressing parallelism were looked at, and finally we decided to split the problem in two:
>
>  * A simple and friendly solution that covers, perhaps, 80% of the cases, based on simply replacing range with prange.
>  * Less friendly solutions for the remaining cases. These cases may well not even require language support in Cython, or only in indirect ways (e.g., cdef closures if normal closures are too expensive).
>
> This document focuses exclusively on the former solution and does not intend to cover all use-cases for parallel programming, only the most common ones.
> """
>
> Note that Mark and I talked some more on the way to the airport, and I also got a couple more ideas afterwards, so everybody interested should probably take a read even if you were there for the discussions.

To be more specific, here are the main post-workshop changes:

 * if cython.parallel.firstthreaditer()/lastthreaditer() # Use if-test in loop for thread setup/teardown
 * An idea for implementing threadnum() in a way so that we can drop the rather complex Context idea.
 * More thoughts on firstprivate/lastprivate

Dag Sverre
Re: [Cython] CEP: prange for parallel loops
Dag Sverre Seljebotn, 04.04.2011 12:17:
> CEP up at http://wiki.cython.org/enhancements/prange

"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on conventions:

 * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well.
 * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well.
 * Reduction: Variables that are only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure.

Also, in the example, the local variable declaration of "tmp" outside of the loop looks somewhat misplaced, although it's precedented by comprehensions (which also have their own local scope in Cython).

Stefan
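As a rough illustration of the three conventions quoted from the CEP, the decision could be sketched in plain Python as below. This is only a sketch: the `classify` helper, the `Kind` enum and the usage labels "read", "assign" and "inplace" are assumptions made for the example, not anything the CEP or Cython defines.

```python
from enum import Enum


class Kind(Enum):
    SHARED = "shared"
    PRIVATE = "private"
    REDUCTION = "reduction"


def classify(uses):
    """Classify one variable from the ways the loop body uses it.

    `uses` is a set drawn from {"read", "assign", "inplace"}; these labels
    are an assumption of this sketch, not CEP terminology.
    """
    uses = set(uses)
    if uses == {"inplace"}:
        # Only ever the target of an in-place operator such as `s += ...`.
        return Kind.REDUCTION
    if "assign" in uses or "inplace" in uses:
        # Written in the body, or in-place combined with some other use.
        return Kind.PRIVATE
    # Only read (or not used in the loop body at all), never written.
    return Kind.SHARED
```

Under this sketch, adding a read such as printf(... s) to a variable that was previously only updated in place moves it from reduction to thread-private, which is the behaviour the quoted note warns about.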
Re: [Cython] CEP: prange for parallel loops
On 04/04/2011 01:23 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 04.04.2011 12:17:
>> CEP up at http://wiki.cython.org/enhancements/prange
>
> """
> Variable handling
>
> Rather than explicit declaration of shared/private variables we rely on conventions:
>
>  * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well.
>  * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well.
>  * Reduction: Variables that are only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway.
> """
>
> What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure.

I'm not quite sure what the concrete changes to the CEP this would lead to (assuming you mean this as a proposal for alternative semantics, and not an implementation detail).

How would we treat reduction variables? They need to be supported, and there's nothing in Python semantics to support reduction variables, they are a rather special case everywhere. I suppose keeping the reduction clause above, or use the "nonlocal" keyword in the loop body...

Also there's the else:-block, although we could make that part of the scope. And the "lastprivate" functionality, although that could be dropped without much loss.

> Also, in the example, the local variable declaration of "tmp" outside of the loop looks somewhat misplaced, although it's precedented by comprehensions (which also have their own local scope in Cython).

Well, depending on the decision of lastprivate, the declaration would need to be outside; I really like the idea of moving "cdef", and am prepared to drop lastprivate for this.

Being explicit about thread-local variables does make things a lot safer to use.

(One problem is that when switching between serial and parallel one needs to move variable declarations. But that only happens once, and one can use "nthreads=1" to disable parallel after that.)

An example would then be:

    def f(np.ndarray[double] x, double alpha):
        cdef double s = 0, globtmp
        with nogil:
            for i in prange(x.shape[0]):
                cdef double tmp # thread-private
                tmp = alpha * i # alpha available from global scope
                s += x[i] * tmp # still automatic reduction for inplace operators
                # printf(...s) -> now leads to error, since s is not declared thread-private but is read
            else:
                # tmp still available here...looks a bit strange, but useful
                s += tmp * 10
                globtmp = tmp # we save tmp for later
            # tmp not available here, globtmp is
        return s

Or, we just drop support for the else block on these loops.

Dag Sverre
Re: [Cython] CEP: prange for parallel loops
Dag Sverre Seljebotn, 04.04.2011 13:53:
> On 04/04/2011 01:23 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 04.04.2011 12:17:
>>> CEP up at http://wiki.cython.org/enhancements/prange
>>
>> """
>> Variable handling
>>
>> Rather than explicit declaration of shared/private variables we rely on conventions:
>>
>>  * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well.
>>  * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well.
>>  * Reduction: Variables that are only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway.
>> """
>>
>> What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure.
>
> I'm not quite sure what the concrete changes to the CEP this would lead to (assuming you mean this as a proposal for alternative semantics, and not an implementation detail).

What I would like to avoid is having to tell users "and now for something completely different". It looks like a loop, but then there's a whole page of new semantics for it. And this also cannot be used in plain Python code due to the differing scoping behaviour.

> How would we treat reduction variables? They need to be supported, and there's nothing in Python semantics to support reduction variables, they are a rather special case everywhere. I suppose keeping the reduction clause above, or use the "nonlocal" keyword in the loop body...

That's what I thought, yes. It looks unexpected, sure. That's the clear advantage of using inner functions, which do not add anything new at all. But if we want to add something that looks more like a loop, we should at least make it behave like something that's easy to explain.

Sorry for not taking the opportunity to articulate my scepticism in the workshop discussion. Skimming through the CEP now, I think this feature adds quite some complexity to the language, and I'm not sure it's worth that when compared to the existing closures. The equivalent closure+decorator syntax is certainly easier to explain, and could translate into exactly the same code. But with the clear advantage that the scope of local, nonlocal and thread-configuring variables is immediately obvious.

Basically, your example would become

    def f(np.ndarray[double] x, double alpha):
        cdef double s = 0

        with cython.nogil:
            @cython.run_parallel_for_loop( range(x.shape[0]) )
            cdef threaded_loop(i): # 'nogil' is inherited
                cdef double tmp = alpha * i
                nonlocal s
                s += x[i] * tmp
            s += alpha * (x.shape[0] - 1)
        return s

We likely agree that this is not beautiful. It's also harder to implement than a "simple" for-in-prange loop. But I find it at least easier to explain and semantically 'obvious'. And it would allow us to write a pure mode implementation for this based on the threading module.

> Also there's the else:-block, although we could make that part of the scope.

Since that's supposed to run single-threaded anyway, it can be written after the loop, right? Or is there really a use case where one of the threads has to do something in parallel, especially based on its local thread state, that the others don't do?

> And the "lastprivate" functionality, although that could be dropped without much loss.

I'm not sure how the "else" block and "lastprivate" could be integrated into the closures approach.

Stefan
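To make the "pure mode implementation based on the threading module" idea a little more concrete, here is a minimal plain-Python sketch. Everything in it is an assumption for illustration: `run_parallel_for_loop` is written from scratch here (no such decorator is known to exist in Cython), the `num_threads` parameter and the strided work split are arbitrary choices, and a lock stands in for a proper reduction on `s`.

```python
import threading


def run_parallel_for_loop(iterable, num_threads=4):
    """Hypothetical decorator: run body(i) for every i, split over threads.

    The decorated name is rebound to None because the loop has already
    run by the time the decorator returns.
    """
    items = list(iterable)

    def decorator(body):
        def worker(chunk):
            for i in chunk:
                body(i)

        # Strided split: thread k handles items k, k + num_threads, ...
        threads = [
            threading.Thread(target=worker, args=(items[k::num_threads],))
            for k in range(num_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return None

    return decorator


def f(x, alpha):
    s = 0.0
    lock = threading.Lock()  # guards the shared accumulator (assumption)

    @run_parallel_for_loop(range(len(x)))
    def threaded_loop(i):
        nonlocal s
        tmp = alpha * i  # local to each call, hence thread-private
        with lock:
            s += x[i] * tmp

    s += alpha * (len(x) - 1)  # the former "else" part, run single-threaded
    return s
```

The scoping is ordinary Python here: `tmp` is local to the inner function and `s` is explicitly nonlocal, which is the property argued for above. The price is the lock around every update, which a real reduction would avoid.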
Re: [Cython] CEP: prange for parallel loops
On Mon, Apr 4, 2011 at 3:17 AM, Dag Sverre Seljebotn wrote:
> * A simple and friendly solution that covers, perhaps, 80% of the cases,
> based on simply replacing range with prange.

This is a "merely" aesthetic objection, while remaining agnostic on the larger discussion, but -- 'for i in prange(...)' looks Just Wrong. This is not a regular loop over a funny range, it's a funny loop over a regular range. Surely it should be 'pfor i in range(...)'. Or better yet, spell it 'parallel_for'.

-- Nathaniel
Re: [Cython] CEP: prange for parallel loops
On 04/04/2011 03:04 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 04.04.2011 13:53:
>> On 04/04/2011 01:23 PM, Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 04.04.2011 12:17:
>>>> CEP up at http://wiki.cython.org/enhancements/prange
>>>
>>> """
>>> Variable handling
>>>
>>> Rather than explicit declaration of shared/private variables we rely on conventions:
>>>
>>>  * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well.
>>>  * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well.
>>>  * Reduction: Variables that are only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway.
>>> """
>>>
>>> What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure.
>>
>> I'm not quite sure what the concrete changes to the CEP this would lead to (assuming you mean this as a proposal for alternative semantics, and not an implementation detail).
>
> What I would like to avoid is having to tell users "and now for something completely different". It looks like a loop, but then there's a whole page of new semantics for it. And this also cannot be used in plain Python code due to the differing scoping behaviour.

Well, at least it's better than the 300 pages of semantics for OpenMP :-)

>> How would we treat reduction variables? They need to be supported, and there's nothing in Python semantics to support reduction variables, they are a rather special case everywhere. I suppose keeping the reduction clause above, or use the "nonlocal" keyword in the loop body...
>
> That's what I thought, yes. It looks unexpected, sure. That's the clear advantage of using inner functions, which do not add anything new at all. But if we want to add something that looks more like a loop, we should at least make it behave like something that's easy to explain.
>
> Sorry for not taking the opportunity to articulate my scepticism in the workshop discussion.

I like the idea of considering cdef/nonlocal in the prange blocks.

But, yes, I do feel that opposing a parallel loop construct in general is rather late, or at least could have been done at a more convenient time...

All I know and care about is that a decorator-and-closure solution will be a lot more obscure among non-CS people who have no clue what a closure or decorator is, and those are exactly the people who need this kind of simple 80%-solution. You and I don't really need any support from Cython at all to write multithreaded apps (leaving aesthetics and number of keystrokes to the side).

It'd be good to hear Robert's and Mark's opinions before going further, let's economise this thread a bit.

Dag Sverre
Re: [Cython] CEP: prange for parallel loops
On 04.04.2011 15:04, Stefan Behnel wrote:
> What I would like to avoid is having to tell users "and now for something completely different". It looks like a loop, but then there's a whole page of new semantics for it. And this also cannot be used in plain Python code due to the differing scoping behaviour.

I've been working on something similar, which does not involve any changes to Cython, and will work from Python as well. It's been discussed before; basically it involves wrapping a loop in a closure, and then normal Python scoping rules apply.

    cdef int n

    @parallel
    def _parallel_loop(parallel_env):
        cdef int i, s0, s1
        for s0,s1 in parallel_env.range(n):
            for i in range(s0,s1):
                pass

I am not happy about the verbosity of the wrapper compared to

    for i in prange(n):
        pass

but this is the best I can do without changing the compiler. Notice e.g. that the loop becomes two nested loops, which is required for efficient work scheduling.

Progress is mainly limited by lack of time and personal need. If I need parallel computing I use Fortran or an optimized LAPACK library (e.g. ACML).

Sturla
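The two nested loops mentioned above can be illustrated with a small, purely sequential Python sketch of what a chunking range might yield. `chunked_range` is a hypothetical stand-in for `parallel_env.range`, whose real behaviour is not described in the thread; the even, contiguous split is an assumption (roughly what a static schedule would do).

```python
def chunked_range(n, num_chunks):
    """Yield (start, stop) pairs that cover range(n) in contiguous blocks."""
    base, extra = divmod(n, num_chunks)
    start = 0
    for k in range(num_chunks):
        # The first `extra` chunks get one additional element.
        stop = start + base + (1 if k < extra else 0)
        if stop > start:  # skip empty chunks when n < num_chunks
            yield start, stop
        start = stop


def visit_all(n, num_chunks):
    """Outer loop hands out chunks, inner loop walks the indices of one chunk."""
    visited = []
    for s0, s1 in chunked_range(n, num_chunks):
        for i in range(s0, s1):
            visited.append(i)
    return visited
```

In a threaded version each worker would pull (s0, s1) pairs from the outer loop and run the inner loop without further coordination, which is presumably why the split into two loops matters for scheduling cost.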
Re: [Cython] CEP: prange for parallel loops
On 04/04/2011 03:27 PM, Nathaniel Smith wrote:
> On Mon, Apr 4, 2011 at 3:17 AM, Dag Sverre Seljebotn wrote:
>> * A simple and friendly solution that covers, perhaps, 80% of the cases, based on simply replacing range with prange.
>
> This is a "merely" aesthetic objection, while remaining agnostic on the larger discussion, but -- 'for i in prange(...)' looks Just Wrong. This is not a regular loop over a funny range, it's a funny loop over a regular range. Surely it should be 'pfor i in range(...)'. Or better yet, spell it 'parallel_for'.

I don't mind calling it "parallel_for" myself, if only a good place to provide scheduling parameters (numthreads, dynamic vs. static scheduling, chunksize) can be found. That would make it more obvious that scoping rules are different too.

No sense in discussing this further until the higher-level discussion on whether to do it or not has completed though.

Dag Sverre
Re: [Cython] CEP: prange for parallel loops
On 4 April 2011 13:53, Dag Sverre Seljebotn wrote:
> On 04/04/2011 01:23 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 04.04.2011 12:17:
>>> CEP up at http://wiki.cython.org/enhancements/prange
>>
>> """
>> Variable handling
>>
>> Rather than explicit declaration of shared/private variables we rely on conventions:
>>
>>  * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well.
>>  * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well.
>>  * Reduction: Variables that are only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway.
>> """
>>
>> What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure.
>
> I'm not quite sure what the concrete changes to the CEP this would lead to (assuming you mean this as a proposal for alternative semantics, and not an implementation detail).
>
> How would we treat reduction variables? They need to be supported, and there's nothing in Python semantics to support reduction variables, they are a rather special case everywhere. I suppose keeping the reduction clause above, or use the "nonlocal" keyword in the loop body...
>
> Also there's the else:-block, although we could make that part of the scope. And the "lastprivate" functionality, although that could be dropped without much loss.
>
>> Also, in the example, the local variable declaration of "tmp" outside of the loop looks somewhat misplaced, although it's precedented by comprehensions (which also have their own local scope in Cython).
>
> Well, depending on the decision of lastprivate, the declaration would need to be outside; I really like the idea of moving "cdef", and am prepared to drop lastprivate for this.
>
> Being explicit about thread-local variables does make things a lot safer to use.
>
> (One problem is that switching between serial and parallel one needs to move variable declarations. But that only happens once, and one can use "nthreads=1" to disable parallel after that.)
>
> An example would then be:
>
>     def f(np.ndarray[double] x, double alpha):
>         cdef double s = 0, globtmp
>         with nogil:
>             for i in prange(x.shape[0]):
>                 cdef double tmp # thread-private
>                 tmp = alpha * i # alpha available from global scope
>                 s += x[i] * tmp # still automatic reduction for inplace operators
>                 # printf(...s) -> now leads to error, since s is not declared thread-private but is read
>             else:
>                 # tmp still available here...looks a bit strange, but useful
>                 s += tmp * 10
>                 globtmp = tmp # we save tmp for later
>             # tmp not available here, globtmp is
>         return s
>
> Or, we just drop support for the else block on these loops.

I think since we are disallowing break (for now) we shouldn't support the else clause. Basically, I think we can make the CEP a tad simpler.

I think we could declare everything outside of the prange body. Then, in the prange loop body:

    if a variable is assigned to anywhere -> make it lastprivate
        - if a variable is read before assigned to -> make it firstprivate in addition to lastprivate (raise compiler error if the variable is not initialized outside of the loop body)

    if a variable is only ever read -> make it shared (the default for OpenMP)

    if a variable has an inplace operator -> make it a reduction

There is really no reason to disallow reading of the reduction variable (in e.g. a printf). The reduction should also be initialized outside of the prange body.

Then prange() could be implemented in pure mode as simply the sequential version, i.e. range() with some more arguments.

For any scratch space buffers etc, I'd prefer something like

    with cython.parallel:
        cdef char *buf = malloc(100)

        for i in prange(n):
            use buf

        free(buf)

At least it fits my brain pretty well :) (this code does however assume that malloc is thread-safe).

Anyway, I'm not sure I just covered all cases, but what do you think?

> Dag Sverre
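The suggestion that prange() could fall back to plain range() in pure mode can be sketched in a few lines of ordinary Python. The keyword names `schedule`, `chunksize` and `num_threads` are assumptions chosen to echo the scheduling parameters mentioned elsewhere in this thread; the sketch simply accepts and ignores them.

```python
def prange(*args, schedule=None, chunksize=None, num_threads=None):
    """Sequential stand-in: accept scheduling hints and ignore them."""
    return range(*args)


def f(x, alpha):
    s = 0.0    # reduction variable, initialised outside the loop
    tmp = 0.0  # assigned in the loop, so "lastprivate" under the rules above
    for i in prange(len(x), schedule="static", num_threads=4):
        tmp = alpha * i
        s += x[i] * tmp
    # In the sequential fallback tmp holds the value from the last iteration,
    # which is what lastprivate would also provide in the parallel case.
    return s, tmp
```

Because every variable is declared outside the loop, the same source reads the same way whether the loop runs sequentially or in parallel, which seems to be the attraction of this variant.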
Re: [Cython] CEP: prange for parallel loops
On 04/04/2011 03:04 PM, Stefan Behnel wrote:
> That's what I thought, yes. It looks unexpected, sure. That's the clear advantage of using inner functions, which do not add anything new at all. But if we want to add something that looks more like a loop, we should at least make it behave like something that's easy to explain.
>
> Sorry for not taking the opportunity to articulate my scepticism in the workshop discussion. Skipping through the CEP now, I think this feature adds quite some complexity to the language, and I'm not sure it's worth that when compared to the existing closures. The equivalent closure+decorator syntax is certainly easier to explain, and could translate into exactly the same code. But with the clear advantage that the scope of local, nonlocal and thread-configuring variables is immediately obvious.
>
> Basically, your example would become
>
>     def f(np.ndarray[double] x, double alpha):
>         cdef double s = 0
>
>         with cython.nogil:
>             @cython.run_parallel_for_loop( range(x.shape[0]) )
>             cdef threaded_loop(i): # 'nogil' is inherited
>                 cdef double tmp = alpha * i
>                 nonlocal s
>                 s += x[i] * tmp
>             s += alpha * (x.shape[0] - 1)
>         return s
>
> We likely agree that this is not beautiful. It's also harder to implement than a "simple" for-in-prange loop. But I find it at least easier to explain and semantically 'obvious'. And it would allow us to write a pure mode implementation for this based on the threading module.

Short clarification on this example: There is still magic going on here in the reduction variable -- one must have a version of "s" for each thread, and then reduce at the end.

(Stefan: I realize that you may know this, I'm just making sure everything is stated clearly in this discussion.)

Dag Sverre
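The "one version of s per thread, then reduce at the end" point can be shown with a small plain-Python sketch using the threading module. The function name `parallel_sum`, the `num_threads` parameter and the strided split of indices are assumptions for illustration only; this is not how Cython or OpenMP implement reductions internally.

```python
import threading


def parallel_sum(x, alpha, num_threads=4):
    partial = [0.0] * num_threads  # one private copy of "s" per thread

    def worker(k):
        local = 0.0
        # Thread k handles indices k, k + num_threads, k + 2*num_threads, ...
        for i in range(k, len(x), num_threads):
            local += x[i] * (alpha * i)
        partial[k] = local  # each thread writes only its own slot

    threads = [
        threading.Thread(target=worker, args=(k,)) for k in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Reduction step: combine the per-thread copies after all threads finish.
    return sum(partial)
```

No lock is needed because no two threads touch the same accumulator. Note that with floating-point data the result can differ in the last bits from a sequential sum, since the additions happen in a different order.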
Re: [Cython] CEP: prange for parallel loops
On 04/04/2011 05:22 PM, mark florisson wrote:
> On 4 April 2011 13:53, Dag Sverre Seljebotn wrote:
>> On 04/04/2011 01:23 PM, Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 04.04.2011 12:17:
>>>> CEP up at http://wiki.cython.org/enhancements/prange
>>>
>>> """
>>> Variable handling
>>>
>>> Rather than explicit declaration of shared/private variables we rely on conventions:
>>>
>>>  * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well.
>>>  * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well.
>>>  * Reduction: Variables that are only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway.
>>> """
>>>
>>> What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure.
>>
>> I'm not quite sure what the concrete changes to the CEP this would lead to (assuming you mean this as a proposal for alternative semantics, and not an implementation detail).
>>
>> How would we treat reduction variables? They need to be supported, and there's nothing in Python semantics to support reduction variables, they are a rather special case everywhere. I suppose keeping the reduction clause above, or use the "nonlocal" keyword in the loop body...
>>
>> Also there's the else:-block, although we could make that part of the scope. And the "lastprivate" functionality, although that could be dropped without much loss.
>>
>>> Also, in the example, the local variable declaration of "tmp" outside of the loop looks somewhat misplaced, although it's precedented by comprehensions (which also have their own local scope in Cython).
>>
>> Well, depending on the decision of lastprivate, the declaration would need to be outside; I really like the idea of moving "cdef", and am prepared to drop lastprivate for this.
>>
>> Being explicit about thread-local variables does make things a lot safer to use.
>>
>> (One problem is that switching between serial and parallel one needs to move variable declarations. But that only happens once, and one can use "nthreads=1" to disable parallel after that.)
>>
>> An example would then be:
>>
>>     def f(np.ndarray[double] x, double alpha):
>>         cdef double s = 0, globtmp
>>         with nogil:
>>             for i in prange(x.shape[0]):
>>                 cdef double tmp # thread-private
>>                 tmp = alpha * i # alpha available from global scope
>>                 s += x[i] * tmp # still automatic reduction for inplace operators
>>                 # printf(...s) -> now leads to error, since s is not declared thread-private but is read
>>             else:
>>                 # tmp still available here...looks a bit strange, but useful
>>                 s += tmp * 10
>>                 globtmp = tmp # we save tmp for later
>>             # tmp not available here, globtmp is
>>         return s
>>
>> Or, we just drop support for the else block on these loops.
>
> I think since we are disallowing break (yet) we shouldn't support the else clause. Basically, I think we can make the CEP a tad more simple.
>
> I think we could declare everything outside of the prange body. Then, in the prange loop body:
>
>     if a variable is assigned to anywhere -> make it lastprivate
>         - if a variable is read before assigned to -> make it firstprivate in addition to lastprivate (raise compiler error if the variable is not initialized outside of the loop body)
>
>     if a variable is only ever read -> make it shared (the default for OpenMP)
>
>     if a variable has an inplace operator -> make it a reduction
>
> There is really no reason to disallow reading of the reduction variable (in e.g. a printf). The reduction should also be initialized outside of the prange body.

The reason for disallowing reading the reduction variable is that otherwise you have a contradiction above, since a reduction variable may also be a thread-local variable. Or, you disable inplace operators for thread-local variables? (ugh)

That's the main reason I'm leaning towards explicitly declaring local variables using "cdef".

If we're reducing complexity BTW, I'd rather remove firstprivate/lastprivate altogether, see below.

> Then prange() could be implemented in pure mode as simply the sequential version, i.e. range() which some more arguments.
>
> For any scratch space buffers etc, I'd prefer something like
>
>     with cython.parallel:
>         cdef char *buf = malloc(100)
>
>         for i in prange(n):
Re: [Cython] CEP: prange for parallel loops
On 4 April 2011 19:18, Dag Sverre Seljebotn wrote: > On 04/04/2011 05:22 PM, mark florisson wrote: >> >> On 4 April 2011 13:53, Dag Sverre Seljebotn >> wrote: >>> >>> On 04/04/2011 01:23 PM, Stefan Behnel wrote: Dag Sverre Seljebotn, 04.04.2011 12:17: > > CEP up at http://wiki.cython.org/enhancements/prange """ Variable handling Rather than explicit declaration of shared/private variables we rely on conventions: * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well. * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well. * Reduction: Variables that only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway. """ What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure. >>> >>> I'm not quite sure what the concrete changes to the CEP this would lead >>> to >>> (assuming you mean this as a proposal for alternative semantics, and not >>> an >>> implementation detail). >>> >>> How would we treat reduction variables? They need to be supported, and >>> there's nothing in Python semantics to support reduction variables, they >>> are >>> a rather special case everywhere. 
I suppose keeping the reduction clause >>> above, or use the "nonlocal" keyword in the loop body... >>> >>> Also there's the else:-block, although we could make that part of the >>> scope. >>> And the "lastprivate" functionality, although that could be dropped >>> without >>> much loss. >>> Also, in the example, the local variable declaration of "tmp" outside of the loop looks somewhat misplaced, although it's precedented by comprehensions (which also have their own local scope in Cython). >>> >>> Well, depending on the decision of lastprivate, the declaration would >>> need >>> to be outside; I really like the idea of moving "cdef", and am prepared >>> to >>> drop lastprivate for this. >>> >>> Being explicit about thread-local variables does make things a lot safer >>> to >>> use. >>> >>> (One problem is that switching between serial and parallel one needs to >>> move >>> variable declarations. But that only happens once, and one can use >>> "nthreads=1" to disable parallel after that.) >>> >>> An example would then be: >>> >>> def f(np.ndarray[double] x, double alpha): >>> cdef double s = 0, globtmp >>> with nogil: >>> for i in prange(x.shape[0]): >>> cdef double tmp # thread-private >>> tmp = alpha * i # alpha available from global scope >>> s += x[i] * tmp # still automatic reduction for inplace >>> operators >>> # printf(...s) -> now leads to error, since s is not declared >>> thread-private but is read >>> else: >>> # tmp still available here...looks a bit strange, but useful >>> s += tmp * 10 >>> globtmp = tmp # we save tmp for later >>> # tmp not available here, globtmp is >>> return s >>> >>> Or, we just drop support for the else block on these loops. >> >> I think since we are disallowing break (yet) we shouldn't support the >> else clause. Basically, I think we can make the CEP a tad more simple. >> >> I think we could declare everything outside of the prange body. 
>> Then, in the prange loop body:
>>
>> if a variable is assigned to anywhere -> make it lastprivate
>>     - if a variable is read before assigned to -> make it
>> firstprivate in addition to lastprivate (raise compiler error if the
>> variable is not initialized outside of the loop body)
>>
>> if a variable is only ever read -> make it shared (the default for
>> OpenMP)
>>
>> if a variable has an inplace operator -> make it a reduction
>>
>> There is really no reason to disallow reading of the reduction
>> variable (in e.g. a printf). The reduction should also be initialized
>> outside of the prange body.
>
> The reason for disallowing reading the reduction variable is that otherwise
> you have a contradiction above, since a reduction variable may also be a
> thread-local variable. Or, you disable inplace operators
Re: [Cython] problem building master with python3
2011/4/4 Darren Dale : > On Mon, Apr 4, 2011 at 3:32 PM, Darren Dale wrote: >> I'm attempting to install cython from the git repository to benefit >> from this fix: http://trac.cython.org/cython_trac/ticket/597 . When I >> run "python3 setup.py install --user", I get an error: >> >> cythoning /Users/darren/Projects/cython/Cython/Compiler/Code.py to >> /Users/darren/Projects/cython/Cython/Compiler/Code.c >> >> Error compiling Cython file: >> >> ... >> self.cname = cname >> self.text = text >> self.escaped_value = StringEncoding.escape_byte_string(byte_string) >> self.py_strings = None >> >> def get_py_string_const(self, encoding, identifier=None, is_str=False): >> ^ >> >> >> Cython/Compiler/Code.py:320:4: Signature not compatible with previous >> declaration >> >> Error compiling Cython file: >> >> ... >> cdef public object text >> cdef public object escaped_value >> cdef public dict py_strings >> >> @cython.locals(intern=bint, is_str=bint, is_unicode=bint) >> cpdef get_py_string_const(self, encoding, identifier=*, is_str=*) >> ^ >> >> >> Cython/Compiler/Code.pxd:64:30: Previous declaration is here >> building 'Cython.Compiler.Code' extension >> /usr/bin/gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g >> -fwrapv -O3 -Wall -Wstrict-prototypes -O2 >> -I/opt/local/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m >> -c /Users/darren/Projects/cython/Cython/Compiler/Code.c -o >> build/temp.macosx-10.6-x86_64-3.2/Users/darren/Projects/cython/Cython/Compiler/Code.o >> /Users/darren/Projects/cython/Cython/Compiler/Code.c:1:2: error: >> #error Do not use this file, it is the result of a failed Cython >> compilation. >> error: command '/usr/bin/gcc-4.2' failed with exit status 1 >> > > Actually, I get this same error when I try to build with python-2.7 as well. > > Darren This one fails too :( Generators branch is okay. 
But upstream after merge isn't :(

vitja@vitja-laptop:~/work/cython.git$ cat ttt.py
def foo(is_str=False):
    pass

vitja@vitja-laptop:~/work/cython.git$ cat ttt.pxd
cimport cython

@cython.locals(is_str=cython.bint)
cdef foo(is_str=*)

--
vitja.
Re: [Cython] CEP: prange for parallel loops
On 04/04/2011 09:26 PM, mark florisson wrote: On 4 April 2011 19:18, Dag Sverre Seljebotn wrote: On 04/04/2011 05:22 PM, mark florisson wrote: On 4 April 2011 13:53, Dag Sverre Seljebotn wrote: On 04/04/2011 01:23 PM, Stefan Behnel wrote: Dag Sverre Seljebotn, 04.04.2011 12:17: CEP up at http://wiki.cython.org/enhancements/prange """ Variable handling Rather than explicit declaration of shared/private variables we rely on conventions: * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well. * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well. * Reduction: Variables that only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway. """ What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure. I'm not quite sure what the concrete changes to the CEP this would lead to (assuming you mean this as a proposal for alternative semantics, and not an implementation detail). How would we treat reduction variables? They need to be supported, and there's nothing in Python semantics to support reduction variables, they are a rather special case everywhere. I suppose keeping the reduction clause above, or use the "nonlocal" keyword in the loop body... 
Also there's the else:-block, although we could make that part of the scope. And the "lastprivate" functionality, although that could be dropped without much loss.

Also, in the example, the local variable declaration of "tmp" outside of the loop looks somewhat misplaced, although it's precedented by comprehensions (which also have their own local scope in Cython).

Well, depending on the decision of lastprivate, the declaration would need to be outside; I really like the idea of moving "cdef", and am prepared to drop lastprivate for this.

Being explicit about thread-local variables does make things a lot safer to use.

(One problem is that switching between serial and parallel one needs to move variable declarations. But that only happens once, and one can use "nthreads=1" to disable parallel after that.)

An example would then be:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0, globtmp
    with nogil:
        for i in prange(x.shape[0]):
            cdef double tmp # thread-private
            tmp = alpha * i # alpha available from global scope
            s += x[i] * tmp # still automatic reduction for inplace operators
            # printf(...s) -> now leads to error, since s is not declared thread-private but is read
        else:
            # tmp still available here...looks a bit strange, but useful
            s += tmp * 10
            globtmp = tmp # we save tmp for later
        # tmp not available here, globtmp is
    return s

Or, we just drop support for the else block on these loops.

I think since we are disallowing break (yet) we shouldn't support the else clause. Basically, I think we can make the CEP a tad more simple.

I think we could declare everything outside of the prange body.
Then, in the prange loop body:

if a variable is assigned to anywhere -> make it lastprivate
    - if a variable is read before assigned to -> make it firstprivate in addition to lastprivate (raise compiler error if the variable is not initialized outside of the loop body)

if a variable is only ever read -> make it shared (the default for OpenMP)

if a variable has an inplace operator -> make it a reduction

There is really no reason to disallow reading of the reduction variable (in e.g. a printf). The reduction should also be initialized outside of the prange body.

The reason for disallowing reading the reduction variable is that otherwise you have a contradiction above, since a reduction variable may also be a thread-local variable. Or, you disable inplace operators for thread-local variables? (ugh)

Yes, an inplace operator would make it a reduction variable, just like assigning something makes it lastprivate, only reading makes it shared and reading before writing makes it firstprivate in addition to lastprivate. This is all implicit.

Alternatively, if you want it more explicit, then instead of the inplace operator you could allow something like sum = cyt
Re: [Cython] CEP: prange for parallel loops
Nathaniel Smith wrote: Surely it should be 'pfor i in range(...)'. Or 'pfhor', just to let you know it's really something out of this world. http://marathongame.wikia.com/wiki/Pfhor_%28Race%29 -- Greg
Re: [Cython] CEP: prange for parallel loops
On Mon, Apr 4, 2011 at 6:04 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 04.04.2011 13:53: >> >> On 04/04/2011 01:23 PM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 04.04.2011 12:17: CEP up at http://wiki.cython.org/enhancements/prange >>> >>> """ >>> Variable handling >>> >>> Rather than explicit declaration of shared/private variables we rely on >>> conventions: >>> >>> * Thread-shared: Variables that are only read and not written in the loop >>> body are shared across threads. Variables that are only used in the else >>> block are considered shared as well. >>> >>> * Thread-private: Variables that are assigned to in the loop body are >>> thread-private. Obviously, the iteration counter is thread-private as >>> well. >>> >>> * Reduction: Variables that only used on the LHS of an inplace operator, >>> such as s above, are marked as targets for reduction. If the variable is >>> also used in other ways (LHS of assignment or in an expression) it does >>> instead turn into a thread-private variable. Note: This means that if >>> one, e.g., inserts printf(... s) above, s is turned into a thread-local >>> variable. OTOH, there is simply no way to correctly emulate the effect >>> printf(... s) would have in a sequential loop, so such code must be >>> discouraged anyway. >>> """ >>> >>> What about simply (ab-)using Python semantics and creating a new inner >>> scope for the prange loop body? That would basically make the loop behave >>> like a closure function, but with the looping header at the 'right' place >>> rather than after the closure. >> >> I'm not quite sure what the concrete changes to the CEP this would lead to >> (assuming you mean this as a proposal for alternative semantics, and not >> an >> implementation detail). > > What I would like to avoid is having to tell users "and now for something > completely different". It looks like a loop, but then there's a whole page > of new semantics for it. 
And this also cannot be used in plain Python code > due to the differing scoping behaviour. The same could be said of OpenMP--it looks exactly like a loop except for a couple of pragmas. The proposed (as I'm reading the CEP now) semantics of what's shared and first/last private and reduction would give it the semantics of a normal, sequential loop (and if your final result changes based on how many threads were involved then you've got incorrect code). Perhaps reading of the reduction variable could be fine (though obviously ill-defined, suitable only for debugging). >> How would we treat reduction variables? They need to be supported, and >> there's nothing in Python semantics to support reduction variables, they >> are a rather special case everywhere. I suppose keeping the reduction >> clause above, or use the "nonlocal" keyword in the loop body... > > That's what I thought, yes. It looks unexpected, sure. That's the clear > advantage of using inner functions, which do not add anything new at all. > But if we want to add something that looks more like a loop, we should at > least make it behave like something that's easy to explain. > > Sorry for not taking the opportunity to articulate my scepticism in the > workshop discussion. Skipping through the CEP now, I think this feature adds > quite some complexity to the language, and I'm not sure it's worth that when > compared to the existing closures. The equivalent closure+decorator syntax > is certainly easier to explain, and could translate into exactly the same > code. But with the clear advantage that the scope of local, nonlocal and > thread-configuring variables is immediately obvious. 
>
> Basically, your example would become
>
> def f(np.ndarray[double] x, double alpha):
>     cdef double s = 0
>
>     with cython.nogil:
>         @cython.run_parallel_for_loop( range(x.shape[0]) )
>         cdef threaded_loop(i): # 'nogil' is inherited
>             cdef double tmp = alpha * i
>             nonlocal s
>             s += x[i] * tmp
>         s += alpha * (x.shape[0] - 1)
>     return s
>
> We likely agree that this is not beautiful. It's also harder to implement
> than a "simple" for-in-prange loop. But I find it at least easier to explain
> and semantically 'obvious'. And it would allow us to write a pure mode
> implementation for this based on the threading module.

I'm not opposed to having something like this, but it's a whole lot of code and extra refactoring for the basic usecase. I think a nice, clean syntax is worthwhile and requires at least some level of language support. In some ways it's like buffer support--what goes on under the hood does take some explaining, but most of the time it works as expected (i.e. as if you hadn't declared the type), just faster. The inner workings of prange may be a bit magical, but the intent is not, and the latter is what users care about.

- Robert
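For readers who want to see what a threading-based pure-mode emulation of the reduction might look like, here is a minimal sketch in plain Python. It is not taken from the CEP or from Cython: the helper name run_parallel_for_loop is borrowed from the decorator in the example above, but its signature, the nthreads parameter, and the interleaved schedule are all assumptions made for illustration. Instead of "nonlocal s", the loop body returns its contribution, and each thread keeps a private partial sum that is combined after the threads have joined.

```python
import threading


def run_parallel_for_loop(n, body, nthreads=4):
    """Hypothetical helper: evaluate body(i) for i in range(n) on several
    threads and return the sum of all results (a '+' reduction)."""
    partials = [0.0] * nthreads              # one reduction slot per thread

    def worker(tid):
        local = 0.0                          # thread-private partial sum
        for i in range(tid, n, nthreads):    # static, interleaved schedule
            local += body(i)
        partials[tid] = local                # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)                     # combine the partial sums


def f(x, alpha, nthreads=4):
    def body(i):
        tmp = alpha * i                      # local to the closure, so thread-private
        return x[i] * tmp                    # contribution to the reduction variable

    s = run_parallel_for_loop(len(x), body, nthreads)
    s += alpha * (len(x) - 1)                # sequential code after the loop
    return s
```

With nthreads=1 this degenerates to the sequential loop, which matches the idea that the result should not depend on how many threads were involved (up to floating-point summation order for general inputs).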
Re: [Cython] Interest in contributing to the project
Thanks for the clarification, Sturla; that's just the way I was thinking about some things...

I noticed that you used some C code that is in the _math.h header file. For the project, should I rewrite the code that belongs to this file too? I started coding but got stuck on functions like Py_Is_Infinite and Py_Is_NaN... I saw Sturla's e-mail, but I thought this would be written in a different way. Was I wrong?

I have also started to write a proposal for this project and hope to publish it here tomorrow for your evaluation.

Another point I'm thinking about is how the profile results should be organized. Is there any template for this?

Best Regards
[]s
Arthur

2011/4/3 Sturla Molden

> On 04.04.2011 01:49, Sturla Molden wrote:
>
>> Also observe that we do not release the GIL here. That is not because
>> these functions are not thread-safe, they are, but yielding the GIL will
>> slow things terribly.
>
> Oh, actually they are not thread-safe because we set errno... Sorry.
>
> Sturla
Re: [Cython] CEP: prange for parallel loops
On 04/05/2011 07:05 AM, Robert Bradshaw wrote: On Mon, Apr 4, 2011 at 6:04 AM, Stefan Behnel wrote: Dag Sverre Seljebotn, 04.04.2011 13:53: On 04/04/2011 01:23 PM, Stefan Behnel wrote: Dag Sverre Seljebotn, 04.04.2011 12:17: CEP up at http://wiki.cython.org/enhancements/prange """ Variable handling Rather than explicit declaration of shared/private variables we rely on conventions: * Thread-shared: Variables that are only read and not written in the loop body are shared across threads. Variables that are only used in the else block are considered shared as well. * Thread-private: Variables that are assigned to in the loop body are thread-private. Obviously, the iteration counter is thread-private as well. * Reduction: Variables that only used on the LHS of an inplace operator, such as s above, are marked as targets for reduction. If the variable is also used in other ways (LHS of assignment or in an expression) it does instead turn into a thread-private variable. Note: This means that if one, e.g., inserts printf(... s) above, s is turned into a thread-local variable. OTOH, there is simply no way to correctly emulate the effect printf(... s) would have in a sequential loop, so such code must be discouraged anyway. """ What about simply (ab-)using Python semantics and creating a new inner scope for the prange loop body? That would basically make the loop behave like a closure function, but with the looping header at the 'right' place rather than after the closure. I'm not quite sure what the concrete changes to the CEP this would lead to (assuming you mean this as a proposal for alternative semantics, and not an implementation detail). What I would like to avoid is having to tell users "and now for something completely different". It looks like a loop, but then there's a whole page of new semantics for it. And this also cannot be used in plain Python code due to the differing scoping behaviour. 
The same could be said of OpenMP--it looks exactly like a loop except for a couple of pragmas. The proposed (as I'm reading the CEP now) semantics of what's shared and first/last private and reduction would give it the semantics of a normal, sequential loop (and if your final result changes based on how many threads were involved then you've got incorrect code). Perhaps reading of the reduction variable could be fine (though obviously ill-defined, suitable only for debugging).

So would you disable inplace operators for thread-private variables? Otherwise a variable could be both a reduction variable and thread-private...

There's a reason I disabled reading the reduction variable (which I should have written down).

Dag Sverre
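The ambiguity discussed here can be made concrete with a small sketch. The following plain-Python snippet uses the standard ast module to apply the convention-based rules from the CEP text quoted in this thread to the source of a loop body. It is only an illustration, not Cython's implementation: the function name classify and the labels "shared", "private", "reduction" and "ambiguous" are made up for this example, and it ignores statement ordering (so it does not distinguish firstprivate from lastprivate).

```python
import ast


class _Usage(ast.NodeVisitor):
    """Collect, per variable name, how it is used in a loop body."""

    def __init__(self):
        self.kinds = {}

    def _add(self, name, kind):
        self.kinds.setdefault(name, set()).add(kind)

    def visit_AugAssign(self, node):
        # 's += expr': the target counts as an inplace update, not an assignment
        if isinstance(node.target, ast.Name):
            self._add(node.target.id, "inplace")
        else:
            self.visit(node.target)
        self.visit(node.value)

    def visit_Name(self, node):
        kind = "read" if isinstance(node.ctx, ast.Load) else "assign"
        self._add(node.id, kind)


def classify(loop_body_source):
    """Return {name: label} for every name used in the loop body."""
    usage = _Usage()
    usage.visit(ast.parse(loop_body_source))
    result = {}
    for name, kinds in usage.kinds.items():
        if kinds == {"read"}:
            result[name] = "shared"        # only ever read
        elif kinds == {"inplace"}:
            result[name] = "reduction"     # only used as LHS of an inplace operator
        elif "inplace" in kinds:
            result[name] = "ambiguous"     # inplace plus read/assign: the disputed case
        else:
            result[name] = "private"       # assigned to in the body
    return result


# Note: the loop counter 'i' is set by the loop header, not the body,
# so it shows up here as "shared" although it is really thread-private.
print(classify("tmp = alpha * i\ns += x[i] * tmp\n"))
print(classify("s += x[i]\nprintf(s)\n")["s"])
```

Running it on the body of the example loop labels s as a reduction; adding a printf(s) call turns s into the "ambiguous" case, which is exactly where the proposals in this thread differ (treat it as thread-private, raise an error, or allow the read).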