[Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

CEP up at http://wiki.cython.org/enhancements/prange

"""
This spec is the result of a number of discussions at Cython workshop 1. 
Quite a few different ways of expressing parallelism were looked at, and 
finally we decided to split the problem in two:


 * A simple and friendly solution that covers, perhaps, 80% of the 
cases, based on simply replacing range with prange.


 * Less friendly solutions for the remaining cases. These cases may 
well not even require language support in Cython, or only in indirect 
ways (e.g., cdef closures if normal closures are too expensive).


This document focuses exclusively on the former solution and does not 
intend to cover all use-cases for parallel programming, only the most 
common ones.

"""

Note that Mark and I talked some more on the way to the airport, and 
I also got a couple more ideas afterwards, so everybody interested 
should probably take a read even if you were there for the discussions.


Main post-workshop changes:

 * cython.parallel.firstiteration()/lastiteration() # in-loop if-test for 
thread setup/teardown blocks


 * An idea for how to implement numthreads(), so that we can drop the 
rather complex Context idea.


 * More thoughts on firstprivate/lastprivate


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 11:43 AM, Dag Sverre Seljebotn wrote:

CEP up at http://wiki.cython.org/enhancements/prange

"""
This spec is the result of a number of discussions at Cython workshop 1.
Quite a few different ways of expressing parallelism were looked at,
and finally we decided to split the problem in two:


 * A simple and friendly solution that covers, perhaps, 80% of the 
cases, based on simply replacing range with prange.


 * Less friendly solutions for the remaining cases. These cases may 
well not even require language support in Cython, or only in indirect 
ways (e.g., cdef closures if normal closures are too expensive).


This document focuses exclusively on the former solution and does not 
intend to cover all use-cases for parallel programming, only the most 
common ones.

"""

Note that Mark and I talked some more on the way to the airport, and 
I also got a couple more ideas afterwards, so everybody interested 
should probably take a read even if you were there for the discussions.


To be more specific, here are the main post-workshop changes:

 * if cython.parallel.firstthreaditer()/lastthreaditer() # Use if-test 
in loop for thread setup/teardown


 * An idea for implementing threadnum() in a way so that we can drop 
the rather complex Context idea.


 * More thoughts on firstprivate/lastprivate
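A pure-Python sketch of the first/last-iteration idea, assuming each thread runs one contiguous chunk of the iteration space: an if-test on the chunk's first and last index stands in for cython.parallel.firstthreaditer()/lastthreaditer(). The function name run_with_setup_teardown and the chunking scheme are illustrative assumptions, not part of the CEP.

```python
import threading


def run_with_setup_teardown(n, num_threads, body):
    """Each thread runs one contiguous chunk of range(n); an if-test on the
    chunk's first and last iteration triggers per-thread setup/teardown.
    Returns the recorded (thread id, event) pairs."""
    events = []                      # (thread id, "setup" or "teardown")
    lock = threading.Lock()          # protects the shared events list
    size = -(-n // num_threads)      # ceil(n / num_threads)

    def worker(tid):
        start, stop = tid * size, min(n, (tid + 1) * size)
        for i in range(start, stop):
            if i == start:           # role of firstthreaditer()
                with lock:
                    events.append((tid, "setup"))
            body(i)
            if i == stop - 1:        # role of lastthreaditer()
                with lock:
                    events.append((tid, "teardown"))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return events
```

A thread whose chunk is empty (possible when n is small) never enters the loop, so it gets neither setup nor teardown.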

Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Stefan Behnel

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange


"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on 
conventions:


* Thread-shared: Variables that are only read and not written in the 
loop body are shared across threads. Variables that are only used in the 
else block are considered shared as well.


* Thread-private: Variables that are assigned to in the loop body are 
thread-private. Obviously, the iteration counter is thread-private as well.


* Reduction: Variables that are only used on the LHS of an in-place 
operator, such as s above, are marked as targets for reduction. If the 
variable is also used in other ways (LHS of assignment or in an expression), 
it instead turns into a thread-private variable. Note: This means that 
if one, e.g., inserts printf(... s) above, s is turned into a thread-local 
variable. OTOH, there is simply no way to correctly emulate the effect 
printf(... s) would have in a sequential loop, so such code must be 
discouraged anyway.

"""

What about simply (ab-)using Python semantics and creating a new inner 
scope for the prange loop body? That would basically make the loop behave 
like a closure function, but with the looping header at the 'right' place 
rather than after the closure.


Also, in the example, the local variable declaration of "tmp" outside of 
the loop looks somewhat misplaced, although it's precedented by 
comprehensions (which also have their own local scope in Cython).


Stefan


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange


"""
Variable handling

Rather than explicit declaration of shared/private variables we rely 
on conventions:


* Thread-shared: Variables that are only read and not written in 
the loop body are shared across threads. Variables that are only used 
in the else block are considered shared as well.


* Thread-private: Variables that are assigned to in the loop body 
are thread-private. Obviously, the iteration counter is thread-private 
as well.


* Reduction: Variables that are only used on the LHS of an in-place 
operator, such as s above, are marked as targets for reduction. If the 
variable is also used in other ways (LHS of assignment or in an 
expression), it instead turns into a thread-private variable. Note: 
This means that if one, e.g., inserts printf(... s) above, s is turned 
into a thread-local variable. OTOH, there is simply no way to 
correctly emulate the effect printf(... s) would have in a sequential 
loop, so such code must be discouraged anyway.

"""

What about simply (ab-)using Python semantics and creating a new inner 
scope for the prange loop body? That would basically make the loop 
behave like a closure function, but with the looping header at the 
'right' place rather than after the closure.


I'm not quite sure what concrete changes to the CEP this would lead 
to (assuming you mean this as a proposal for alternative semantics, and 
not an implementation detail).


How would we treat reduction variables? They need to be supported, and 
there's nothing in Python semantics to support reduction variables; they 
are a rather special case everywhere. I suppose we could keep the 
reduction clause above, or use the "nonlocal" keyword in the loop body...


Also there's the else:-block, although we could make that part of the 
scope. And the "lastprivate" functionality, although that could be 
dropped without much loss.




Also, in the example, the local variable declaration of "tmp" outside 
of the loop looks somewhat misplaced, although it's precedented by 
comprehensions (which also have their own local scope in Cython).


Well, depending on the decision of lastprivate, the declaration would 
need to be outside; I really like the idea of moving "cdef", and am 
prepared to drop lastprivate for this.


Being explicit about thread-local variables does make things a lot safer 
to use.


(One problem is that when switching between serial and parallel, one 
needs to move variable declarations. But that only happens once, and one 
can use "nthreads=1" to disable parallelism after that.)


An example would then be:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0, globtmp
    with nogil:
        for i in prange(x.shape[0]):
            cdef double tmp # thread-private
            tmp = alpha * i # alpha available from global scope
            s += x[i] * tmp # still automatic reduction for inplace operators
            # printf(...s) -> now leads to error, since s is not declared
            # thread-private but is read
        else:
            # tmp still available here...looks a bit strange, but useful
            s += tmp * 10
            globtmp = tmp # we save tmp for later
        # tmp not available here, globtmp is
    return s

Or, we just drop support for the else block on these loops.

Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Stefan Behnel

Dag Sverre Seljebotn, 04.04.2011 13:53:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange


"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the loop
body are shared across threads. Variables that are only used in the else
block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private as well.

* Reduction: Variables that are only used on the LHS of an in-place
operator, such as s above, are marked as targets for reduction. If the
variable is also used in other ways (LHS of assignment or in an
expression), it instead turns into a thread-private variable. Note: This
means that if
one, e.g., inserts printf(... s) above, s is turned into a thread-local
variable. OTOH, there is simply no way to correctly emulate the effect
printf(... s) would have in a sequential loop, so such code must be
discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop behave
like a closure function, but with the looping header at the 'right' place
rather than after the closure.


I'm not quite sure what concrete changes to the CEP this would lead to
(assuming you mean this as a proposal for alternative semantics, and not an
implementation detail).


What I would like to avoid is having to tell users "and now for something 
completely different". It looks like a loop, but then there's a whole page 
of new semantics for it. And this also cannot be used in plain Python code 
due to the differing scoping behaviour.




How would we treat reduction variables? They need to be supported, and
there's nothing in Python semantics to support reduction variables; they
are a rather special case everywhere. I suppose we could keep the reduction
clause above, or use the "nonlocal" keyword in the loop body...


That's what I thought, yes. It looks unexpected, sure. That's the clear 
advantage of using inner functions, which do not add anything new at all. 
But if we want to add something that looks more like a loop, we should at 
least make it behave like something that's easy to explain.


Sorry for not taking the opportunity to articulate my scepticism in the 
workshop discussion. Skipping through the CEP now, I think this feature 
adds quite a bit of complexity to the language, and I'm not sure it's worth 
that when compared to the existing closures. The equivalent 
closure+decorator syntax is certainly easier to explain, and could 
translate into exactly the same code. But with the clear advantage that the 
scope of local, nonlocal and thread-configuring variables is immediately 
obvious.


Basically, your example would become

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0

    with cython.nogil:
        @cython.run_parallel_for_loop( range(x.shape[0]) )
        cdef threaded_loop(i):  # 'nogil' is inherited
            cdef double tmp = alpha * i
            nonlocal s
            s += x[i] * tmp
        s += alpha * (x.shape[0] - 1)
    return s

We likely agree that this is not beautiful. It's also harder to implement 
than a "simple" for-in-prange loop. But I find it at least easier to 
explain and semantically 'obvious'. And it would allow us to write a pure 
mode implementation for this based on the threading module.
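A minimal sketch of what a threading-based pure-mode fallback for such a decorator might look like. The decorator name comes from the example above; its signature (a num_threads keyword, round-robin splitting, running the loop at decoration time) is an assumption. Because a nonlocal `s += ...` executed from several Python threads is not a safe reduction, the sketch lets each iteration write its own output slot and reduces serially afterwards. Under CPython this shows the semantics rather than any speedup.

```python
import threading


def run_parallel_for_loop(iterable, num_threads=4):
    """Hypothetical pure-mode decorator: runs the decorated body once per item
    of `iterable`, spread over `num_threads` threads, at decoration time."""
    items = list(iterable)

    def decorator(body):
        def worker(tid):
            for item in items[tid::num_threads]:   # round-robin split
                body(item)

        threads = [threading.Thread(target=worker, args=(t,))
                   for t in range(num_threads)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()            # the whole loop has run once the decorator returns
        return body
    return decorator


def f(x, alpha):
    out = [0.0] * len(x)         # one slot per iteration, so no write races

    @run_parallel_for_loop(range(len(x)))
    def threaded_loop(i):
        tmp = alpha * i          # local to each call, hence thread-private
        out[i] = x[i] * tmp

    return sum(out)              # serial reduction after the loop
```

For example, `f([1.0, 2.0, 3.0], 2.0)` evaluates 1*0 + 2*2 + 3*4 scaled as above and returns 16.0.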




Also there's the else:-block, although we could make that part of the
scope.


Since that's supposed to run single-threaded anyway, it can be written 
after the loop, right? Or is there really a use case where one of the 
threads has to do something in parallel, especially based on its local 
thread state, that the others don't do?




And the "lastprivate" functionality, although that could be dropped
without much loss.


I'm not sure how the "else" block and "lastprivate" could be integrated 
into the closures approach.


Stefan


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Nathaniel Smith
On Mon, Apr 4, 2011 at 3:17 AM, Dag Sverre Seljebotn
 wrote:
>  * A simple and friendly solution that covers, perhaps, 80% of the cases,
> based on simply replacing range with prange.

This is a "merely" aesthetic objection, while remaining agnostic on
the larger discussion, but -- 'for i in prange(...)' looks Just Wrong.
This is not a regular loop over a funny range, it's a funny loop over
a regular range. Surely it should be 'pfor i in range(...)'. Or better
yet, spell it 'parallel_for'.

-- Nathaniel


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 03:04 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 13:53:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange


"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the loop
body are shared across threads. Variables that are only used in the else
block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private as well.

* Reduction: Variables that are only used on the LHS of an in-place operator,
such as s above, are marked as targets for reduction. If the variable is
also used in other ways (LHS of assignment or in an expression), it instead
turns into a thread-private variable. Note: This means that if
one, e.g., inserts printf(... s) above, s is turned into a thread-local
variable. OTOH, there is simply no way to correctly emulate the effect
printf(... s) would have in a sequential loop, so such code must be
discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop behave
like a closure function, but with the looping header at the 'right' place
rather than after the closure.


I'm not quite sure what concrete changes to the CEP this would lead to
(assuming you mean this as a proposal for alternative semantics, and not an
implementation detail).


What I would like to avoid is having to tell users "and now for 
something completely different". It looks like a loop, but then 
there's a whole page of new semantics for it. And this also cannot be 
used in plain Python code due to the differing scoping behaviour.


Well, at least it's better than the 300 pages of semantics for OpenMP :-)





How would we treat reduction variables? They need to be supported, and
there's nothing in Python semantics to support reduction variables; they
are a rather special case everywhere. I suppose we could keep the reduction
clause above, or use the "nonlocal" keyword in the loop body...


That's what I thought, yes. It looks unexpected, sure. That's the 
clear advantage of using inner functions, which do not add anything 
new at all. But if we want to add something that looks more like a 
loop, we should at least make it behave like something that's easy to 
explain.


Sorry for not taking the opportunity to articulate my scepticism in 
the workshop discussion.



I like the idea of considering cdef/nonlocal in the prange blocks. But, 
yes, I do feel that opposing a parallel loop construct in general is 
rather late, or at least could have been done at a more convenient time...


All I know and care about is that a decorator-and-closure solution will 
be a lot more obscure among non-CS people who have no clue what a 
closure or decorator is, and those are exactly the people who need this 
kind of simple 80%-solution. You and I don't really need any support 
from Cython at all to write multithreaded apps (leaving aesthetics and 
number of keystrokes to the side).


It'd be good to hear Robert's and Mark's opinions before going further; 
let's economise this thread a bit.


Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Sturla Molden

On 04.04.2011 15:04, Stefan Behnel wrote:


What I would like to avoid is having to tell users "and now for 
something completely different". It looks like a loop, but then 
there's a whole page of new semantics for it. And this also cannot be 
used in plain Python code due to the differing scoping behaviour.




I've been working on something similar, which does not involve any 
changes to Cython, and will work from Python as well. It has been 
discussed before; basically, it involves wrapping a loop in a closure, 
so that normal Python scoping rules apply.


cdef int n

@parallel
def _parallel_loop(parallel_env):
    cdef int i, s0, s1
    for s0, s1 in parallel_env.range(n):
        for i in range(s0, s1):
            pass

I am not happy about the verbosity of the wrapper compared to

for i in prange(n):
    pass

but this is the best I can do without changing the compiler. Notice e.g. 
that the loop becomes two nested loops, which is required for efficient 
work scheduling.
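The two-level loop can be illustrated with a hypothetical chunked_range helper that splits range(n) into contiguous (s0, s1) pairs, roughly what parallel_env.range(n) is described as providing. The splitting rule used here (near-equal static chunks, one per thread) is an assumption. The sketch runs sequentially for clarity; in a threaded wrapper each thread would presumably receive its own pair.

```python
def chunked_range(n, num_threads):
    """Hypothetical helper: yield (s0, s1) pairs that split range(n) into at
    most num_threads contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, num_threads)
    s0 = 0
    for t in range(num_threads):
        s1 = s0 + base + (1 if t < extra else 0)
        if s1 > s0:                  # skip empty chunks when n < num_threads
            yield s0, s1
        s0 = s1


visited = []
for s0, s1 in chunked_range(10, 3):  # outer loop: one chunk per thread
    for i in range(s0, s1):          # inner loop: plain sequential range
        visited.append(i)
```

Handing out whole chunks keeps the cost of distributing work out of the inner loop, which stays an ordinary sequential range.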


Progress is mainly limited by lack of time and personal need. If I need 
parallel computing, I use Fortran or an optimized LAPACK library (e.g. ACML).


Sturla


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 03:27 PM, Nathaniel Smith wrote:

On Mon, Apr 4, 2011 at 3:17 AM, Dag Sverre Seljebotn
  wrote:

  * A simple and friendly solution that covers, perhaps, 80% of the cases,
based on simply replacing range with prange.

This is a "merely" aesthetic objection, while remaining agnostic on
the larger discussion, but -- 'for i in prange(...)' looks Just Wrong.
This is not a regular loop over a funny range, it's a funny loop over
a regular range. Surely it should be 'pfor i in range(...)'. Or better
yet, spell it 'parallel_for'.


I don't mind calling it "parallel_for" myself, if only a good place to 
provide scheduling parameters (numthreads, dynamic vs. static 
scheduling, chunksize) can be found. That would make it more obvious 
that scoping rules are different too.
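To make the static vs. dynamic distinction concrete: with static scheduling the iteration-to-thread assignment is fixed before the loop starts, while with dynamic scheduling idle threads repeatedly grab the next block of chunksize iterations from a shared pool. A minimal pure-Python sketch of the dynamic case follows; the names dynamic_schedule, num_threads and chunksize are chosen for illustration only.

```python
import queue
import threading


def dynamic_schedule(n, num_threads, chunksize, body):
    """Dynamic scheduling sketch: idle threads take the next block of
    `chunksize` iterations from a shared queue until it is empty.
    Returns {iteration: id of the thread that ran it}."""
    work = queue.Queue()
    for s0 in range(0, n, chunksize):
        work.put((s0, min(s0 + chunksize, n)))

    ran_on = {}
    lock = threading.Lock()

    def worker(tid):
        while True:
            try:
                s0, s1 = work.get_nowait()
            except queue.Empty:
                return                 # no work left for this thread
            for i in range(s0, s1):
                body(i)
                with lock:
                    ran_on[i] = tid

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return ran_on
```

Every iteration runs exactly once, but which thread runs it depends on timing; that flexibility is what balances uneven iteration costs.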


No sense in discussing this further until the higher-level discussion on 
whether to do it or not has completed though.


Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread mark florisson
On 4 April 2011 13:53, Dag Sverre Seljebotn  wrote:
> On 04/04/2011 01:23 PM, Stefan Behnel wrote:
>>
>> Dag Sverre Seljebotn, 04.04.2011 12:17:
>>>
>>> CEP up at http://wiki.cython.org/enhancements/prange
>>
>> """
>> Variable handling
>>
>> Rather than explicit declaration of shared/private variables we rely on
>> conventions:
>>
>>    * Thread-shared: Variables that are only read and not written in the
>> loop body are shared across threads. Variables that are only used in the
>> else block are considered shared as well.
>>
>>    * Thread-private: Variables that are assigned to in the loop body are
>> thread-private. Obviously, the iteration counter is thread-private as well.
>>
>>    * Reduction: Variables that are only used on the LHS of an in-place
>> operator, such as s above, are marked as targets for reduction. If the
>> variable is also used in other ways (LHS of assignment or in an expression),
>> it instead turns into a thread-private variable. Note: This means that
>> if one, e.g., inserts printf(... s) above, s is turned into a thread-local
>> variable. OTOH, there is simply no way to correctly emulate the effect
>> printf(... s) would have in a sequential loop, so such code must be
>> discouraged anyway.
>> """
>>
>> What about simply (ab-)using Python semantics and creating a new inner
>> scope for the prange loop body? That would basically make the loop behave
>> like a closure function, but with the looping header at the 'right' place
>> rather than after the closure.
>
> I'm not quite sure what concrete changes to the CEP this would lead to
> (assuming you mean this as a proposal for alternative semantics, and not an
> implementation detail).
>
> How would we treat reduction variables? They need to be supported, and
> there's nothing in Python semantics to support reduction variables; they are
> a rather special case everywhere. I suppose we could keep the reduction
> clause above, or use the "nonlocal" keyword in the loop body...
>
> Also there's the else:-block, although we could make that part of the scope.
> And the "lastprivate" functionality, although that could be dropped without
> much loss.
>
>>
>> Also, in the example, the local variable declaration of "tmp" outside of
>> the loop looks somewhat misplaced, although it's precedented by
>> comprehensions (which also have their own local scope in Cython).
>
> Well, depending on the decision of lastprivate, the declaration would need
> to be outside; I really like the idea of moving "cdef", and am prepared to
> drop lastprivate for this.
>
> Being explicit about thread-local variables does make things a lot safer to
> use.
>
> (One problem is that when switching between serial and parallel, one needs
> to move variable declarations. But that only happens once, and one can use
> "nthreads=1" to disable parallelism after that.)
>
> An example would then be:
>
> def f(np.ndarray[double] x, double alpha):
>    cdef double s = 0, globtmp
>    with nogil:
>        for i in prange(x.shape[0]):
>            cdef double tmp # thread-private
>            tmp = alpha * i # alpha available from global scope
>            s += x[i] * tmp # still automatic reduction for inplace operators
>            # printf(...s) -> now leads to error, since s is not declared
>            # thread-private but is read
>        else:
>            # tmp still available here...looks a bit strange, but useful
>            s += tmp * 10
>            globtmp = tmp # we save tmp for later
>        # tmp not available here, globtmp is
>    return s
>
> Or, we just drop support for the else block on these loops.

Since we are disallowing break (for now), I think we shouldn't support
the else clause. Basically, I think we can make the CEP a tad simpler.

I think we could declare everything outside of the prange body. Then,
in the prange loop body:

 * if a variable is assigned to anywhere -> make it lastprivate
   - if a variable is read before being assigned to -> make it
     firstprivate in addition to lastprivate (raise a compiler error if
     the variable is not initialized outside of the loop body)

 * if a variable is only ever read -> make it shared (the default for OpenMP)

 * if a variable has an in-place operator -> make it a reduction

There is really no reason to disallow reading of the reduction
variable (in e.g. a printf). The reduction should also be initialized
outside of the prange body.
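A rough prototype of these rules, using Python's ast module to find reads, plain assignments and in-place assignments in source order. It assumes a flat loop body of simple statements (no nested if/for blocks) and a loop index named i; the classify function and the clause strings are hypothetical, and the "not initialized outside of the loop body" error is not implemented.

```python
import ast


def classify(body_src, index="i"):
    """Classify the names in a flat prange loop body.

    Assumes simple statements only (no nested blocks). Returns
    {name: set of clause strings}.
    """
    read, assigned, inplace, read_first = set(), set(), set(), set()

    def note_loads(node):
        # Record every name loaded inside `node`; a load counts as
        # "read before assigned" if no plain assignment has happened yet.
        for sub in ast.walk(node):
            if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Load):
                read.add(sub.id)
                if sub.id not in assigned:
                    read_first.add(sub.id)

    for stmt in ast.parse(body_src).body:
        if isinstance(stmt, ast.AugAssign) and isinstance(stmt.target, ast.Name):
            note_loads(stmt.value)          # right-hand side is evaluated first
            inplace.add(stmt.target.id)     # s += ... marks a reduction
        elif isinstance(stmt, (ast.Assign, ast.AugAssign)):
            note_loads(stmt.value)
            targets = stmt.targets if isinstance(stmt, ast.Assign) else [stmt.target]
            for target in targets:
                note_loads(target)          # e.g. x and i in x[i] = ...
                for sub in ast.walk(target):
                    if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                        assigned.add(sub.id)
        else:
            note_loads(stmt)                # e.g. a printf(...) call

    clauses = {}
    for name in read | assigned | inplace:
        if name == index:
            clauses[name] = {"private"}     # the loop counter itself
        elif name in inplace:
            clauses[name] = {"reduction"}   # reading it elsewhere is allowed
        elif name in assigned:
            clauses[name] = {"lastprivate"}
            if name in read_first:
                clauses[name].add("firstprivate")
        else:
            clauses[name] = {"shared"}
    return clauses
```

Applied to the body `tmp = alpha * i` / `s += x[i] * tmp`, this yields tmp as lastprivate, s as a reduction, alpha and x as shared, and i as private.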

Then prange() could be implemented in pure mode as simply the
sequential version, i.e. range() with some more arguments.
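A sketch of such a pure-mode fallback, assuming hypothetical keyword names (num_threads, schedule) that are accepted for compatibility and then ignored:

```python
def prange(start, stop=None, step=1, num_threads=None, schedule=None):
    """Hypothetical pure-mode prange: the parallel options are accepted
    but ignored, and a plain sequential range is returned."""
    if stop is None:            # prange(n) means prange(0, n)
        start, stop = 0, start
    return range(start, stop, step)


x = [1.0, 2.0, 3.0]
s = 0.0
for i in prange(len(x), num_threads=4):   # runs serially in pure mode
    s += x[i]
```

Since the loop body then runs in order, a reduction such as `s += x[i]` gives the same result as the sequential code, which is what makes this fallback attractive for testing.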

For any scratch space buffers etc, I'd prefer something like


with cython.parallel:
    cdef char *buf = malloc(100)

    for i in prange(n):
        ...  # use buf

    free(buf)

At least it fits my brain pretty well :) (this code does however
assume that malloc is thread-safe).
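In pure Python, the intended semantics of such a block could be modelled as a region that each thread executes once: the thread allocates its own scratch buffer (a bytearray standing in for malloc(100)), works through its share of the iterations, then releases it. parallel_region and the round-robin split are illustrative assumptions; the point is that the buffer is allocated once per thread, not once per iteration.

```python
import threading


def parallel_region(n, num_threads, bufsize=100):
    """Hypothetical model of a `with cython.parallel:` block: each thread
    runs the region once, with its own scratch buffer."""
    allocations = []            # thread ids, one entry per buffer allocated
    results = [None] * n
    lock = threading.Lock()

    def region(tid):
        buf = bytearray(bufsize)              # stands in for malloc(100)
        with lock:
            allocations.append(tid)
        for i in range(tid, n, num_threads):  # this thread's share of prange(n)
            buf[0] = i % 256                  # "use buf"
            results[i] = buf[0]
        del buf                               # stands in for free(buf)

    threads = [threading.Thread(target=region, args=(t,))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return allocations, results
```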

Anyway, I'm not sure I just covered all cases, but what do you think?

> Dag Sverre

Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 03:04 PM, Stefan Behnel wrote:


That's what I thought, yes. It looks unexpected, sure. That's the 
clear advantage of using inner functions, which do not add anything 
new at all. But if we want to add something that looks more like a 
loop, we should at least make it behave like something that's easy to 
explain.


Sorry for not taking the opportunity to articulate my scepticism in 
the workshop discussion. Skipping through the CEP now, I think this 
feature adds quite a bit of complexity to the language, and I'm not sure 
it's worth that when compared to the existing closures. The equivalent 
closure+decorator syntax is certainly easier to explain, and could 
translate into exactly the same code. But with the clear advantage 
that the scope of local, nonlocal and thread-configuring variables is 
immediately obvious.


Basically, your example would become

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0

    with cython.nogil:
        @cython.run_parallel_for_loop( range(x.shape[0]) )
        cdef threaded_loop(i):  # 'nogil' is inherited
            cdef double tmp = alpha * i
            nonlocal s
            s += x[i] * tmp
        s += alpha * (x.shape[0] - 1)
    return s

We likely agree that this is not beautiful. It's also harder to 
implement than a "simple" for-in-prange loop. But I find it at least 
easier to explain and semantically 'obvious'. And it would allow us to 
write a pure mode implementation for this based on the threading module.


Short clarification on this example: There is still magic going on here 
in the reduction variable -- one must have a version of "s" for each 
thread, and then reduce at the end.
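That per-thread copy can be made explicit in a small pure-Python sketch (parallel_sum is a hypothetical name): each thread accumulates into a private s starting at 0.0, and the partial sums are combined after all threads have joined. Recording what each thread "sees" in s also illustrates the CEP's point about printf(... s): the values are per-thread partial sums, not the running totals of a sequential loop.

```python
import threading


def parallel_sum(x, num_threads=4):
    partial = [0.0] * num_threads              # one private copy of s per thread
    seen = [[] for _ in range(num_threads)]    # what printf(... s) would show

    def worker(tid):
        s = 0.0                                # private copy, starts at the identity
        for i in range(tid, len(x), num_threads):
            s += x[i]
            seen[tid].append(s)                # this thread's partial sum only
        partial[tid] = s

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return sum(partial), seen                  # the reduction step
```

With x = [1.0, 2.0, 3.0, 4.0, 5.0] and two threads, the total is 15.0, while the threads observe [1.0, 4.0, 9.0] and [2.0, 6.0] rather than the sequential running totals 1, 3, 6, 10, 15.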


(Stefan: I realize that you may know this, I'm just making sure 
everything is stated clearly in this discussion.)


Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 05:22 PM, mark florisson wrote:

On 4 April 2011 13:53, Dag Sverre Seljebotn  wrote:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange

"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the
loop body are shared across threads. Variables that are only used in the
else block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private as well.

* Reduction: Variables that are only used on the LHS of an in-place
operator, such as s above, are marked as targets for reduction. If the
variable is also used in other ways (LHS of assignment or in an expression),
it instead turns into a thread-private variable. Note: This means that
if one, e.g., inserts printf(... s) above, s is turned into a thread-local
variable. OTOH, there is simply no way to correctly emulate the effect
printf(... s) would have in a sequential loop, so such code must be
discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop behave
like a closure function, but with the looping header at the 'right' place
rather than after the closure.

I'm not quite sure what concrete changes to the CEP this would lead to
(assuming you mean this as a proposal for alternative semantics, and not an
implementation detail).

How would we treat reduction variables? They need to be supported, and
there's nothing in Python semantics to support reduction variables; they are
a rather special case everywhere. I suppose we could keep the reduction clause
above, or use the "nonlocal" keyword in the loop body...

Also there's the else:-block, although we could make that part of the scope.
And the "lastprivate" functionality, although that could be dropped without
much loss.


Also, in the example, the local variable declaration of "tmp" outside of
the loop looks somewhat misplaced, although it's precedented by
comprehensions (which also have their own local scope in Cython).

Well, depending on the decision of lastprivate, the declaration would need
to be outside; I really like the idea of moving "cdef", and am prepared to
drop lastprivate for this.

Being explicit about thread-local variables does make things a lot safer to
use.

(One problem is that when switching between serial and parallel, one needs
to move variable declarations. But that only happens once, and one can use
"nthreads=1" to disable parallelism after that.)

An example would then be:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0, globtmp
    with nogil:
        for i in prange(x.shape[0]):
            cdef double tmp # thread-private
            tmp = alpha * i # alpha available from global scope
            s += x[i] * tmp # still automatic reduction for inplace operators
            # printf(...s) -> now leads to error, since s is not declared
            # thread-private but is read
        else:
            # tmp still available here...looks a bit strange, but useful
            s += tmp * 10
            globtmp = tmp # we save tmp for later
        # tmp not available here, globtmp is
    return s

Or, we just drop support for the else block on these loops.

Since we are disallowing break (for now), I think we shouldn't support
the else clause. Basically, I think we can make the CEP a tad simpler.

I think we could declare everything outside of the prange body. Then,
in the prange loop body:

 if a variable is assigned to anywhere ->  make it lastprivate
 - if a variable is read before assigned to ->  make it
firstprivate in addition to lastprivate (raise compiler error if the
variable is not initialized outside of the loop body)

 if a variable is only ever read ->  make it shared (the default for OpenMP)

 if a variable has an inplace operator ->  make it a reduction

There is really no reason to disallow reading of the reduction
variable (in e.g. a printf). The reduction should also be initialized
outside of the prange body.


The reason for disallowing reading the reduction variable is that 
otherwise you have a contradiction above, since a reduction variable may 
also be a thread-local variable. Or do you disable in-place operators for
thread-local variables? (ugh)


That's the main reason I'm leaning towards explicitly declaring local 
variables using "cdef".


If we're reducing complexity BTW, I'd rather remove 
firstprivate/lastprivate altogether; see below.



Then prange() could be implemented in pure mode as simply the
sequential version, i.e. range() with some more arguments.

For any scratch space buffers etc, I'd prefer something like


with cython.parallel:
    cdef char *buf = malloc(100)

    for i in prange(n):

Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread mark florisson
On 4 April 2011 19:18, Dag Sverre Seljebotn  wrote:
> On 04/04/2011 05:22 PM, mark florisson wrote:
>>
>> On 4 April 2011 13:53, Dag Sverre Seljebotn
>>  wrote:
>>>
>>> On 04/04/2011 01:23 PM, Stefan Behnel wrote:

 Dag Sverre Seljebotn, 04.04.2011 12:17:
>
> CEP up at http://wiki.cython.org/enhancements/prange

 """
 Variable handling

 Rather than explicit declaration of shared/private variables we rely on
 conventions:

    * Thread-shared: Variables that are only read and not written in the
 loop body are shared across threads. Variables that are only used in the
 else block are considered shared as well.

    * Thread-private: Variables that are assigned to in the loop body are
 thread-private. Obviously, the iteration counter is thread-private as
 well.

    * Reduction: Variables that are only used on the LHS of an inplace
 operator, such as s above, are marked as targets for reduction. If the
 variable is also used in other ways (LHS of assignment or in an
 expression), it instead turns into a thread-private variable. Note: This
 means that if one, e.g., inserts printf(... s) above, s is turned into a
 thread-local variable. OTOH, there is simply no way to correctly emulate
 the effect printf(... s) would have in a sequential loop, so such code
 must be discouraged anyway.
 """
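To illustrate why printf(... s) inside the loop cannot match the sequential behaviour: if s is a reduction variable, each thread only ever sees its own partial sum. The sketch below (hypothetical helpers, static contiguous chunks assumed) compares what a sequential loop would print with what the simulated threads would print. The final reduced value still agrees (3 + 7 = 10), but the intermediate values do not.

```python
def sequential_trace(x):
    """What printf(... s) would show after each iteration of a sequential loop."""
    s, trace = 0, []
    for v in x:
        s += v
        trace.append(s)
    return trace


def chunked_trace(x, nthreads):
    """What each simulated thread would see if it read its private partial s.

    Chunks are listed one after another; real threads would interleave their
    output in an unpredictable order.
    """
    n = len(x)
    size = max(1, -(-n // nthreads))   # ceiling division: iterations per thread
    trace = []
    for start in range(0, n, size):
        local_s = 0                    # private partial starts at the identity of +
        for v in x[start:start + size]:
            local_s += v
            trace.append(local_s)
    return trace


print(sequential_trace([1, 2, 3, 4]))   # [1, 3, 6, 10]
print(chunked_trace([1, 2, 3, 4], 2))   # [1, 3, 3, 7]
```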

 What about simply (ab-)using Python semantics and creating a new inner
 scope for the prange loop body? That would basically make the loop behave
 like a closure function, but with the looping header at the 'right' place
 rather than after the closure.
>>>
>>> I'm not quite sure what concrete changes to the CEP this would lead to
>>> (assuming you mean this as a proposal for alternative semantics, and not
>>> an implementation detail).
>>>
>>> How would we treat reduction variables? They need to be supported, and
>>> there's nothing in Python semantics to support reduction variables; they
>>> are a rather special case everywhere. I suppose we could keep the
>>> reduction clause above, or use the "nonlocal" keyword in the loop body...
>>>
>>> Also there's the else:-block, although we could make that part of the
>>> scope. And the "lastprivate" functionality, although that could be
>>> dropped without much loss.
>>>
 Also, in the example, the local variable declaration of "tmp" outside of
 the loop looks somewhat misplaced, although it's precedented by
 comprehensions (which also have their own local scope in Cython).
>>>
>>> Well, depending on the decision about lastprivate, the declaration would
>>> need to be outside; I really like the idea of moving "cdef", and am
>>> prepared to drop lastprivate for this.
>>>
>>> Being explicit about thread-local variables does make things a lot safer
>>> to use.
>>>
>>> (One problem is that when switching between serial and parallel, one
>>> needs to move variable declarations. But that only happens once, and one
>>> can use "nthreads=1" to disable parallel after that.)
>>>
>>> An example would then be:
>>>
>>> def f(np.ndarray[double] x, double alpha):
>>>    cdef double s = 0, globtmp
>>>    with nogil:
>>>        for i in prange(x.shape[0]):
>>>            cdef double tmp # thread-private
>>>            tmp = alpha * i # alpha available from global scope
>>>            s += x[i] * tmp # still automatic reduction for inplace operators
>>>            # printf(...s) ->  now leads to error, since s is not declared
>>>            # thread-private but is read
>>>        else:
>>>            # tmp still available here...looks a bit strange, but useful
>>>            s += tmp * 10
>>>            globtmp = tmp # we save tmp for later
>>>        # tmp not available here, globtmp is
>>>    return s
>>>
>>> Or, we just drop support for the else block on these loops.
>>
>> Since we are disallowing break (for now), I think we shouldn't support the
>> else clause. Basically, I think we can make the CEP a tad simpler.
>>
>> I think we could declare everything outside of the prange body. Then,
>> in the prange loop body:
>>
>>     if a variable is assigned to anywhere -> make it lastprivate
>>         - if a variable is read before assigned to -> make it
>>           firstprivate in addition to lastprivate (raise compiler error
>>           if the variable is not initialized outside of the loop body)
>>
>>     if a variable is only ever read -> make it shared (the default for
>>     OpenMP)
>>
>>     if a variable has an inplace operator ->  make it a reduction
>>
>> There is really no reason to disallow reading of the reduction
>> variable (in e.g. a printf). The reduction should also be initialized
>> outside of the prange body.
>
> The reason for disallowing reading the reduction variable is that otherwise
> you have a contradiction above, since a reduction variable may also be a
> thread-local variable. Or, you disable inplace operators

Re: [Cython] problem building master with python3

2011-04-04 Thread Vitja Makarov
2011/4/4 Darren Dale :
> On Mon, Apr 4, 2011 at 3:32 PM, Darren Dale  wrote:
>> I'm attempting to install cython from the git repository to benefit
>> from this fix: http://trac.cython.org/cython_trac/ticket/597 . When I
>> run "python3 setup.py install --user", I get an error:
>>
>> cythoning /Users/darren/Projects/cython/Cython/Compiler/Code.py to
>> /Users/darren/Projects/cython/Cython/Compiler/Code.c
>>
>> Error compiling Cython file:
>> 
>> ...
>>        self.cname = cname
>>        self.text = text
>>        self.escaped_value = StringEncoding.escape_byte_string(byte_string)
>>        self.py_strings = None
>>
>>    def get_py_string_const(self, encoding, identifier=None, is_str=False):
>>   ^
>> 
>>
>> Cython/Compiler/Code.py:320:4: Signature not compatible with previous
>> declaration
>>
>> Error compiling Cython file:
>> 
>> ...
>>    cdef public object text
>>    cdef public object escaped_value
>>    cdef public dict py_strings
>>
>>    @cython.locals(intern=bint, is_str=bint, is_unicode=bint)
>>    cpdef get_py_string_const(self, encoding, identifier=*, is_str=*)
>>                             ^
>> 
>>
>> Cython/Compiler/Code.pxd:64:30: Previous declaration is here
>> building 'Cython.Compiler.Code' extension
>> /usr/bin/gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g
>> -fwrapv -O3 -Wall -Wstrict-prototypes -O2
>> -I/opt/local/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m
>> -c /Users/darren/Projects/cython/Cython/Compiler/Code.c -o
>> build/temp.macosx-10.6-x86_64-3.2/Users/darren/Projects/cython/Cython/Compiler/Code.o
>> /Users/darren/Projects/cython/Cython/Compiler/Code.c:1:2: error:
>> #error Do not use this file, it is the result of a failed Cython
>> compilation.
>> error: command '/usr/bin/gcc-4.2' failed with exit status 1
>>
>
> Actually, I get this same error when I try to build with python-2.7 as well.
>
> Darren

This one fails too :(

The generators branch is okay, but upstream after the merge isn't :(

vitja@vitja-laptop:~/work/cython.git$ cat ttt.py
def foo(is_str=False):
    pass
vitja@vitja-laptop:~/work/cython.git$ cat ttt.pxd
cimport cython

@cython.locals(is_str=cython.bint)
cdef foo(is_str=*)





-- 
vitja.
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 09:26 PM, mark florisson wrote:

On 4 April 2011 19:18, Dag Sverre Seljebotn  wrote:

On 04/04/2011 05:22 PM, mark florisson wrote:

On 4 April 2011 13:53, Dag Sverre Seljebotn
  wrote:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange

"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the
loop body are shared across threads. Variables that are only used in the
else block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private as
well.

* Reduction: Variables that are only used on the LHS of an inplace
operator, such as s above, are marked as targets for reduction. If the
variable is also used in other ways (LHS of assignment or in an
expression), it instead turns into a thread-private variable. Note: This
means that if one, e.g., inserts printf(... s) above, s is turned into a
thread-local variable. OTOH, there is simply no way to correctly emulate
the effect printf(... s) would have in a sequential loop, so such code
must be discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop behave
like a closure function, but with the looping header at the 'right' place
rather than after the closure.

I'm not quite sure what concrete changes to the CEP this would lead to
(assuming you mean this as a proposal for alternative semantics, and not
an implementation detail).

How would we treat reduction variables? They need to be supported, and
there's nothing in Python semantics to support reduction variables; they
are a rather special case everywhere. I suppose we could keep the
reduction clause above, or use the "nonlocal" keyword in the loop body...

Also there's the else:-block, although we could make that part of the
scope. And the "lastprivate" functionality, although that could be
dropped without much loss.


Also, in the example, the local variable declaration of "tmp" outside of
the loop looks somewhat misplaced, although it's precedented by
comprehensions (which also have their own local scope in Cython).

Well, depending on the decision about lastprivate, the declaration would
need to be outside; I really like the idea of moving "cdef", and am
prepared to drop lastprivate for this.

Being explicit about thread-local variables does make things a lot safer
to use.

(One problem is that when switching between serial and parallel, one
needs to move variable declarations. But that only happens once, and one
can use "nthreads=1" to disable parallel after that.)

An example would then be:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0, globtmp
    with nogil:
        for i in prange(x.shape[0]):
            cdef double tmp # thread-private
            tmp = alpha * i # alpha available from global scope
            s += x[i] * tmp # still automatic reduction for inplace operators
            # printf(...s) -> now leads to error, since s is not declared
            # thread-private but is read
        else:
            # tmp still available here...looks a bit strange, but useful
            s += tmp * 10
            globtmp = tmp # we save tmp for later
        # tmp not available here, globtmp is
    return s

Or, we just drop support for the else block on these loops.

Since we are disallowing break (for now), I think we shouldn't support the
else clause. Basically, I think we can make the CEP a tad simpler.

I think we could declare everything outside of the prange body. Then,
in the prange loop body:

    if a variable is assigned to anywhere -> make it lastprivate
        - if a variable is read before assigned to -> make it
          firstprivate in addition to lastprivate (raise compiler error if
          the variable is not initialized outside of the loop body)

    if a variable is only ever read -> make it shared (the default for
    OpenMP)

    if a variable has an inplace operator -> make it a reduction

There is really no reason to disallow reading of the reduction
variable (in e.g. a printf). The reduction should also be initialized
outside of the prange body.

The reason for disallowing reading the reduction variable is that otherwise
you have a contradiction above, since a reduction variable may also be a
thread-local variable. Or, you disable inplace operators for thread-local
variables? (ugh)

Yes, an inplace operator would make it a reduction variable, just like
assigning something makes it lastprivate, only reading makes it shared
and reading before writing makes it firstprivate in addition to
lastprivate. This is all implicit.
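The firstprivate/lastprivate behaviour described above can be emulated for a single variable as follows. This is a sketch under assumed static chunking, and `run_private` is a hypothetical helper. Each simulated thread starts from a copy of the outer value (firstprivate), and the value left by the sequentially last iteration is copied back out (lastprivate). The second pair of calls shows why "read before assigned" deserves attention: when the body carries a value from one iteration to the next, the answer depends on the thread count.

```python
def run_private(n, nthreads, init, body):
    """Emulate one variable that is firstprivate + lastprivate in a prange loop.

    init: the variable's value before the loop
    body: function (i, value) -> value after iteration i
    """
    size = max(1, -(-n // nthreads))      # static schedule, contiguous chunks
    result = init                         # unchanged if the loop runs no iterations
    for start in range(0, n, size):
        stop = min(start + size, n)
        value = init                      # firstprivate: copy of the outer value
        for i in range(start, stop):
            value = body(i, value)
        if stop == n:                     # this chunk ran the last iteration
            result = value                # lastprivate: copied back out
    return result


# Assigned before being read: same answer for any thread count.
print(run_private(4, 1, 0, lambda i, v: i * 10))   # 30
print(run_private(4, 2, 0, lambda i, v: i * 10))   # 30
# Read before assigned (a loop-carried dependency): the answer changes.
print(run_private(4, 1, 0, lambda i, v: v + i))    # 6
print(run_private(4, 2, 0, lambda i, v: v + i))    # 5
```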

Alternatively, if you want it more explicit, then instead of the
inplace operator you could allow something like

 sum = cyt

Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Greg Ewing

Nathaniel Smith wrote:

Surely it should be 'pfor i in range(...)'.


Or 'pfhor', just to let you know it's really something out of
this world.

http://marathongame.wikia.com/wiki/Pfhor_%28Race%29

--
Greg


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Robert Bradshaw
On Mon, Apr 4, 2011 at 6:04 AM, Stefan Behnel  wrote:
> Dag Sverre Seljebotn, 04.04.2011 13:53:
>>
>> On 04/04/2011 01:23 PM, Stefan Behnel wrote:
>>>
>>> Dag Sverre Seljebotn, 04.04.2011 12:17:

 CEP up at http://wiki.cython.org/enhancements/prange
>>>
>>> """
>>> Variable handling
>>>
>>> Rather than explicit declaration of shared/private variables we rely on
>>> conventions:
>>>
>>> * Thread-shared: Variables that are only read and not written in the loop
>>> body are shared across threads. Variables that are only used in the else
>>> block are considered shared as well.
>>>
>>> * Thread-private: Variables that are assigned to in the loop body are
>>> thread-private. Obviously, the iteration counter is thread-private as
>>> well.
>>>
>>> * Reduction: Variables that are only used on the LHS of an inplace operator,
>>> such as s above, are marked as targets for reduction. If the variable is
>>> also used in other ways (LHS of assignment or in an expression), it
>>> instead turns into a thread-private variable. Note: This means that if
>>> one, e.g., inserts printf(... s) above, s is turned into a thread-local
>>> variable. OTOH, there is simply no way to correctly emulate the effect
>>> printf(... s) would have in a sequential loop, so such code must be
>>> discouraged anyway.
>>> """
>>>
>>> What about simply (ab-)using Python semantics and creating a new inner
>>> scope for the prange loop body? That would basically make the loop behave
>>> like a closure function, but with the looping header at the 'right' place
>>> rather than after the closure.
>>
>> I'm not quite sure what concrete changes to the CEP this would lead to
>> (assuming you mean this as a proposal for alternative semantics, and not
>> an implementation detail).
>
> What I would like to avoid is having to tell users "and now for something
> completely different". It looks like a loop, but then there's a whole page
> of new semantics for it. And this also cannot be used in plain Python code
> due to the differing scoping behaviour.

The same could be said of OpenMP--it looks exactly like a loop except
for a couple of pragmas.

The proposed (as I'm reading the CEP now) semantics of what's shared
and first/last private and reduction would give it the semantics of a
normal, sequential loop (and if your final result changes based on how
many threads were involved then you've got incorrect code). Perhaps
reading of the reduction variable could be fine (though obviously
ill-defined, suitable only for debugging).
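One practical way to apply that criterion is to run an emulation of the loop under several thread counts and compare the results. A hedged sketch with hypothetical helper names and static chunking assumed: integer reductions should always agree, while floating-point reductions can legitimately differ in rounding because the additions are regrouped, so those are better compared with a tolerance.

```python
def chunked_reduce(x, nthreads, term):
    """Emulate `for i in prange(len(x)): s += term(x[i])` with a static schedule."""
    n = len(x)
    size = max(1, -(-n // nthreads))   # ceiling division: iterations per thread
    partials = []
    for start in range(0, n, size):
        local_s = 0
        for v in x[start:start + size]:
            local_s += term(v)
        partials.append(local_s)
    s = 0
    for p in partials:                 # combine step
        s += p
    return s


def thread_count_invariant(x, term, max_threads=8):
    """True if every simulated thread count 1..max_threads gives the same result."""
    results = {chunked_reduce(x, t, term) for t in range(1, max_threads + 1)}
    return len(results) == 1


print(thread_count_invariant(list(range(100)), lambda v: v * v))     # True
print(thread_count_invariant([1e16, 1.0, -1e16, 1.0], lambda v: v))  # False
```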

>> How would we treat reduction variables? They need to be supported, and
>> there's nothing in Python semantics to support reduction variables, they
>> are a rather special case everywhere. I suppose keeping the reduction
>> clause above, or use the "nonlocal" keyword in the loop body...
>
> That's what I thought, yes. It looks unexpected, sure. That's the clear
> advantage of using inner functions, which do not add anything new at all.
> But if we want to add something that looks more like a loop, we should at
> least make it behave like something that's easy to explain.
>
> Sorry for not taking the opportunity to articulate my scepticism in the
> workshop discussion. Skipping through the CEP now, I think this feature adds
> quite some complexity to the language, and I'm not sure it's worth that when
> compared to the existing closures. The equivalent closure+decorator syntax
> is certainly easier to explain, and could translate into exactly the same
> code. But with the clear advantage that the scope of local, nonlocal and
> thread-configuring variables is immediately obvious.
>
> Basically, your example would become
>
> def f(np.ndarray[double] x, double alpha):
>    cdef double s = 0
>
>    with cython.nogil:
>        @cython.run_parallel_for_loop( range(x.shape[0]) )
>        cdef threaded_loop(i):    # 'nogil' is inherited
>            cdef double tmp = alpha * i
>            nonlocal s
>            s += x[i] * tmp
>        s += alpha * (x.shape[0] - 1)
>    return s
>
> We likely agree that this is not beautiful. It's also harder to implement
> than a "simple" for-in-prange loop. But I find it at least easier to explain
> and semantically 'obvious'. And it would allow us to write a pure mode
> implementation for this based on the threading module.
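For comparison, here is a rough pure-Python reading of the closure+decorator idea using concurrent.futures. `run_parallel_for_loop` below is a hypothetical stand-in for the proposed `cython.run_parallel_for_loop`, which does not exist. One deliberate change: the body returns its contribution instead of doing `nonlocal s; s += ...`, because `+=` on a shared variable from several Python threads is not atomic. Under CPython's GIL this shows the scoping semantics, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel_for_loop(iterable, nthreads=4):
    """Hypothetical pure-Python stand-in for the proposed decorator.

    Applying it runs the decorated function once per item on a thread pool
    and binds the function's name to the list of results, in iteration order.
    """
    def decorator(body):
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            return list(pool.map(body, iterable))
    return decorator


def f(x, alpha):
    s = 0.0

    @run_parallel_for_loop(range(len(x)))
    def terms(i):
        tmp = alpha * i          # local to the closure, so private to each call
        return x[i] * tmp        # returned instead of `nonlocal s; s += ...`

    for t in terms:              # reduction done afterwards, in a single thread
        s += t
    s += alpha * (len(x) - 1)
    return s


print(f([1.0, 2.0, 3.0], 2.0))   # 20.0
```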

I'm not opposed to having something like this, but it's a whole lot of
code and extra refactoring for the basic use case. I think a nice,
clean syntax is worthwhile and requires at least some level of language
support. In some ways it's like buffer support--what goes on under the
hood does take some explaining, but most of the time it works as
expected (i.e. as if you hadn't declared the type), just faster. The
inner workings of prange may be a bit magical, but the intent is not,
and the latter is what users care about.

- Robert

Re: [Cython] Interest in contributing to the project

2011-04-04 Thread Arthur de Souza Ribeiro
Thanks for the clarification, Sturla; that's just the way I was thinking about
some things...

I noticed that you used some C code from the _math.h header file. I was
thinking that in the project I should rewrite the code in that file too,
right?

I started coding but got stuck on functions like Py_Is_Infinite and
Py_Is_NaN... I saw Sturla's e-mail, but I thought these would be written in a
different way. Was I wrong?
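For reference, the classic portable tests for NaN and infinity are short. As far as I can tell, CPython's fallback Py_IS_NAN / Py_IS_INFINITY macros use essentially these tricks, and C99 also provides isnan() and isinf() in <math.h>. A Python sketch, checked against math.isnan / math.isinf (the helper names are hypothetical):

```python
import math


def is_nan(x):
    # NaN is the only floating-point value that compares unequal to itself.
    return x != x


def is_infinite(x):
    # Only +inf and -inf are nonzero and unchanged by halving; NaN fails the
    # equality test, and finite values (even tiny subnormals) change or become 0.
    return x != 0.0 and x * 0.5 == x


samples = [0.0, -0.0, 1.0, -2.5, 1e308, 5e-324,
           float("inf"), float("-inf"), float("nan")]
for v in samples:
    assert is_nan(v) == math.isnan(v)
    assert is_infinite(v) == math.isinf(v)
print("all checks passed")
```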

I also started to write a proposal for this project and hope to publish it
here tomorrow for your evaluation.

Another point that I'm thinking about is how the profile results should be
organized. Is there any template for this?

Best Regards

[]s

Arthur

2011/4/3 Sturla Molden 

> On 04.04.2011 01:49, Sturla Molden wrote:
>
>> Also observe that we do not release the GIL here. That is not because
>> these functions are not thread-safe (they are), but because yielding the
>> GIL would slow things down terribly.
>>
>
> Oh, actually they are not thread-safe because we set errno... Sorry.
>
> Sturla


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/05/2011 07:05 AM, Robert Bradshaw wrote:

On Mon, Apr 4, 2011 at 6:04 AM, Stefan Behnel  wrote:

Dag Sverre Seljebotn, 04.04.2011 13:53:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange

"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the loop
body are shared across threads. Variables that are only used in the else
block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private as
well.

* Reduction: Variables that are only used on the LHS of an inplace operator,
such as s above, are marked as targets for reduction. If the variable is
also used in other ways (LHS of assignment or in an expression), it
instead turns into a thread-private variable. Note: This means that if
one, e.g., inserts printf(... s) above, s is turned into a thread-local
variable. OTOH, there is simply no way to correctly emulate the effect
printf(... s) would have in a sequential loop, so such code must be
discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop behave
like a closure function, but with the looping header at the 'right' place
rather than after the closure.

I'm not quite sure what concrete changes to the CEP this would lead to
(assuming you mean this as a proposal for alternative semantics, and not
an implementation detail).

What I would like to avoid is having to tell users "and now for something
completely different". It looks like a loop, but then there's a whole page
of new semantics for it. And this also cannot be used in plain Python code
due to the differing scoping behaviour.

The same could be said of OpenMP--it looks exactly like a loop except
for a couple of pragmas.

The proposed (as I'm reading the CEP now) semantics of what's shared
and first/last private and reduction would give it the semantics of a
normal, sequential loop (and if your final result changes based on how
many threads were involved then you've got incorrect code). Perhaps
reading of the reduction variable could be fine (though obviously
ill-defined, suitable only for debugging).


So would you disable inplace operators for thread-private variables? 
Otherwise a variable could be both a reduction variable and 
thread-private...


There's a reason I disabled reading the reduction variable (which I 
should have written down).


Dag Sverre