[Cython] Another CEP: Parallel block

2011-04-05 Thread Dag Sverre Seljebotn
There's a (much shorter) proposal for a more explicit parallelism 
construct at


http://wiki.cython.org/enhancements/parallelblock

This is a little more verbose for the simplest case, but it makes the
medium cases that need work buffers much simpler, and it is also more
explicit and harder to get wrong.


I am not sure myself which of the two I prefer, this or prange.

Justification for Cython-specific syntax: This is something that is 
really only useful if you can release the GIL *outside* of the loop. So 
I feel this is an area where a custom Cython solution is natural, sort 
of like "cdef extern", and the buffer access.


Since a similar pure-Python solution is rather useless, I also think 
there's less incentive for making something that works well in 
pure-Python mode.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Another CEP: Parallel block

2011-04-05 Thread mark florisson
On 5 April 2011 09:21, Dag Sverre Seljebotn  wrote:
> There's a (much shorter) proposal for a more explicit parallelism construct
> at
>
> http://wiki.cython.org/enhancements/parallelblock
>
> This is a little more verbose for the simplest case, but makes the
> medium-cases that needs work buffers much simpler, and is also more explicit
> and difficult to get wrong.

I actually think your else block really complicates matters. In this
example even your index variable is not well-defined right after the
loop, because it's not "declared lastprivate through the else block".
There is really no reason to make variables private instead of
lastprivate (and additionally firstprivate if needed) by default.

I think we should allow at least both options, so if the variables are
declared in the parallel nogil block they can only be used inside that
block (but are still lastprivate, as the first loop may be followed by
other code). But the user will also still be able to declare and
define stuff outside of the block and omit the with parallel block
entirely.

And again, you will want something like cython.parallel.range instead
of just range, as you will want to pass scheduling parameters to the
range(), and not the parallel.

So e.g. you can still write something like this:

cdef Py_ssize_t i
for i in cython.parallel.range(..., schedule='dynamic', nogil=True):
    do something

print i # i is well-defined here

My point is that first- and lastprivate can be implicit because they
work exactly the same way as the sequential Python version does. The
only remaining pitfall is the in-place operator, which declares a
reduction.
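
To illustrate the lastprivate point, here is a minimal pure-Python
sketch. It is not Cython and not part of the proposal: threads are only
emulated by splitting the range into contiguous chunks, and the names
serial_last and emulated_lastprivate are hypothetical.

```python
def serial_last(n):
    """Ordinary Python loop: i keeps the value of the final iteration."""
    i = None
    for i in range(n):
        pass
    return i


def emulated_lastprivate(n, num_chunks):
    """Emulate lastprivate with contiguous chunks run in arbitrary order."""
    bounds = [n * k // num_chunks for k in range(num_chunks + 1)]
    result = None
    # Run the chunks in reverse to show that execution order does not matter.
    for k in reversed(range(num_chunks)):
        private_i = None  # each emulated thread has its own copy of i
        for private_i in range(bounds[k], bounds[k + 1]):
            pass
        # Only the chunk holding the sequentially last iteration copies out.
        if bounds[k + 1] == n and private_i is not None:
            result = private_i
    return result
```

Under these assumptions the value seen after the loop is the same as in
the sequential version, whichever chunk happens to finish last.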

> I am not sure myself which one I prefer of this and prange.
>
> Justification for Cython-specific syntax: This is something that is really
> only useful if you can release the GIL *outside* of the loop. So I feel this
> is an area where a custom Cython solution is natural, sort of like "cdef
> extern", and the buffer access.
>
> Since a similar pure-Python solution is rather useless, I also think there's
> less incentive for making something that works well in pure-Python mode.

Which feature is Cython specific here? The 'with a, b as c:' thing?

> Dag Sverre


Re: [Cython] Another CEP: Parallel block

2011-04-05 Thread Stefan Behnel

mark florisson, 05.04.2011 10:26:
> On 5 April 2011 09:21, Dag Sverre Seljebotn wrote:
>> Justification for Cython-specific syntax: This is something that is really
>> only useful if you can release the GIL *outside* of the loop. So I feel this
>> is an area where a custom Cython solution is natural, sort of like "cdef
>> extern", and the buffer access.
>>
>> Since a similar pure-Python solution is rather useless, I also think there's
>> less incentive for making something that works well in pure-Python mode.
>
> Which feature is Cython specific here? The 'with a, b as c:' thing?

No, the syntax is just Python. It's the scoping that's Cython specific,
including the local variable declarations inside of the "with" block.


Stefan


Re: [Cython] Another CEP: Parallel block

2011-04-05 Thread mark florisson
On 5 April 2011 10:34, Stefan Behnel  wrote:
> mark florisson, 05.04.2011 10:26:
>>
>> On 5 April 2011 09:21, Dag Sverre Seljebotn wrote:
>>>
>>> Justification for Cython-specific syntax: This is something that is
>>> really
>>> only useful if you can release the GIL *outside* of the loop. So I feel
>>> this
>>> is an area where a custom Cython solution is natural, sort of like "cdef
>>> extern", and the buffer access.
>>>
>>> Since a similar pure-Python solution is rather useless, I also think
>>> there's
>>> less incentive for making something that works well in pure-Python mode.
>>
>> Which feature is Cython specific here? The 'with a, b as c:' thing?
>
> No, the syntax is just Python. It's the scoping that's Cython specific,
> including the local variable declarations inside of the "with" block.

Hmm, but you can use cython.declare() for that, no?

> Stefan


Re: [Cython] Another CEP: Parallel block

2011-04-05 Thread mark florisson
On 5 April 2011 10:44, mark florisson  wrote:
> On 5 April 2011 10:34, Stefan Behnel  wrote:
>> mark florisson, 05.04.2011 10:26:
>>>
>>> On 5 April 2011 09:21, Dag Sverre Seljebotn wrote:
>>>>
>>>> Justification for Cython-specific syntax: This is something that is
>>>> really only useful if you can release the GIL *outside* of the loop.
>>>> So I feel this is an area where a custom Cython solution is natural,
>>>> sort of like "cdef extern", and the buffer access.
>>>>
>>>> Since a similar pure-Python solution is rather useless, I also think
>>>> there's less incentive for making something that works well in
>>>> pure-Python mode.
>>>
>>> Which feature is Cython specific here? The 'with a, b as c:' thing?
>>
>> No, the syntax is just Python. It's the scoping that's Cython specific,
>> including the local variable declarations inside of the "with" block.
>
> Hmm, but you can use cython.declare() for that, no?

(disregarding the malloc() and pointer arithmetic, of course :)

>> Stefan


Re: [Cython] Another CEP: Parallel block

2011-04-05 Thread Stefan Behnel

mark florisson, 05.04.2011 10:44:
> On 5 April 2011 10:34, Stefan Behnel wrote:
>> mark florisson, 05.04.2011 10:26:
>>> On 5 April 2011 09:21, Dag Sverre Seljebotn wrote:
>>>> Justification for Cython-specific syntax: This is something that is
>>>> really only useful if you can release the GIL *outside* of the loop.
>>>> So I feel this is an area where a custom Cython solution is natural,
>>>> sort of like "cdef extern", and the buffer access.
>>>>
>>>> Since a similar pure-Python solution is rather useless, I also think
>>>> there's less incentive for making something that works well in
>>>> pure-Python mode.
>>>
>>> Which feature is Cython specific here? The 'with a, b as c:' thing?
>>
>> No, the syntax is just Python. It's the scoping that's Cython specific,
>> including the local variable declarations inside of the "with" block.
>
> Hmm, but you can use cython.declare() for that, no?

cython.declare() is a no-op (or just a plain assignment) in Python. But the
thread-local scoping of these variables cannot be emulated in Python. So
this would be a feature that cannot be used in pure Python mode, unlike
closures.


Stefan


Re: [Cython] Another CEP: Parallel block

2011-04-05 Thread mark florisson
On 5 April 2011 11:01, Stefan Behnel  wrote:
> mark florisson, 05.04.2011 10:44:
>>
>> On 5 April 2011 10:34, Stefan Behnel wrote:
>>>
>>> mark florisson, 05.04.2011 10:26:
>>>>
>>>> On 5 April 2011 09:21, Dag Sverre Seljebotn wrote:
>>>>>
>>>>> Justification for Cython-specific syntax: This is something that is
>>>>> really only useful if you can release the GIL *outside* of the loop.
>>>>> So I feel this is an area where a custom Cython solution is natural,
>>>>> sort of like "cdef extern", and the buffer access.
>>>>>
>>>>> Since a similar pure-Python solution is rather useless, I also think
>>>>> there's less incentive for making something that works well in
>>>>> pure-Python mode.
>>>>
>>>> Which feature is Cython specific here? The 'with a, b as c:' thing?
>>>
>>> No, the syntax is just Python. It's the scoping that's Cython specific,
>>> including the local variable declarations inside of the "with" block.
>>
>> Hmm, but you can use cython.declare() for that, no?
>
> cython.declare() is a no-op (or just a plain assignment) in Python. But the
> thread-local scoping of these variables cannot be emulated in Python. So
> this would be a feature that cannot be used in pure Python mode, unlike
> closures.

Sure, but the Python version would just be serial; it wouldn't use
threads at all. The great thing about OpenMP's philosophy is that the
same code can run either serially or in parallel, and the only
difference is speed. If you want speed, use Cython.
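
A minimal sketch of what such a serial fallback could look like in pure
Python. The prange defined here is a hypothetical stand-in written for
illustration; its signature and the 'schedule' keyword are assumptions,
and it simply ignores any scheduling options.

```python
def prange(start, stop=None, step=1, **scheduling_options):
    """Hypothetical pure-Python stand-in: iterate serially, ignore options."""
    if stop is None:
        start, stop = 0, start
    return range(start, stop, step)


def total(values):
    """Same loop body as a parallel version would have, run serially here."""
    s = 0.0
    for i in prange(len(values), schedule="dynamic"):
        s += values[i]
    return s
```

The loop body is unchanged from what a parallel version would contain;
only the speed differs.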

> Stefan


Re: [Cython] Another CEP: Parallel block

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 11:01 AM, Stefan Behnel wrote:
> mark florisson, 05.04.2011 10:44:
>> On 5 April 2011 10:34, Stefan Behnel wrote:
>>> mark florisson, 05.04.2011 10:26:
>>>> On 5 April 2011 09:21, Dag Sverre Seljebotn wrote:
>>>>> Justification for Cython-specific syntax: This is something that is
>>>>> really only useful if you can release the GIL *outside* of the loop.
>>>>> So I feel this is an area where a custom Cython solution is natural,
>>>>> sort of like "cdef extern", and the buffer access.
>>>>>
>>>>> Since a similar pure-Python solution is rather useless, I also think
>>>>> there's less incentive for making something that works well in
>>>>> pure-Python mode.
>>>>
>>>> Which feature is Cython specific here? The 'with a, b as c:' thing?
>>>
>>> No, the syntax is just Python. It's the scoping that's Cython specific,
>>> including the local variable declarations inside of the "with" block.
>>
>> Hmm, but you can use cython.declare() for that, no?
>
> cython.declare() is a no-op (or just a plain assignment) in Python.
> But the thread-local scoping of these variables cannot be emulated in
> Python. So this would be a feature that cannot be used in pure Python
> mode, unlike closures.


The intention of prange was certainly to fall back to a normal 
single-threaded range in Python mode.


Because of the GIL there would rarely be any benefit in running the loop 
in parallel -- only if you immediately dispatch to a long-running task 
that itself releases the GIL, but in those cases you should rather stick 
to pure Python in the first place and not bother with prange.


I think the chance of seeing real-life code that both requires prange to 
run optimally in Cython, and that would not be made slower by more than 
one thread in Python, is pretty close to zero.


Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Stefan Behnel

mark florisson, 04.04.2011 21:26:
> For clarity, I'll add an example:
>
> def f(np.ndarray[double] x, double alpha):
>     cdef double s = 0
>     cdef double tmp = 2
>     cdef double other = 6.6
>
>     with nogil:
>         for i in prange(x.shape[0]):
>             # reading 'tmp' makes it firstprivate in addition to lastprivate
>             # 'other' is only ever read, so it's shared
>             printf("%lf %lf %lf\n", tmp, s, other)

So, adding a printf() to your code can change the semantics of your
variables? That sounds like a really bad design to me.


Stefan


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread mark florisson
On 5 April 2011 12:51, Stefan Behnel  wrote:
> mark florisson, 04.04.2011 21:26:
>>
>> For clarity, I'll add an example:
>>
>> def f(np.ndarray[double] x, double alpha):
>>     cdef double s = 0
>>     cdef double tmp = 2
>>     cdef double other = 6.6
>>
>>     with nogil:
>>         for i in prange(x.shape[0]):
>>             # reading 'tmp' makes it firstprivate in addition to
>>             # lastprivate
>>             # 'other' is only ever read, so it's shared
>>             printf("%lf %lf %lf\n", tmp, s, other)
>
> So, adding a printf() to your code can change the semantics of your
> variables? That sounds like a really bad design to me.

I agree. I think we should drop firstprivate entirely, as it wouldn't
have the same semantics as serial execution ('tmp' would have the
original value with parallel execution, and the value from the
previous iteration with serial execution). So basically we should
allow reading of private variables only after they are assigned to in
the loop body.
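
The difference can be made concrete with a small pure-Python emulation.
This is only a sketch under stated assumptions: threads are emulated as
contiguous chunks of the iteration range, firstprivate is modelled as
one copy of 'tmp' per emulated thread, and the function names are
hypothetical.

```python
def serial(n, tmp):
    """Sequential loop: tmp carries over from one iteration to the next."""
    seen = []
    for i in range(n):
        seen.append(tmp)  # read before write
        tmp = tmp + 1
    return seen


def emulated_firstprivate(n, tmp, num_threads):
    """Each emulated thread starts from its own copy of the original tmp."""
    seen = [None] * n
    bounds = [n * k // num_threads for k in range(num_threads + 1)]
    for k in range(num_threads):
        private_tmp = tmp  # initialised once per emulated thread
        for i in range(bounds[k], bounds[k + 1]):
            seen[i] = private_tmp
            private_tmp += 1
    return seen
```

With one emulated thread the two functions agree; with two or more,
the values read at the start of each chunk differ from the sequential
run, which is the mismatch a read-before-write inside the loop exposes.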

> Stefan


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Pauli Virtanen
Mon, 04 Apr 2011 21:26:34 +0200, mark florisson wrote:
[clip]
> For clarity, I'll add an example:
[clip]

How about making all the special declarations explicit? The automatic 
inference of variables has a problem in that a small change in a part of 
the code can have somewhat unintuitive non-local effects, as the private/
shared/reduction status of the variable changes in the whole function 
scope (if Python scoping is retained).

Like so with explicit declarations:

def f(np.ndarray[double] x, double alpha):
    cdef double alpha = 6.6
    cdef char *ptr = something()

    # Parallel variables are declared beforehand;
    # the exact syntax could also be something else
    cdef cython.parallel.private[int] tmp = 2, tmp2
    cdef cython.parallel.reduction[int] s = 0

    # Act like ordinary cdef outside prange(); in the prange they are
    # firstprivate if initialized or written to outside the loop anywhere
    # in the scope. Or, they could be firstprivate always, if this
    # has a negligible performance impact.
    tmp = 3

    with nogil:
        s = 9

        for i in prange(x.shape[0]):
            if cython.parallel.first_iteration(i):
                # whatever initialization; Cython is in principle allowed
                # to move this outside the loop, at least if it is
                # the first thing here
                pass

            # tmp2 is not firstprivate, as it's not written to outside
            # the loop body; also, it's also not lastprivate as it's not
            # read outside the loop
            tmp2 = 99

            # Increment a private variable
            tmp += 2*tmp

            # Add stuff to reduction
            s += alpha*i

            # The following raise a compilation error -- the reduction
            # variable cannot be assigned to, and can be only operated on
            # with only a single reduction operation inside prange
            s *= 9
            s = 8

            # It can be read, however, provided openmp supports this
            tmp = s

            # Assignment to non-private variables causes a compile-time
            # error; this avoids common mistakes, such as forgetting to
            # declare the reduction variable.
            alpha += 42
            alpha123 = 9
            ptr = 94

            # These, however, need to be allowed:
            # the users are on their own to make sure they don't clobber
            # non-local variables
            x[i] = 123
            (ptr + i)[0] = 123
            some_routine(x, ptr, i)
        else:
            # private variables are lastprivate if read outside the loop
            foo = tmp

        # The else: block can be added, but actually has no effect
        # as it is always executed --- the code here could as well
        # be written after the for loop
        foo = tmp  # <- same result

    with nogil:
        # Suppose Cython allowed cdef inside blocks with usual scoping
        # rules
        cdef cython.parallel.reduction[double] r = 0

        # the same variables can be used again in a second parallel loop
        for i in prange(x.shape[0]):
            r += 1.5
            s -= i
            tmp = 9

        # also the iteration variable is available after the loop
        count = i

    # As per usual Cython scoping rules
    return r, s

What did I miss here? As far as I see, the above would have the same 
semantics and scoping as a single-threaded Python implementation.

The only change required to make things parallel is replacing range() by 
prange() and adding the variable declarations.

-- 
Pauli Virtanen



Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Pauli Virtanen
Tue, 05 Apr 2011 12:55:36 +, Pauli Virtanen wrote:
[clip]
> # Assignment to non-private variables causes a compile-time
> # error; this avoids common mistakes, such as forgetting to
> # declare the reduction variable.
> alpha += 42
> alpha123 = 9
> ptr = 94

Actually, I'm not sure this is absolutely necessary -- life is tough, 
especially if you are programming in parallel, and there are limits to 
hand-holding.

However, an explicit declaration could be added for turning the error off
for the (rare) cases where this makes sense (e.g. setting a shared flag):

    cdef cython.parallel.shared[double] some_flag

-- 
Pauli Virtanen



Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread mark florisson
On 5 April 2011 14:55, Pauli Virtanen  wrote:
>
> Mon, 04 Apr 2011 21:26:34 +0200, mark florisson wrote:
> [clip]
> > For clarity, I'll add an example:
> [clip]
>
> How about making all the special declarations explicit? The automatic
> inference of variables has a problem in that a small change in a part of
> the code can have somewhat unintuitive non-local effects, as the private/
> shared/reduction status of the variable changes in the whole function
> scope (if Python scoping is retained).
>
> Like so with explicit declarations:
>
> def f(np.ndarray[double] x, double alpha):
>    cdef double alpha = 6.6
>    cdef char *ptr = something()
>
>    # Parallel variables are declared beforehand;
>    # the exact syntax could also be something else
>    cdef cython.parallel.private[int] tmp = 2, tmp2
>    cdef cython.parallel.reduction[int] s = 0
>
>    # Act like ordinary cdef outside prange(); in the prange they are
>    # firstprivate if initialized or written to outside the loop anywhere
>    # in the scope. Or, they could be firstprivate always, if this
>    # has a negligible performance impact.
>    tmp = 3

The problem with firstprivate() is that it doesn't give you the same
semantics as in the sequential version. That's why I think it would be
best to forget about firstprivate entirely and allow reading of
private variables only after they are assigned to in the loop body.

>
>    with nogil:
>        s = 9
>
>        for i in prange(x.shape[0]):
>            if cython.parallel.first_iteration(i):
>                # whatever initialization; Cython is in principle allowed
>                # to move this outside the loop, at least if it is
>                # the first thing here
>                pass

For this I prefer the aforementioned 'with cython.parallel:' block.

>
>            # tmp2 is not firstprivate, as it's not written to outside
>            # the loop body; also, it's also not lastprivate as it's not
>            # read outside the loop
>            tmp2 = 99
>
>            # Increment a private variable
>            tmp += 2*tmp
>
>            # Add stuff to reduction
>            s += alpha*i
>
>            # The following raise a compilation error -- the reduction
>            # variable cannot be assigned to, and can be only operated on
>            # with only a single reduction operation inside prange
>            s *= 9
>            s = 8

I think OpenMP allows arbitrary assignments and expressions on the
reduction variable; all the spec says is that "usually it will be of
the form 'x = ...'".

>
>            # It can be read, however, provided openmp supports this
>            tmp = s
>
>            # Assignment to non-private variables causes a compile-time
>            # error; this avoids common mistakes, such as forgetting to
>            # declare the reduction variable.
>            alpha += 42
>            alpha123 = 9
>            ptr = 94
>
>            # These, however, need to be allowed:
>            # the users are on their own to make sure they don't clobber
>            # non-local variables
>            x[i] = 123
>            (ptr + i)[0] = 123
>            some_routine(x, ptr, i)

Indeed. They could be either shared or firstprivate (as the pointer
would be firstprivate, and not the entire array, unless it was
declared as a C array of certain size).

>        else:
>            # private variables are lastprivate if read outside the loop
>            foo = tmp
>
>        # The else: block can be added, but actually has no effect
>        # as it is always executed --- the code here could as well
>        # be written after the for loop
>        foo = tmp  # <- same result
>
>    with nogil:
>        # Suppose Cython allowed cdef inside blocks with usual scoping
>        # rules
>        cdef cython.parallel.reduction[double] r = 0
>
>        # the same variables can be used again in a second parallel loop
>        for i in prange(x.shape[0]):
>            r += 1.5
>            s -= i
>            tmp = 9
>
>        # also the iteration variable is available after the loop
>        count = i
>
>    # As per usual Cython scoping rules
>    return r, s
>
> What did I miss here? As far as I see, the above would have the same
> semantics and scoping as a single-threaded Python implementation.
>
> The only change required to make things parallel is replacing range() by
> prange() and adding the variable declarations.

Basically, I like your approach. It's only slightly more verbose than
the implicit way, as you need to declare the type of each variable
anyway.

I also still like the implicit way, but it has a couple of problems:
 - inplace operators suddenly declare a reduction
 - assigning to a variable has implicit (last)private semantics,
whereas assigning to an element in a buffer has shared semantics

Your explicit version solves both these problems. So I'm +1.
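
The second problem (plain variables being private while buffer
elements are shared) has a rough analogue in ordinary Python threads.
The sketch below uses the standard threading module rather than Cython
or OpenMP; the function names are hypothetical, and each thread writes
to a disjoint slice, so no locking is needed.

```python
import threading


def fill_squares(x, lo, hi):
    """Worker: 'tmp' is a local name, 'x' is a buffer shared by all threads."""
    for i in range(lo, hi):
        tmp = i * i  # assignment to a plain variable: private to this call
        x[i] = tmp   # assignment to a buffer element: visible to every thread


def run(n, num_threads=2):
    """Split range(n) into contiguous chunks and fill x from several threads."""
    x = [0] * n
    bounds = [n * k // num_threads for k in range(num_threads + 1)]
    threads = [
        threading.Thread(target=fill_squares, args=(x, bounds[k], bounds[k + 1]))
        for k in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```

Nothing in the syntax of the two assignments signals that one is
thread-private and the other is not, which is the asymmetry that
explicit declarations would make visible.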

> --
> Pauli Virtanen
>

Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread mark florisson
On 5 April 2011 15:10, Pauli Virtanen  wrote:
> Tue, 05 Apr 2011 12:55:36 +, Pauli Virtanen wrote:
> [clip]
>>             # Assignment to non-private variables causes a compile-time
>>             # error; this avoids common mistakes, such as forgetting to
>>             # declare the reduction variable.
>>             alpha += 42
>>             alpha123 = 9
>>             ptr = 94
>
> Actually, I'm not sure this is absolutely necessary -- life is tough,
> especially if you are programming in parallel, and there are limits to
> hand-holding.
>
> However, an explicit declaration could be added for turning the error off
> for the (rare) cases where this makes sense (e.g. setting a shared flag)
>
>        cdef cython.parallel.shared[double] some_flag

I think that unless we add support for critical, single or master
sections, or the atomic construct, we should also disallow assigning
to shared variables entirely.
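
For comparison, this is what a guarded update of a shared variable
looks like with plain Python threads. It is an analogy only:
threading.Lock stands in for a critical section, the function name is
hypothetical, and nothing here reflects how Cython would implement it.

```python
import threading


def count_hits(num_threads=4, iterations=1000):
    """Increment one shared counter from several threads under a lock."""
    hits = 0
    lock = threading.Lock()

    def worker():
        nonlocal hits
        for _ in range(iterations):
            with lock:  # plays the role of a critical section
                hits += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hits
```

Without some such construct, an assignment to a shared variable has no
well-defined result, which is the argument for rejecting it at compile
time.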

> --
> Pauli Virtanen
>


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Robert Bradshaw
On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel  wrote:
> mark florisson, 04.04.2011 21:26:
>>
>> For clarity, I'll add an example:
>>
>> def f(np.ndarray[double] x, double alpha):
>>     cdef double s = 0
>>     cdef double tmp = 2
>>     cdef double other = 6.6
>>
>>     with nogil:
>>         for i in prange(x.shape[0]):
>>             # reading 'tmp' makes it firstprivate in addition to
>>             # lastprivate
>>             # 'other' is only ever read, so it's shared
>>             printf("%lf %lf %lf\n", tmp, s, other)
>
> So, adding a printf() to your code can change the semantics of your
> variables? That sounds like a really bad design to me.

That's what I was thinking. Basically, if you do an in-place operation,
then it's a reduction variable, no matter what else you do to it
(including possibly a direct assignment, though we could make that a
compile-time error).

- Robert


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 04:53 PM, Robert Bradshaw wrote:
> On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel wrote:
>> mark florisson, 04.04.2011 21:26:
>>> For clarity, I'll add an example:
>>>
>>> def f(np.ndarray[double] x, double alpha):
>>>     cdef double s = 0
>>>     cdef double tmp = 2
>>>     cdef double other = 6.6
>>>
>>>     with nogil:
>>>         for i in prange(x.shape[0]):
>>>             # reading 'tmp' makes it firstprivate in addition to
>>>             # lastprivate
>>>             # 'other' is only ever read, so it's shared
>>>             printf("%lf %lf %lf\n", tmp, s, other)
>>
>> So, adding a printf() to your code can change the semantics of your
>> variables? That sounds like a really bad design to me.
>
> That's what I was thinking. Basically, if you do an in-place operation,
> then it's a reduction variable, no matter what else you do to it
> (including possibly a direct assignment, though we could make that a
> compile-time error).

-1, I think that's too obscure. Not being able to use in-place operators
for certain variables will at the very least be nagging.


I think we need to explicitly declare something. Either a simple 
prange(..., reduce="s:+"), or all-out declaration of thread-local variables.


Reduction isn't *that* common, so perhaps that is what should be 
explicit, unlike my other proposal...


Dag


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Robert Bradshaw
On Tue, Apr 5, 2011 at 4:52 AM, mark florisson wrote:
> On 5 April 2011 12:51, Stefan Behnel  wrote:
>> mark florisson, 04.04.2011 21:26:
>>>
>>> For clarity, I'll add an example:
>>>
>>> def f(np.ndarray[double] x, double alpha):
>>>     cdef double s = 0
>>>     cdef double tmp = 2
>>>     cdef double other = 6.6
>>>
>>>     with nogil:
>>>         for i in prange(x.shape[0]):
>>>             # reading 'tmp' makes it firstprivate in addition to
>>>             # lastprivate
>>>             # 'other' is only ever read, so it's shared
>>>             printf("%lf %lf %lf\n", tmp, s, other)
>>
>> So, adding a printf() to your code can change the semantics of your
>> variables? That sounds like a really bad design to me.
>
> I agree, I think we should refrain from the firstprivate() entirely,
> as it wouldn't have the same semantics as serial execution (as 'tmp'
> would have the original value with parallel execution and the value
> from previous iterations with serial execution). So basically we
> should allow reading of private variables only after they are assigned
> to in the loop body.

Unless I'm misunderstanding the meaning of firstprivate (it's
initialized per-thread, not per-iteration), for single-threaded
execution, it would have exactly the same semantics as serial
execution. As I mentioned before, if your code functions differently
for single or multiple threads, then it's incorrect. I think it's
natural that a parallel loop would behave like

tmp = global_value
if fork():
   # do first half of the loop, with tmp starting as global_value
else:
   # do last half of the loop, with tmp starting as global_value
# reduction magic
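
The "reduction magic" step can be sketched in pure Python as follows.
This is an emulation under stated assumptions, not the proposed
implementation: threads are modelled as contiguous chunks run one after
another, the function name is hypothetical, and the private
accumulators start at 0 (the identity for "+") with the value held
before the loop added back in at the end.

```python
def emulated_reduction(values, initial=0, num_threads=2):
    """Sum 'values' the way a '+' reduction over several threads would."""
    n = len(values)
    bounds = [n * k // num_threads for k in range(num_threads + 1)]
    partials = []
    for k in range(num_threads):
        s = 0  # private accumulator, starts at the identity for "+"
        for i in range(bounds[k], bounds[k + 1]):
            s += values[i]
        partials.append(s)
    # Combine the per-thread partial results with the pre-loop value.
    return initial + sum(partials)
```

For integers the result matches the sequential loop exactly; for
floating-point values the changed order of additions can alter the
rounding.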

- Robert


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 04:58 PM, Dag Sverre Seljebotn wrote:
> On 04/05/2011 04:53 PM, Robert Bradshaw wrote:
>> On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel wrote:
>>> mark florisson, 04.04.2011 21:26:
>>>> For clarity, I'll add an example:
>>>>
>>>> def f(np.ndarray[double] x, double alpha):
>>>>     cdef double s = 0
>>>>     cdef double tmp = 2
>>>>     cdef double other = 6.6
>>>>
>>>>     with nogil:
>>>>         for i in prange(x.shape[0]):
>>>>             # reading 'tmp' makes it firstprivate in addition to
>>>>             # lastprivate
>>>>             # 'other' is only ever read, so it's shared
>>>>             printf("%lf %lf %lf\n", tmp, s, other)
>>>
>>> So, adding a printf() to your code can change the semantics of your
>>> variables? That sounds like a really bad design to me.
>>
>> That's what I was thinking. Basically, if you do an in-place operation,
>> then it's a reduction variable, no matter what else you do to it
>> (including possibly a direct assignment, though we could make that a
>> compile-time error).
>
> -1, I think that's too obscure. Not being able to use in-place
> operators for certain variables will at the very least be nagging.
>
> I think we need to explicitly declare something. Either a simple
> prange(..., reduce="s:+"), or all-out declaration of thread-local
> variables.


Sorry: prange(..., reduce="s"), or perhaps &s or cython.address(s). The 
+ is of course still specified in code.


Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Robert Bradshaw
On Tue, Apr 5, 2011 at 5:55 AM, Pauli Virtanen  wrote:
> Mon, 04 Apr 2011 21:26:34 +0200, mark florisson wrote:
> [clip]
>> For clarity, I'll add an example:
> [clip]
>
> How about making all the special declarations explicit? The automatic
> inference of variables has a problem in that a small change in a part of
> the code can have somewhat unintuitive non-local effects, as the private/
> shared/reduction status of the variable changes in the whole function
> scope (if Python scoping is retained).
>
> Like so with explicit declarations:

That's an interesting idea. It's a bit odd specifying the scope as
part of the type, but may work. However, I'm still not convinced that
we can't safely infer this information.

> def f(np.ndarray[double] x, double alpha):
>    cdef double alpha = 6.6
>    cdef char *ptr = something()
>
>    # Parallel variables are declared beforehand;
>    # the exact syntax could also be something else
>    cdef cython.parallel.private[int] tmp = 2, tmp2
>    cdef cython.parallel.reduction[int] s = 0
>
>    # Act like ordinary cdef outside prange(); in the prange they are
>    # firstprivate if initialized or written to outside the loop anywhere
>    # in the scope. Or, they could be firstprivate always, if this
>    # has a negligible performance impact.
>    tmp = 3
>
>    with nogil:
>        s = 9
>
>        for i in prange(x.shape[0]):
>            if cython.parallel.first_iteration(i):
>                # whatever initialization; Cython is in principle allowed
>                # to move this outside the loop, at least if it is
>                # the first thing here
>                pass
>
>            # tmp2 is not firstprivate, as it's not written to outside
>            # the loop body; also, it's also not lastprivate as it's not
>            # read outside the loop
>            tmp2 = 99
>
>            # Increment a private variable
>            tmp += 2*tmp
>
>            # Add stuff to reduction
>            s += alpha*i
>
>            # The following raise compilation errors -- the reduction
>            # variable cannot be assigned to, and can be operated on
>            # with only a single reduction operation inside prange
>            s *= 9
>            s = 8
>
>            # It can be read, however, provided openmp supports this
>            tmp = s
>
>            # Assignment to non-private variables causes a compile-time
>            # error; this avoids common mistakes, such as forgetting to
>            # declare the reduction variable.
>            alpha += 42
>            alpha123 = 9
>            ptr = 94
>
>            # These, however, need to be allowed:
>            # the users are on their own to make sure they don't clobber
>            # non-local variables
>            x[i] = 123
>            (ptr + i)[0] = 123
>            some_routine(x, ptr, i)
>        else:
>            # private variables are lastprivate if read outside the loop
>            foo = tmp
>
>        # The else: block can be added, but actually has no effect
>        # as it is always executed --- the code here could as well
>        # be written after the for loop
>        foo = tmp  # <- same result
>
>    with nogil:
>        # Suppose Cython allowed cdef inside blocks with usual scoping
>        # rules
>        cdef cython.parallel.reduction[double] r = 0
>
>        # the same variables can be used again in a second parallel loop
>        for i in prange(x.shape[0]):
>            r += 1.5
>            s -= i
>            tmp = 9
>
>        # also the iteration variable is available after the loop
>        count = i
>
>    # As per usual Cython scoping rules
>    return r, s
>
> What did I miss here? As far as I see, the above would have the same
> semantics and scoping as a single-threaded Python implementation.

One thing is that it forces the scope of the variable to be
consistent throughout the entire function body, so, for example, a
reduction variable in one loop could not be used as a shared variable
in another (without declaring a new variable), which is a
different form of non-locality.

> The only change required to make things parallel is replacing range() by
> prange() and adding the variable declarations.
>
> --
> Pauli Virtanen


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Robert Bradshaw
On Tue, Apr 5, 2011 at 8:02 AM, Dag Sverre Seljebotn
 wrote:
> On 04/05/2011 04:58 PM, Dag Sverre Seljebotn wrote:
>>
>> On 04/05/2011 04:53 PM, Robert Bradshaw wrote:
>>>
>>> On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel
>>>  wrote:
>>>>
>>>> mark florisson, 04.04.2011 21:26:
>>>>>
>>>>> For clarity, I'll add an example:
>>>>>
>>>>> def f(np.ndarray[double] x, double alpha):
>>>>>     cdef double s = 0
>>>>>     cdef double tmp = 2
>>>>>     cdef double other = 6.6
>>>>>
>>>>>     with nogil:
>>>>>         for i in prange(x.shape[0]):
>>>>>             # reading 'tmp' makes it firstprivate in addition to lastprivate
>>>>>             # 'other' is only ever read, so it's shared
>>>>>             printf("%lf %lf %lf\n", tmp, s, other)
>>>>
>>>> So, adding a printf() to your code can change the semantics of your
>>>> variables? That sounds like a really bad design to me.
>>>
>>> That's what I was thinking. Basically, if you do an inplace operation,
>>> then it's a reduction variable, no matter what else you do to it
>>> (including possibly a direct assignment, though we could make that a
>>> compile-time error).
>>
>> -1, I think that's too obscure. Not being able to use inplace operators
>> for certain variables will at the very least be nagging.

You could still use inplace operators to your heart's content--just
don't bother using the reduced variable outside the loop. (I guess I'm
assuming reducing a variable has negligible performance overhead,
which it should.) For the rare cases that you want the non-aggregated
private, make an assignment to another variable, or use non-inplace
operations.

Not being able to mix inplace operators might be an annoyance. We
could also allow explicit declarations, as per Pauli's suggestion, but
not require them. Essentially, as long as we have

1) Sequential behavior == one thread scheduled (by semantics)
2) one thread scheduled == multiple threads scheduled (user's
responsibility, as it must be)

then I think we should be fine.
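
As a rough illustration of the two conditions above, here is a plain-Python model of a "+" reduction. It is not Cython and not the proposed prange; the names `sequential_sum` and `reduced_sum` are hypothetical, and the thread pool only stands in for whatever scheduling OpenMP would do. Each worker accumulates into its own private copy that starts at the identity of "+", and the copies are combined after the loop.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum(n, alpha):
    # Plain sequential loop: the reference behaviour.
    s = 0
    for i in range(n):
        s += alpha * i
    return s


def reduced_sum(n, alpha, num_threads):
    # Model of a "+" reduction: one private accumulator per worker,
    # combined once all workers are done.
    def worker(indices):
        partial = 0  # private copy, starts at the identity of "+"
        for i in indices:
            partial += alpha * i
        return partial

    # Static round-robin split of the iteration space (an assumption;
    # any split gives the same total for an associative operation).
    chunks = [range(t, n, num_threads) for t in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(worker, chunks))

    s = 0
    for p in partials:
        s += p
    return s
```

With integer inputs, `reduced_sum(n, alpha, 1)` and `reduced_sum(n, alpha, 4)` both agree with `sequential_sum(n, alpha)`. With floating-point values the totals can differ in the last bits, because the additions are grouped differently.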

>> I think we need to explicitly declare something. Either a simple
>> prange(..., reduce="s:+"), or all-out declaration of thread-local variables.
>
> Sorry: prange(..., reduce="s"), or perhaps &s or cython.address(s). The + is
> of course still specified in code.
>
> Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 05:26 PM, Robert Bradshaw wrote:
> On Tue, Apr 5, 2011 at 8:02 AM, Dag Sverre Seljebotn
>  wrote:
>> On 04/05/2011 04:58 PM, Dag Sverre Seljebotn wrote:
>>> On 04/05/2011 04:53 PM, Robert Bradshaw wrote:
>>>> On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel
>>>>  wrote:
>>>>> mark florisson, 04.04.2011 21:26:
>>>>>> For clarity, I'll add an example:
>>>>>>
>>>>>> def f(np.ndarray[double] x, double alpha):
>>>>>>     cdef double s = 0
>>>>>>     cdef double tmp = 2
>>>>>>     cdef double other = 6.6
>>>>>>
>>>>>>     with nogil:
>>>>>>         for i in prange(x.shape[0]):
>>>>>>             # reading 'tmp' makes it firstprivate in addition to lastprivate
>>>>>>             # 'other' is only ever read, so it's shared
>>>>>>             printf("%lf %lf %lf\n", tmp, s, other)
>>>>>
>>>>> So, adding a printf() to your code can change the semantics of your
>>>>> variables? That sounds like a really bad design to me.
>>>>
>>>> That's what I was thinking. Basically, if you do an inplace operation,
>>>> then it's a reduction variable, no matter what else you do to it
>>>> (including possibly a direct assignment, though we could make that a
>>>> compile-time error).
>>>
>>> -1, I think that's too obscure. Not being able to use inplace operators
>>> for certain variables will at the very least be nagging.
>
> You could still use inplace operators to your heart's content--just
> don't bother using the reduced variable outside the loop. (I guess I'm
> assuming reducing a variable has negligible performance overhead,
> which it should.) For the rare cases that you want the non-aggregated
> private, make an assignment to another variable, or use non-inplace
> operations.


Ahh! Of course! With some control flow analysis we could even eliminate 
the reduction if the variable isn't used after the loop, although I 
agree the cost should be trivial.

> Not being able to mix inplace operators might be an annoyance. We
> could also allow explicit declarations, as per Pauli's suggestion, but
> not require them. Essentially, as long as we have

I think you should be able to mix them, but if you do, a reduction 
doesn't happen. This is slightly uncomfortable, but I believe control 
flow analysis and disabling firstprivate can solve it, see below.


I believe I'm back in the implicit-camp. And the CEP can probably be 
simplified a bit too, I'll try to do that tomorrow.


Two things:

 * It'd still be nice to have something like a parallel block for thread 
setup/teardown rather than "if firstthreaditeration():". So, a prange 
for the 50% simplest cases, followed by a parallel-block for the next 30%.

 * Control flow analysis can help us tighten it up a bit: For loops where 
you actually depend on values of thread-private variables computed in 
the previous iteration (beyond reduction), it'd be nice to raise a 
warning unless the variable is explicitly declared thread-local or 
similar. There are uses for such variables but they'd be rather rare, 
and such a hint could be very helpful.
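
To make the second point concrete, here is a very small sketch of such a check, written against Python's `ast` module rather than Cython's own control flow analysis. It is only an approximation under stated assumptions: it looks at top-level statements of the loop body in order, ignores branches, and treats in-place operations as reductions that are not flagged. The function name `carried_names` is hypothetical.

```python
import ast


def carried_names(loop_source):
    """Return private names that are read before being written in the body.

    Such a read can only see a value left over from a previous iteration,
    which is the situation that would deserve a warning.
    """
    loop = ast.parse(loop_source).body[0]

    # Names assigned with plain "=" in the body are treated as thread-private.
    private = {
        target.id
        for stmt in loop.body
        if isinstance(stmt, ast.Assign)
        for target in stmt.targets
        if isinstance(target, ast.Name)
    }

    written, flagged = set(), []
    for stmt in loop.body:
        # For assignments only the right-hand side is a read.
        if isinstance(stmt, (ast.Assign, ast.AugAssign)):
            value = stmt.value
        else:
            value = stmt
        for node in ast.walk(value):
            if (isinstance(node, ast.Name)
                    and isinstance(node.ctx, ast.Load)
                    and node.id in private
                    and node.id not in written
                    and node.id not in flagged):
                flagged.append(node.id)
        if isinstance(stmt, ast.Assign):
            written.update(
                t.id for t in stmt.targets if isinstance(t, ast.Name)
            )
    return flagged
```

A body that starts with `tmp = tmp + x[i]` is flagged for `tmp`, while a body that starts with `tmp = 2 * x[i]` is not.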


I'm still not sure if we want firstprivate, even if we can do it. It'd 
be good to see a use case for it. I'd rather have NaN and 0x7FFF 
personally, as relying on the firstprivate value is likely a bug -- yes, 
it makes the sequential case work, but that is exactly the case where 
parallelizing the sequential case would be wrong!!
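
The idea of poisoning private variables instead of copying in the outer value can be modelled in a few lines of plain Python. This is only a sketch: `run_chunk`, `reads_before_write` and `writes_first` are hypothetical names, and NaN is used as the poison value for a floating-point private.

```python
import math


def run_chunk(indices, body):
    # The private copy starts out poisoned instead of taking the value
    # the variable had before the loop (which is what firstprivate does).
    tmp = math.nan
    for i in indices:
        tmp = body(i, tmp)
    return tmp


def reads_before_write(i, tmp):
    # Relies on the value carried into the iteration: the NaN propagates.
    return tmp + i


def writes_first(i, tmp):
    # Overwrites the private before using it: the poison never shows.
    tmp = 2 * i
    return tmp + 1
```

`run_chunk(range(4), reads_before_write)` yields NaN, which makes the reliance on an incoming value visible, while `run_chunk(range(4), writes_first)` gives an ordinary result.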


Grepping through 3 lines of heavily OpenMP-ified Fortran code here 
there's no mention of firstprivate or lastprivate (although we certainly 
want lastprivate to align with the sequential case).


Dag Sverre



Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread mark florisson
On 5 April 2011 18:32, Dag Sverre Seljebotn  wrote:
> On 04/05/2011 05:26 PM, Robert Bradshaw wrote:
>>
>> On Tue, Apr 5, 2011 at 8:02 AM, Dag Sverre Seljebotn
>>   wrote:
>>>
>>> On 04/05/2011 04:58 PM, Dag Sverre Seljebotn wrote:
>>>>
>>>> On 04/05/2011 04:53 PM, Robert Bradshaw wrote:
>>>>>
>>>>> On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel
>>>>>  wrote:
>>>>>>
>>>>>> mark florisson, 04.04.2011 21:26:
>>>>>>>
>>>>>>> For clarity, I'll add an example:
>>>>>>>
>>>>>>> def f(np.ndarray[double] x, double alpha):
>>>>>>>     cdef double s = 0
>>>>>>>     cdef double tmp = 2
>>>>>>>     cdef double other = 6.6
>>>>>>>
>>>>>>>     with nogil:
>>>>>>>         for i in prange(x.shape[0]):
>>>>>>>             # reading 'tmp' makes it firstprivate in addition to lastprivate
>>>>>>>             # 'other' is only ever read, so it's shared
>>>>>>>             printf("%lf %lf %lf\n", tmp, s, other)
>>>>>>
>>>>>> So, adding a printf() to your code can change the semantics of your
>>>>>> variables? That sounds like a really bad design to me.
>>>>>
>>>>> That's what I was thinking. Basically, if you do an inplace operation,
>>>>> then it's a reduction variable, no matter what else you do to it
>>>>> (including possibly a direct assignment, though we could make that a
>>>>> compile-time error).
>>>>
>>>> -1, I think that's too obscure. Not being able to use inplace operators
>>>> for certain variables will at the very least be nagging.
>>
>> You could still use inplace operators to your heart's content--just
>> don't bother using the reduced variable outside the loop. (I guess I'm
>> assuming reducing a variable has negligible performance overhead,
>> which it should.) For the rare cases that you want the non-aggregated
>> private, make an assignment to another variable, or use non-inplace
>> operations.
>
> Ahh! Of course! With some control flow analysis we could even eliminate the
> reduction if the variable isn't used after the loop, although I agree the
> cost should be trivial.
>
>
>> Not being able to mix inplace operators might be an annoyance. We
>> could also allow explicit declarations, as per Pauli's suggestion, but
>> not require them. Essentially, as long as we have
>
> I think you should be able to mix them, but if you do, a reduction doesn't
> happen. This is slightly uncomfortable, but I believe control flow analysis
> and disabling firstprivate can solve it, see below.
>
> I believe I'm back in the implicit-camp. And the CEP can probably be
> simplified a bit too, I'll try to do that tomorrow.
>
> Two things:
>
>  * It'd still be nice to have something like a parallel block for thread
> setup/teardown rather than "if firstthreaditeration():". So, a prange for
> the 50% simplest cases, followed by a parallel-block for the next 30%.

Definitely, I think it could also make way for things such as sections
etc, but I'll bring that up later :)

>  * Control flow analysis can help us tighten it up a bit: For loops where you
> actually depend on values of thread-private variables computed in the
> previous iteration (beyond reduction), it'd be nice to raise a warning
> unless the variable is explicitly declared thread-local or similar. There
> are uses for such variables but they'd be rather rare, and such a hint could
> be very helpful.
>
> I'm still not sure if we want firstprivate, even if we can do it. It'd be
> good to see a use case for it. I'd rather have NaN and 0x7FFF personally,
> as relying on the firstprivate value is likely a bug -- yes, it makes the
> sequential case work, but that is exactly the case where parallelizing
> the sequential case would be wrong!!

Yeah, I think if we go the implicit route then firstprivate might be
quite a surprise for users.

> Grepping through 3 lines of heavily OpenMP-ified Fortran code here
> there's no mention of firstprivate or lastprivate (although we certainly
> want lastprivate to align with the sequential case).
>
> Dag Sverre

Basically I'm fine with either implicit or explicit, although I think
the explicit case would be easier to understand for people that have
used OpenMP. In either case it would be nice to give prange a 'nogil'
option.

So to be clear, when we assign to a variable it will be lastprivate,
and when we assign to the subscript of a variable we make that
variable shared (unless it is declared inside the parallel with
block), right?
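
For what it's worth, the inference rules as summarised in this thread can be written down as a toy classifier. The sketch below uses Python's `ast` module and is not how Cython would implement it; `classify` is a hypothetical name, and the rule set (in-place operation gives reduction, plain assignment gives lastprivate, subscript assignment or read-only use gives shared) is my reading of the discussion, not the final CEP.

```python
import ast


def classify(loop_source):
    """Map each name used in a loop body to a data-sharing kind."""
    loop = ast.parse(loop_source).body[0]
    kinds = {}

    # The iteration variable stays available after the loop.
    if isinstance(loop.target, ast.Name):
        kinds[loop.target.id] = "lastprivate"

    called = set()
    for stmt in loop.body:
        for node in ast.walk(stmt):
            if (isinstance(node, ast.AugAssign)
                    and isinstance(node.target, ast.Name)):
                # An in-place operation wins over anything else.
                kinds[node.target.id] = "reduction"
            elif isinstance(node, ast.Assign):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        kinds.setdefault(target.id, "lastprivate")
                    elif (isinstance(target, ast.Subscript)
                          and isinstance(target.value, ast.Name)):
                        kinds.setdefault(target.value.id, "shared")
            elif (isinstance(node, ast.Call)
                  and isinstance(node.func, ast.Name)):
                called.add(node.func.id)

    # Anything that is only ever read is shared (function names excluded).
    for stmt in loop.body:
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Name)
                    and isinstance(node.ctx, ast.Load)
                    and node.id not in called):
                kinds.setdefault(node.id, "shared")
    return kinds
```

The sketch ignores variables declared inside the parallel block, which would stay private under the rule stated above.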


[Cython] prange CEP updated

2011-04-05 Thread Dag Sverre Seljebotn
I've done a pretty major revision to the prange CEP, bringing in a lot 
of the feedback.


Thread-private variables are now split in two cases:

 i) The safe cases, which really require very little technical 
knowledge -> automatically inferred


 ii) As an advanced feature, unsafe cases that require some knowledge 
of threading -> must be explicitly declared


I think this split simplifies things a great deal.

I'm rather excited over this now; this could turn out to be a really 
user-friendly and safe feature that would not only allow us to support 
OpenMP-like threading, but be more convenient to use in a range of 
common cases.


http://wiki.cython.org/enhancements/prange 



Dag Sverre


Re: [Cython] prange CEP updated

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 10:29 PM, Dag Sverre Seljebotn wrote:
> I've done a pretty major revision to the prange CEP, bringing in a lot
> of the feedback.
>
> Thread-private variables are now split in two cases:
>
>  i) The safe cases, which really require very little technical
> knowledge -> automatically inferred
>
>  ii) As an advanced feature, unsafe cases that require some knowledge
> of threading -> must be explicitly declared
>
> I think this split simplifies things a great deal.
>
> I'm rather excited over this now; this could turn out to be a really
> user-friendly and safe feature that would not only allow us to support
> OpenMP-like threading, but be more convenient to use in a range of
> common cases.
>
> http://wiki.cython.org/enhancements/prange

As a digression: threadlocal(int)-variables could also be supported 
elsewhere as syntax candy for the pythread.h Thread Local Storage, which 
would give fast TLS for any kind of thread (e.g., when using the 
threading module).
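
For comparison, this is what thread-local storage looks like today at the Python level with `threading.local()`. It is only an analogy for the semantics a threadlocal(int) variable might have; the helper names `bump` and `run` are hypothetical, and nothing here reflects how Cython would map such a variable onto pythread.h.

```python
import threading

# Each thread sees its own independent set of attributes on this object.
_tls = threading.local()


def bump():
    # Increment this thread's counter, creating it on first use.
    _tls.counter = getattr(_tls, "counter", 0) + 1
    return _tls.counter


def run(counts):
    # Start one thread per entry; thread k calls bump() counts[k] times
    # and records the last value it saw.
    results = [0] * len(counts)

    def worker(slot, n):
        last = 0
        for _ in range(n):
            last = bump()
        results[slot] = last

    threads = [
        threading.Thread(target=worker, args=(slot, n))
        for slot, n in enumerate(counts)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the counter is per thread, each worker ends at exactly its own count, no matter how the threads interleave.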


Dag Sverre

(Sorry about the previous HTML-mail.)