Re: [Cython] compiler performance issue for extended utility code

2011-10-09 Thread mark florisson
On 8 October 2011 10:22, mark florisson  wrote:

> On 8 October 2011 08:03, Stefan Behnel  wrote:
> > Vitja Makarov, 07.10.2011 18:01:
> >>>
> >>> 2011/10/7 Stefan Behnel:
> 
>  Vitja Makarov, 06.10.2011 23:12:
> >
> > Here is small comparison on compiling urllib.py with cython:
> >
> > ((e8527c5...)) vitja@mchome:~/work/cython-vitek-git/zzz$ time python
> > ../cython.py urllib.py
> >
> > real    0m1.699s
> > user    0m1.650s
> > sys     0m0.040s
> > (master) vitja@mchome:~/work/cython-vitek-git/zzz$ time python
> > ../cython.py urllib.py
> >
> > real    0m2.830s
> > user    0m2.790s
> > sys     0m0.030s
> >
> >
> > It's about 1.5 times slower.
> 
>  That's a pretty serious regression for
>  plain Python code then. Again, this needs proper profiling.
> >>
> >> I've added a return statement at the top of CythonScope.test_cythonscope;
> >> now I have these timings:
> >>
> >> (master) vitja@mchome:~/work/cython-vitek-git/zzz$ time python
> >> ../cython.py urllib.py
> >>
> >> real    0m1.764s
> >> user    0m1.700s
> >> sys     0m0.060s
> >
> > Ok, then it's only a bug. "create_testscope" is on by default in Main.py,
> > Context.__init__(). I don't know what it does exactly, but my guess is that
> > the option should a) be off by default and b) should rather be passed in by
> > the test runner as part of the compile options rather than being a
> > parameter of the Context class. AFAICT, it's currently only used in
> > TreeFragment.py, where it is being switched off explicitly for parsing
> > code snippets.
> >
> > Stefan
> > ___
> > cython-devel mailing list
> > cython-devel@python.org
> > http://mail.python.org/mailman/listinfo/cython-devel
> >
>
> It turns it off to avoid infinite recursion. This basically means that
> you cannot use stuff from the Cython scope in your Cython utilities, so
> in your Cython utilities you have to declare the C version of it
> (the one you declared with the @cname decorator).
>
> This is not really something that can be avoided just by not loading
> the scope like this. Perhaps one solution could be to load the test
> scope only when a lookup in the cython scope finds no entry. But really,
> libcython and serializing entries will solve all of this, so I suppose
> the real question is: do we want to do a release before we support
> such functionality?
> Anyway, the cython scope lookup would be a simple hack worth a try.
>

I applied the hack, i.e. defer loading the scope until a lookup in the
cython scope first fails to find an entry:
https://github.com/markflorisson88/cython/commit/ad4cf6303d1bf8a81e3afccc9572559a34827a3b

[0] [11:16] ~  ➤ time cython urllib.py # conditionally load scope
cython urllib.py  2.75s user 0.14s system 99% cpu 2.893 total
[0] [11:17] ~  ➤ time cython urllib.py # always load scope
cython urllib.py  4.08s user 0.16s system 99% cpu 4.239 total
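The conditional-load pattern itself is simple; here is a plain-Python sketch (LazyScope and expensive_loader are hypothetical stand-ins for illustration, not the actual CythonScope API):

```python
class LazyScope:
    """Defer an expensive load until a lookup actually misses
    (a sketch of the conditional-load hack; the real CythonScope
    internals differ)."""

    def __init__(self, loader):
        self.entries = {}
        self._loader = loader   # builds the expensive entries on demand
        self._loaded = False

    def lookup(self, name):
        entry = self.entries.get(name)
        if entry is None and not self._loaded:
            # First miss: pay the loading cost once, then retry.
            self._loaded = True
            self.entries.update(self._loader())
            entry = self.entries.get(name)
        return entry


calls = []

def expensive_loader():
    calls.append(1)
    return {"cfunc": "<entry for cfunc>"}

scope = LazyScope(expensive_loader)
scope.entries["x"] = "<entry for x>"
print(scope.lookup("x"))       # hit: the loader never runs
print(scope.lookup("cfunc"))   # miss: the loader runs exactly once
```

Code that never touches the deferred entries never pays for them, which is exactly why plain-Python compiles got their time back.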
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


[Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread mark florisson
Hey,

So far people have been enthusiastic about the cython.parallel features, so I
think we should introduce some new ones. I propose the following; assume
parallel has been imported from cython:

with parallel.master():
    this is executed in the master thread in a parallel (non-prange) section

with parallel.single():
    same as master, except any thread may do the execution

An optional keyword argument 'nowait' specifies whether there will be a
barrier at the end. The default is to wait.

with parallel.task():
    create a task to be executed by some thread in the team
    once a thread takes up the task it shall only be executed by that thread
    and no other thread (so the task will be tied to the thread)

    C variables will be firstprivate
    Python objects will be shared

parallel.taskwait() # wait on any direct descendant tasks to finish

with parallel.critical():
    this section of code is mutually exclusive with other critical sections

    an optional keyword argument 'name' specifies a name for the critical
    section, which means all sections with that name will exclude each
    other, but not critical sections with different names

    Note: all threads that encounter the section will execute it, just not
    at the same time

with parallel.barrier():
    all threads wait until everyone has reached the barrier
    either no one or everyone should encounter the barrier
    shared variables are flushed
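For illustration, the intended semantics of master, barrier and critical can be mimicked with plain Python threads (a sketch only; the proposed constructs would compile to OpenMP and run without the GIL):

```python
import threading

NUM_THREADS = 4
barrier = threading.Barrier(NUM_THREADS)  # like parallel.barrier()
critical = threading.Lock()               # like an unnamed parallel.critical()
results = []

def worker(tid):
    if tid == 0:                  # like parallel.master(): only thread 0 runs this
        results.append("master setup")
    barrier.wait()                # everyone synchronizes before proceeding
    with critical:                # mutual exclusion: one thread at a time
        results.append(tid)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results[0])   # the barrier guarantees the master's entry comes first
```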

Unfortunately, gcc again manages to horribly break the master and single
constructs in loops (versions 4.2 through 4.6), so I suppose I'll first
file a bug report. Other (better) compilers like Portland (and I'm sure
Intel) work fine. I suppose a warning in the documentation will suffice
there.

If we at some point implement vector/SIMD operations we could also try out
the Fortran openmp workshare construct.

What do you guys think?

Mark


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread Dag Sverre Seljebotn

On 10/09/2011 02:11 PM, mark florisson wrote:

Hey,

So far people have been enthusiastic about the cython.parallel features,
I think we should introduce some new features. I propose the following,


Great!!

I only have time for a very short feedback now, perhaps more will follow.


assume parallel has been imported from cython:

with parallel.master():
 this is executed in the master thread in a parallel (non-prange)
section

with parallel.single():
same as master, except any thread may do the execution

An optional keyword argument 'nowait' specifies whether there will be a
barrier at the end. The default is to wait.

with parallel.task():
 create a task to be executed by some thread in the team
 once a thread takes up the task it shall only be executed by that
thread and no other thread (so the task will be tied to the thread)

 C variables will be firstprivate
 Python objects will be shared

parallel.taskwait() # wait on any direct descendent tasks to finish


Regarding tasks, I think this is mapping OpenMP too close to Python. 
Closures are excellent for the notion of a task, so I think something 
based on the futures API would work better. I realize that makes the 
mapping to OpenMP and implementation a bit more difficult, but I think 
it is worth it in the long run.
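For comparison, here is a closure-as-task sketch using Python's own concurrent.futures (an illustration of the futures style being suggested, not anything that exists in cython.parallel):

```python
from concurrent.futures import ThreadPoolExecutor

def main():
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each submitted closure is a task; the returned Future is an
        # explicit handle, so "taskwait" becomes Future.result().
        futures = [pool.submit(lambda x=i: x * x) for i in range(8)]
        return [f.result() for f in futures]

print(main())  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The appeal is that the task handle is a first-class object, rather than an implicit property of the lexical block in which the task was spawned.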




with parallel.critical():
 this section of code is mutually exclusive with other critical sections
 optional keyword argument 'name' specifies a name for the critical
section,
 which means all sections with that name will exclude each other,
but not
 critical sections with different names

 Note: all threads that encounter the section will execute it, just
not at the same time

with parallel.barrier():
 all threads wait until everyone has reached the barrier
 either no one or everyone should encounter the barrier
 shared variables are flushed

Unfortunately, gcc again manages to horribly break master and single
constructs in loops (versions 4.2 through 4.6), so I suppose I'll
first file a bug report. Other (better) compilers like Portland (and I'm
sure Intel) work fine. I suppose a warning in the documentation will
suffice there.

If we at some point implement vector/SIMD operations we could also try
out the Fortran openmp workshare construct.


I'm starting to teach myself OpenCL as part of a course. It's very neat
for some kinds of parallelism. What I'm saying is that at least in the
case of SIMD, we should not lock ourselves into Fortran+OpenMP thinking
too early, but also look forward to coming architectures (e.g., AMD's
GPU-and-CPU on same die design).


Dag Sverre


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread Dag Sverre Seljebotn

On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:

On 10/09/2011 02:11 PM, mark florisson wrote:

Hey,

So far people have been enthusiastic about the cython.parallel features,
I think we should introduce some new features. I propose the following,


Great!!

I only have time for a very short feedback now, perhaps more will follow.


assume parallel has been imported from cython:

with parallel.master():
this is executed in the master thread in a parallel (non-prange)
section

with parallel.single():
same as master, except any thread may do the execution

An optional keyword argument 'nowait' specifies whether there will be a
barrier at the end. The default is to wait.


I like

if parallel.is_master():
...
explicit_barrier_somehow() # see below

better as a Pythonization. One could easily support is_master being used
in other contexts as well, simply by assigning a status flag in the
master block.


Using an if-test flows much better with Python, I feel, but that
naturally leads to making the barrier explicit. But I like the barrier
always being explicit, rather than having it as a predicate on all the
different constructs like in OpenMP.
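The if-test plus explicit barrier could look like this with plain threads (is_master here is a hypothetical helper backed by a thread-local, not an existing cython.parallel function):

```python
import threading

N = 4
barrier = threading.Barrier(N)
tls = threading.local()   # per-thread storage for the thread id
flags = {}

def is_master():
    # Hypothetical predicate: true only in the thread designated master.
    return tls.tid == 0

def worker(tid):
    tls.tid = tid
    if is_master():               # an if-test instead of a with-block
        flags["initialized"] = True
    barrier.wait()                # explicit barrier, not implied by the construct
    # past the barrier, every thread can rely on the master's work
    assert flags["initialized"]

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(flags)
```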


I'm less sure about single, since making it a function indicates one 
could use it in other contexts and the whole thing becomes too magic 
(since it's tied to the position of invocation). I'm tempted to suggest


for _ in prange(1):
...

as our syntax for single.



with parallel.task():
create a task to be executed by some thread in the team
once a thread takes up the task it shall only be executed by that
thread and no other thread (so the task will be tied to the thread)

C variables will be firstprivate
Python objects will be shared

parallel.taskwait() # wait on any direct descendent tasks to finish


Regarding tasks, I think this is mapping OpenMP too close to Python.
Closures are excellent for the notion of a task, so I think something
based on the futures API would work better. I realize that makes the
mapping to OpenMP and implementation a bit more difficult, but I think
it is worth it in the long run.



with parallel.critical():
this section of code is mutually exclusive with other critical sections
optional keyword argument 'name' specifies a name for the critical
section,
which means all sections with that name will exclude each other,
but not
critical sections with different names

Note: all threads that encounter the section will execute it, just
not at the same time


Yes, this works well as a with-statement...

...except that it is slightly magic in that it binds to the call position
(unlike anything in Python). I.e. this would be more "correct", or at
least more Pythonic:


with parallel.critical(__file__, __line__):
...




with parallel.barrier():
all threads wait until everyone has reached the barrier
either no one or everyone should encounter the barrier
shared variables are flushed


I have problems with requiring a no-op with-block...

I'd much rather write

parallel.barrier()

However, that ties a function call to the place of invocation, and 
suggests that one could do


if rand() > .5:
barrier()
else:
i += 3
barrier()

and have the same barrier in each case. Again,

barrier(__file__, __line__)

gets us purity at the cost of practicality. Another way is the pthreads
approach (although one may have to use pthreads rather than OpenMP to get
it, unless there are named barriers?):


barrier_a = parallel.barrier()
barrier_b = parallel.barrier()
with parallel:
barrier_a.wait()
if rand() > .5:
barrier_b.wait()
else:
i += 3
barrier_b.wait()


I'm really not sure here.
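The barrier-object approach maps directly onto Python's threading.Barrier, where each object is an independent rendezvous point regardless of call site (a sketch of the semantics being discussed, not a proposal for the actual API):

```python
import random
import threading

N = 4
barrier_a = threading.Barrier(N)
barrier_b = threading.Barrier(N)
lock = threading.Lock()
order = []

def worker():
    barrier_a.wait()
    if random.random() > .5:
        barrier_b.wait()   # both branches hit the *same* barrier object,
    else:
        barrier_b.wait()   # so the rendezvous is unambiguous
    with lock:
        order.append("past_b")

threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(order)
```

Because identity lives in the object, not the call position, the two wait() calls in different branches really are the same barrier.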



Unfortunately, gcc again manages to horribly break master and single
constructs in loops (versions 4.2 through 4.6), so I suppose I'll
first file a bug report. Other (better) compilers like Portland (and I'm
sure Intel) work fine. I suppose a warning in the documentation will
suffice there.

If we at some point implement vector/SIMD operations we could also try
out the Fortran openmp workshare construct.


I'm starting to teach myself OpenCL as part of a course. It's very neat
for some kinds of parallelism. What I'm saying is that at least in the
case of SIMD, we should not lock ourselves into Fortran+OpenMP thinking
too early, but also look forward to coming architectures (e.g., AMD's
GPU-and-CPU on same die design).

Dag Sverre


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread mark florisson
On 9 October 2011 13:18, Dag Sverre Seljebotn
 wrote:
>
> On 10/09/2011 02:11 PM, mark florisson wrote:
>>
>> Hey,
>>
>> So far people have been enthusiastic about the cython.parallel features,
>> I think we should introduce some new features. I propose the following,
>
> Great!!
>
> I only have time for a very short feedback now, perhaps more will follow.
>
>> assume parallel has been imported from cython:
>>
>> with parallel.master():
>>     this is executed in the master thread in a parallel (non-prange)
>> section
>>
>> with parallel.single():
>>    same as master, except any thread may do the execution
>>
>> An optional keyword argument 'nowait' specifies whether there will be a
>> barrier at the end. The default is to wait.
>>
>> with parallel.task():
>>     create a task to be executed by some thread in the team
>>     once a thread takes up the task it shall only be executed by that
>> thread and no other thread (so the task will be tied to the thread)
>>
>>     C variables will be firstprivate
>>     Python objects will be shared
>>
>> parallel.taskwait() # wait on any direct descendent tasks to finish
>
> Regarding tasks, I think this is mapping OpenMP too close to Python. Closures 
> are excellent for the notion of a task, so I think something based on the 
> futures API would work better. I realize that makes the mapping to OpenMP and 
> implementation a bit more difficult, but I think it is worth it in the long 
> run.

Hmm, that would be cool as well. Something like parallel.submit_task(myclosure)?

The problem I see with that is that parallel code can't hold the GIL,
and you can only have 'def' closures at the moment. I realize that you
won't actually have to use closure support here, though, and could just
transform the inner function to OpenMP task code. That might look
inconsistent with other closures, though, and you'd also have to
restrict the use of such a closure to parallel.submit_task().

Anyway, perhaps you have a concrete proposal that addresses these problems.

>>
>> with parallel.critical():
>>     this section of code is mutually exclusive with other critical sections
>>     optional keyword argument 'name' specifies a name for the critical
>> section,
>>     which means all sections with that name will exclude each other,
>> but not
>>     critical sections with different names
>>
>>     Note: all threads that encounter the section will execute it, just
>> not at the same time
>>
>> with parallel.barrier():
>>     all threads wait until everyone has reached the barrier
>>     either no one or everyone should encounter the barrier
>>     shared variables are flushed
>>
>> Unfortunately, gcc again manages to horribly break master and single
>> constructs in loops (versions 4.2 through 4.6), so I suppose I'll
>> first file a bug report. Other (better) compilers like Portland (and I'm
>> sure Intel) work fine. I suppose a warning in the documentation will
>> suffice there.
>>
>> If we at some point implement vector/SIMD operations we could also try
>> out the Fortran openmp workshare construct.
>
> I'm starting to teach myself OpenCL as part of a course. It's very neat for
> some kinds of parallelism. What I'm saying is that at least in the case of
> SIMD, we should not lock ourselves into Fortran+OpenMP thinking too early, but
> also look forward to coming architectures (e.g., AMD's GPU-and-CPU on same
> die design).

Oh, definitely. The good thing is that code generation backends
needn't be that hard. If you figure out all the semantics in the Python
code, you could load a different utility code template as a string
depending on the backend. It's probably not quite that easy, but the
point is that as long as your code's semantics don't rule out other
backends, you keep your options open.
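The per-backend utility-template idea amounts to simple string-template dispatch; a sketch with made-up backend names and snippets (nothing like this exists in Cython):

```python
# Map each backend to a utility-code template. The frontend fixes the
# semantics; only the emitted text differs per backend. Both backend
# names and snippets here are illustrative placeholders.
TEMPLATES = {
    "openmp": "#pragma omp barrier",
    "pthreads": "pthread_barrier_wait(&{name});",
}

def emit_barrier(backend, name="bar"):
    # Select and fill the template for the requested backend.
    return TEMPLATES[backend].format(name=name)

print(emit_barrier("openmp"))
print(emit_barrier("pthreads", name="barrier_a"))
```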

In the end I want to be able to write a parallel program almost
serially and have Cython compile it to OpenMP, MPI, GPUs or whatever
else I need. At the same time I need to stay in touch with reality, so
it's one step at a time :)

> Dag Sverre


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread mark florisson
On 9 October 2011 13:57, Dag Sverre Seljebotn
 wrote:
> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>
>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>
>>> Hey,
>>>
>>> So far people have been enthusiastic about the cython.parallel features,
>>> I think we should introduce some new features. I propose the following,
>>
>> Great!!
>>
>> I only have time for a very short feedback now, perhaps more will follow.
>>
>>> assume parallel has been imported from cython:
>>>
>>> with parallel.master():
>>> this is executed in the master thread in a parallel (non-prange)
>>> section
>>>
>>> with parallel.single():
>>> same as master, except any thread may do the execution
>>>
>>> An optional keyword argument 'nowait' specifies whether there will be a
>>> barrier at the end. The default is to wait.
>
> I like
>
> if parallel.is_master():
>    ...
> explicit_barrier_somehow() # see below
>
> better as a Pythonization. One could easily support is_master to be used in
> other contexts as well, simply by assigning a status flag in the master
> block.
>
> Using an if-test flows much better with Python I feel, but that naturally
> lead to making the barrier explicit. But I like the barrier always being
> explicit, rather than having it as a predicate on all the different
> constructs like in OpenMP

Hmm, that might mean you would also want the barrier for a prange inside
a parallel block to be explicit. I like the 'if' test, though, although
it wouldn't make sense for 'single'.

> I'm less sure about single, since making it a function indicates one could
> use it in other contexts and the whole thing becomes too magic (since it's
> tied to the position of invocation). I'm tempted to suggest
>
> for _ in prange(1):
>    ...
>
> as our syntax for single.

I think that syntax is absolutely terrible :) Perhaps single is not so
important and one can just use master instead (or, if really needed,
master + a task with the actual work).

>>>
>>> with parallel.task():
>>> create a task to be executed by some thread in the team
>>> once a thread takes up the task it shall only be executed by that
>>> thread and no other thread (so the task will be tied to the thread)
>>>
>>> C variables will be firstprivate
>>> Python objects will be shared
>>>
>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>
>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>> Closures are excellent for the notion of a task, so I think something
>> based on the futures API would work better. I realize that makes the
>> mapping to OpenMP and implementation a bit more difficult, but I think
>> it is worth it in the long run.
>>
>>>
>>> with parallel.critical():
>>> this section of code is mutually exclusive with other critical sections
>>> optional keyword argument 'name' specifies a name for the critical
>>> section,
>>> which means all sections with that name will exclude each other,
>>> but not
>>> critical sections with different names
>>>
>>> Note: all threads that encounter the section will execute it, just
>>> not at the same time
>
> Yes, this works well as a with-statement...
>
> ..except that it is slightly magic in that it binds to call position (unlike
> anything in Python). I.e. this would be more "correct", or at least
> Pythonic:
>
> with parallel.critical(__file__, __line__):
>    ...
>

I'm not entirely sure what you mean here. Critical is really about the
block contained within, not about a position in a file. Not all
threads have to encounter the critical region, and not specifying a
name means you exclude with *all other* unnamed critical sections (not
just this one).
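The named-critical semantics described here can be sketched with one lock per name plus a shared lock for the unnamed case (a plain-Python illustration of the rule, not the OpenMP implementation):

```python
import threading
from collections import defaultdict

_named_locks = defaultdict(threading.Lock)  # one lock per critical-section name
_unnamed_lock = threading.Lock()            # all unnamed sections share this lock

def critical(name=None):
    # Sections with the same name exclude each other; sections with
    # different names may run concurrently; every unnamed section
    # excludes every other unnamed section via the single shared lock.
    return _unnamed_lock if name is None else _named_locks[name]

with critical("io"):
    pass   # excludes only other critical("io") sections
with critical():
    pass   # excludes all other unnamed sections
```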

>>>
>>> with parallel.barrier():
>>> all threads wait until everyone has reached the barrier
>>> either no one or everyone should encounter the barrier
>>> shared variables are flushed
>
> I have problems with requiring a noop with block...
>
> I'd much rather write
>
> parallel.barrier()

In OpenMP it doesn't have any associated code, but we could give it
those semantics: apply the barrier at the end of the block of code. The
con is that the barrier statement sits at the top while it only takes
effect on leaving the block; you would write:

with parallel.barrier():
    if rand() > .5:
        ...
    else:
        ...
# the barrier is here
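That reading of the with-block is essentially a context manager whose exit performs the wait; a plain-threading sketch (barrier_block is a made-up name for illustration):

```python
import threading
from contextlib import contextmanager

team_barrier = threading.Barrier(2)

@contextmanager
def barrier_block(barrier):
    yield            # run the block's body first ...
    barrier.wait()   # ... then synchronize on leaving it

hits = []

def worker(tid):
    with barrier_block(team_barrier):
        hits.append(tid)       # body runs unsynchronized
    # past this point, both threads have finished their bodies
    assert len(hits) == 2

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(hits))
```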

> However, that ties a function call to the place of invocation, and suggests
> that one could do
>
> if rand() > .5:
>    barrier()
> else:
>    i += 3
>    barrier()
>
> and have the same barrier in each case. Again,
>
> barrier(__file__, __line__)
>
> gets us purity at the cost of practicality.

In this case (unlike the critical construct), yes. I think a warning
in the docs stating that either all or none of the threads must
encounter the barrier should suffice.

> Another way is the pthreads
> approach (although one may have to use pthreads rather than OpenMP to get it,
> unless there are named barriers?):
>
> barrier_a = parallel.barrier()
> barrier_b = parallel.barrier()
> with parallel:
>    barrier_a.wait()

[Cython] PyCon-DE wrap-up by Kay Hayen

2011-10-09 Thread Stefan Behnel

Hi,

Kay Hayen wrote a blog post about his view of the first PyCon-DE, including 
a bit on the discussions I had with him about Nuitka.


http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/

It was interesting to see that Nuitka actually comes from the other side, 
meaning that it tries to be a pure Python compiler, but should at some 
point start to support (Python) type hints for the compiler. Cython made 
static types a language feature from the very beginning and is now fixing 
up the Python compatibility. So both systems will eventually become rather 
similar in what they achieve, with Cython being essentially a superset of 
the feature set of Nuitka due to its additional focus on talking to 
external libraries efficiently and supporting things like parallel loops or 
the PEP-3118 buffer interface.


One of the impressions I took out of the technical discussions with Kay is 
that there isn't really a good reason why Cython should refuse to duplicate 
some of the inner mechanics of CPython for optimisation purposes. Nuitka 
appears to be somewhat more aggressive here, partly because Kay doesn't 
currently care all that much about portability (e.g. to Python 3).


I was previously very opposed to that (you may remember my opposition to 
the list.pop() optimisation), but now I think that we have to fix up the 
generated code for each new major CPython release anyway, so it won't make 
a difference if we have to rework some more of the code because a bit of 
those inner workings changed. They sure won't change for released CPython 
versions anymore, and many implementation details are unlikely enough to 
change for years to come. It's good to continue to be considerate about 
such changes, but some of them may well bring another serious bit of 
performance without introducing real portability risks. Changes like the 
Unicode string restructuring in PEP-393 show that even relying on official 
and long standing parts of the C-API isn't enough to guarantee that code 
still works as expected in new releases, so we may just as well start 
digging deeper.


Stefan
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] PyCon-DE wrap-up by Kay Hayen

2011-10-09 Thread mark florisson
On 9 October 2011 18:35, Stefan Behnel  wrote:
> Hi,
>
> Kay Hayen wrote a blog post about his view of the first PyCon-DE, including
> a bit on the discussions I had with him about Nuitka.
>
> http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/
>
> It was interesting to see that Nuitka actually comes from the other side,
> meaning that it tries to be a pure Python compiler, but should at some point
> start to support (Python) type hints for the compiler. Cython made static
> types a language feature from the very beginning and is now fixing up the
> Python compatibility. So both systems will eventually become rather similar
> in what they achieve, with Cython being essentially a superset of the
> feature set of Nuitka due to its additional focus on talking to external
> libraries efficiently and supporting things like parallel loops or the
> PEP-3118 buffer interface.
>
> One of the impressions I took out of the technical discussions with Kay is
> that there isn't really a good reason why Cython should refuse to duplicate
> some of the inner mechanics of CPython for optimisation purposes. Nuitka
> appears to be somewhat more aggressive here, partly because Kay doesn't
> currently care all that much about portability (e.g. to Python 3).

Interesting. What kind of (significant) optimizations could be made by
duplicating code? Do you want to duplicate entire functions or do you
want to inline parts of those?

I actually think we should not get too tied to CPython, e.g. what if
PyPy gets a CPython compatible API, or possibly a subset like PEP 384?

> I was previously very opposed to that (you may remember my opposition to the
> list.pop() optimisation), but now I think that we have to fix up the
> generated code for each new major CPython release anyway, so it won't make a
> difference if we have to rework some more of the code because a bit of those
> inner workings changed. They sure won't change for released CPython versions
> anymore, and many implementation details are unlikely enough to change for
> years to come. It's good to continue to be considerate about such changes,
> but some of them may well bring another serious bit of performance without
> introducing real portability risks. Changes like the Unicode string
> restructuring in PEP-393 show that even relying on official and long
> standing parts of the C-API isn't enough to guarantee that code still works
> as expected in new releases, so we may just as well start digging deeper.
>
> Stefan


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread Jon Olav Vik
On Sun, Oct 9, 2011 at 2:57 PM, Dag Sverre Seljebotn
 wrote:
>>> with parallel.single():
>>> same as master, except any thread may do the execution
>>>
>>> An optional keyword argument 'nowait' specifies whether there will be a
>>> barrier at the end. The default is to wait.
>
> I like
>
> if parallel.is_master():
>    ...
> explicit_barrier_somehow() # see below
>
> better as a Pythonization. One could easily support is_master to be used in
> other contexts as well, simply by assigning a status flag in the master
> block.
>
> Using an if-test flows much better with Python I feel, but that naturally
> lead to making the barrier explicit. But I like the barrier always being
> explicit, rather than having it as a predicate on all the different
> constructs like in OpenMP

Personally, I think I'd prefer context managers as a very
readable way to deal with parallelism, similar to the "threading"
module:

http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement
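
For reference, this is the threading-module idiom the link describes: "with lock:" acquires on entry and releases on exit, even if the body raises, exactly like an encapsulated try..finally:

```python
import threading

lock = threading.Lock()
shared = []

def appender(x):
    with lock:              # acquire on entry, release on exit, even on error
        shared.append(x)

threads = [threading.Thread(target=appender, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(shared) == list(range(8))
```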


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread mark florisson
On 9 October 2011 19:54, Jon Olav Vik  wrote:
> On Sun, Oct 9, 2011 at 2:57 PM, Dag Sverre Seljebotn
>  wrote:
 with parallel.single():
 same as master, except any thread may do the execution

 An optional keyword argument 'nowait' specifies whether there will be a
 barrier at the end. The default is to wait.
>>
>> I like
>>
>> if parallel.is_master():
>>    ...
>> explicit_barrier_somehow() # see below
>>
>> better as a Pythonization. One could easily support is_master to be used in
>> other contexts as well, simply by assigning a status flag in the master
>> block.
>>
>> Using an if-test flows much better with Python I feel, but that naturally
>> lead to making the barrier explicit. But I like the barrier always being
>> explicit, rather than having it as a predicate on all the different
>> constructs like in OpenMP
>
> Personally, I think I'd prefer find context managers as a very
> readable way to deal with parallelism, similar to the "threading"
> module:
>
> http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement

Yeah, it makes a lot of sense for mutual exclusion, but 'master' really
means "only the master thread executes this piece of code, even though
other threads encounter the same code", which is more akin to 'if'
than 'with'.


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread Jon Olav Vik
On Sun, Oct 9, 2011 at 9:01 PM, mark florisson
 wrote:
> On 9 October 2011 19:54, Jon Olav Vik  wrote:
>> Personally, I think I'd prefer context managers as a very
>> readable way to deal with parallelism
>
> Yeah it makes a lot of sense for mutual exclusion, but 'master' really
> means "only the master thread executes this piece of code, even though
> other threads encounter the same code", which is more akin to 'if'
> than 'with'.

I see your point. However, another similarity with "with" statements,
which encapsulate a "try..finally", arises when there's a barrier at
the end of the block. I can live with some magic if it saves me from
having a boilerplate line of "barrier" everywhere 8-)


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-09 Thread mark florisson
On 9 October 2011 21:48, Jon Olav Vik  wrote:
> On Sun, Oct 9, 2011 at 9:01 PM, mark florisson
>  wrote:
>> On 9 October 2011 19:54, Jon Olav Vik  wrote:
>>> Personally, I think I'd prefer context managers as a very
>>> readable way to deal with parallelism
>>
>> Yeah it makes a lot of sense for mutual exclusion, but 'master' really
>> means "only the master thread executes this piece of code, even though
>> other threads encounter the same code", which is more akin to 'if'
>> than 'with'.
>
> I see your point. However, another similarity with "with" statements,
> which encapsulate a "try..finally", arises when there's a barrier at
> the end of the block. I can live with some magic if it saves me from
> having a boilerplate line of "barrier" everywhere 8-)

Hm, indeed. I just noticed that unlike single constructs, master
constructs don't have barriers. Both are also not allowed to be
closely nested in worksharing constructs. I think the single directive
is more useful with respect to tasks, e.g. have a single thread
generate tasks and have other threads waiting at the barrier execute
them. In that sense I suppose 'if parallel.is_master():' makes sense
(no barrier, master thread) and 'with single():' (with barrier, any
thread).

We could still support single in prange though, if we simply have the
master thread execute it ('if (omp_get_thread_num() == 0)') and put a
barrier after the block. This makes me wonder what the point of master
was supposed to be...
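
The distinction being discussed, 'master' (only thread 0 executes, no barrier) versus 'single' (whichever thread arrives first executes, implicit barrier after), can be sketched with Python threads. This is an emulation of the OpenMP semantics for illustration, not any existing Cython API:

```python
import threading

N = 4
ran_single = []
single_gate = threading.Lock()
single_done = False
barrier = threading.Barrier(N)

def worker(tid):
    global single_done
    # 'master': purely a thread-id test, no synchronization involved
    if tid == 0:
        pass  # master-only work would go here; other threads just fall through
    # 'single': the first thread to arrive executes the block...
    with single_gate:
        if not single_done:
            single_done = True
            ran_single.append(tid)
    # ...and then everyone synchronizes: the implicit barrier that
    # 'single' has and 'master' lacks.
    barrier.wait()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(ran_single) == 1  # exactly one thread executed the single block
```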