On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
<d.s.seljeb...@astro.uio.no>  wrote:
On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:

On 10/09/2011 02:11 PM, mark florisson wrote:

Hey,

So far people have been enthusiastic about the cython.parallel features, so
I think we should introduce some new ones.

Excellent. I think this is going to become a killer feature like
buffer support.

I propose the following,

Great!!

I only have time for a very short feedback now, perhaps more will follow.

assume parallel has been imported from cython:

with parallel.master():
this is executed in the master thread in a parallel (non-prange)
section

with parallel.single():
same as master, except any thread may do the execution

An optional keyword argument 'nowait' specifies whether there will be a
barrier at the end. The default is to wait.

I like

if parallel.is_master():
    ...
explicit_barrier_somehow() # see below

better as a Pythonization. One could easily support is_master to be used in
other contexts as well, simply by assigning a status flag in the master
block.

+1, the if statement feels a lot more natural.

Using an if-test flows much better with Python I feel, but that naturally
leads to making the barrier explicit. But I like the barrier always being
explicit, rather than having it as a predicate on all the different
constructs like in OpenMP....

I'm less sure about single, since making it a function indicates one could
use it in other contexts and the whole thing becomes too magic (since it's
tied to the position of invocation). I'm tempted to suggest

for _ in prange(1):
    ...

as our syntax for single.

Just to be clear: My point was that the above implements single behaviour even now, without any extra effort.


The idea here is that you want a block of code executed once,
presumably by the first thread that gets here? I think this could also
be handled by a if statement, perhaps "if parallel.first()" or
something like that. Is there anything special about this construct
that couldn't simply be done by flushing/checking a variable?
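A minimal sketch of the "check a variable" idea, written with plain Python threading rather than cython.parallel. The names run_once_per_team, claimed and executed_by are made up for illustration, and the lock only stands in for the flush an OpenMP implementation would need:

```python
import threading

def run_once_per_team(num_threads=4):
    """Every worker checks a shared flag; only the first to arrive runs the block."""
    claimed = False            # shared flag (hypothetical stand-in for "a variable")
    lock = threading.Lock()    # guards the flag; plays the role of the flush
    executed_by = []           # records which workers ran the "single" block

    def worker(tid):
        nonlocal claimed
        with lock:
            first = not claimed
            claimed = True
        if first:
            executed_by.append(tid)   # the block that must run exactly once

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed_by
```

Which thread wins is not determined, only that exactly one does. Note that the flag is an explicit object here, so the behaviour does not depend on where the check is written.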

Good point. I think there's a problem with OpenMP that it has too many primitives for similar things.

I'm -1 on single -- either using a for loop or flag+flush is more to type, but more readable to people who don't know cython.parallel (look: Python even makes "self." explicit -- the bias in language design is clearly on readability rather than writability).

I thought of "if is_first()" as well, but my problem is again that it binds to the location of the call.

if foo:
    if parallel.is_first():
        ...
else:
    if parallel.is_first():
        ...

cannot be refactored to:

if parallel.is_first():
    if foo:
        ...
    else:
        ...

which I think is highly confusing for people who didn't write the code and don't know the details of cython.parallel. (Unlike is_master(), which works the same either way).

I think we should aim for something that's as easy to read as possible for Python users with no cython.parallel knowledge.


with parallel.task():
create a task to be executed by some thread in the team
once a thread takes up the task it shall only be executed by that
thread and no other thread (so the task will be tied to the thread)

C variables will be firstprivate
Python objects will be shared

parallel.taskwait() # wait on any direct descendent tasks to finish

Regarding tasks, I think this is mapping OpenMP too closely to Python.
Closures are excellent for the notion of a task, so I think something
based on the futures API would work better. I realize that makes the
mapping to OpenMP and implementation a bit more difficult, but I think
it is worth it in the long run.

It's almost as if you're reading my thoughts. There are much more
natural task APIs, e.g. futures or the way the Python
threading/multiprocessing does things.
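For reference, a rough sketch of what a futures-style task API looks like in plain Python, using the standard concurrent.futures module. This is not a proposed cython.parallel interface; scaled_sum and scale are made-up names, and the closure is just there to show how a task can capture its surrounding variables:

```python
from concurrent.futures import ThreadPoolExecutor

def scaled_sum(values, scale=2, max_workers=4):
    """Submit one closure per value as a task, then wait on all of them."""
    def task(x):
        # 'scale' is captured from the enclosing scope, like a firstprivate value.
        return x * scale

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, v) for v in values]
        # Calling result() on each future is the analogue of taskwait.
        return sum(f.result() for f in futures)
```

Each submit() call returns a future immediately, and waiting is explicit per task rather than tied to a position in the code.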

with parallel.critical():
this section of code is mutually exclusive with other critical sections
optional keyword argument 'name' specifies a name for the critical
section,
which means all sections with that name will exclude each other,
but not
critical sections with different names

Note: all threads that encounter the section will execute it, just
not at the same time

Yes, this works well as a with-statement...

..except that it is slightly magic in that it binds to call position (unlike
anything in Python). I.e. this would be more "correct", or at least
Pythonic:

with parallel.critical(__file__, __line__):
    ...

Mark: I stand corrected on this point. +1 on your critical proposal.

This feels a lot like a lock, which of course fits well with the with
statement.
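To illustrate the lock analogy, here is a small sketch in plain Python threading, assuming a hypothetical critical(name) helper that hands out one lock per name. It is not the cython.parallel implementation, only a model of the named-section semantics described above:

```python
import threading
from collections import defaultdict

_locks = defaultdict(threading.Lock)   # one lock per section name (hypothetical registry)
_registry_guard = threading.Lock()     # protects the registry itself

def critical(name="default"):
    """Return the lock for a named critical section; same name, same lock."""
    with _registry_guard:
        return _locks[name]

def count_with_critical(num_threads=4, increments=1000):
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(increments):
            with critical("counter"):   # all threads run this, one at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Sections with the same name exclude each other because they share a lock; sections with different names get different locks and do not.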

with parallel.barrier():
all threads wait until everyone has reached the barrier
either no one or everyone should encounter the barrier
shared variables are flushed

I have problems with requiring a noop with block...

I'd much rather write

parallel.barrier()

However, that ties a function call to the place of invocation, and suggests
that one could do

if rand() > .5:
    barrier()
else:
    i += 3
    barrier()

and have the same barrier in each case. Again,

barrier(__file__, __line__)

gets us purity at the cost of practicality. Another way is the pthreads
approach (although one may have to use pthread rather than OpenMP to get it,
unless there are named barriers?):

barrier_a = parallel.barrier()
barrier_b = parallel.barrier()
with parallel:
    barrier_a.wait()
    if rand() > .5:
        barrier_b.wait()
    else:
        i += 3
        barrier_b.wait()


I'm really not sure here.

I agree, the barrier doesn't seem like it belongs in a context. For
example, it's ambiguous whether the block is supposed to precede or
succeed the barrier. I like the named barrier idea, but if that's not
feasible we could perhaps use control flow to disallow conditionally
calling barriers (or that every path calls the barrier (an equal
number of times?)).

It is always an option to go beyond OpenMP. Pthread barriers are a lot more powerful in this way, and with pthread and Windows covered I think we should be good...

IIUC, you can't have different paths calling the barrier the same number of times, it's merely

#pragma omp barrier

and a separate barrier statement gets another counter. Which is why I think it is not powerful enough and we should use pthreads.
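The pthreads-style approach, where a barrier is an object rather than a code location, can be tried out in plain Python with threading.Barrier. The sketch below is an illustration only, not a cython.parallel design: run_with_named_barriers is a made-up name, and the branch is taken on the thread id instead of rand() so the result is predictable:

```python
import threading

def run_with_named_barriers(num_threads=4):
    barrier_a = threading.Barrier(num_threads)
    barrier_b = threading.Barrier(num_threads)
    totals = [0] * num_threads   # one slot per thread, so no sharing conflicts
    passed = []                  # thread ids that got past both barriers
    lock = threading.Lock()

    def worker(tid):
        barrier_a.wait()
        if tid % 2 == 0:         # deterministic stand-in for rand() > .5
            barrier_b.wait()
        else:
            totals[tid] += 3
            barrier_b.wait()     # same barrier object, reached on a different path
        with lock:
            passed.append(tid)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals, sorted(passed)
```

Because both branches wait on the same barrier_b object, the threads meet up regardless of which path each one took, which a per-location barrier statement cannot express.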

+1. I like the idea of providing more parallelism constructs, but
rather than risk fixating on OpenMP's model, perhaps we should look at
the problem we're trying to solve (e.g., what can't one do well now)
and create (or more likely borrow) the right Pythonic API to do it.

Also, quick and flexible message-passing between threads/processes through channels is becoming an increasingly popular concept. Go even has a separate syntax for channel communication, and zeromq is becoming popular for distributed work.

This is a problem Cython may need to solve here, since one currently has to use very low-level C to do it quickly (either zeromq or pthreads in most cases -- I guess, an OpenMP critical section would help in implementing a queue though).

I wouldn't resist a builtin "channel" type in Cython (since we don't have full templating/generics, it would be the only way of sending typed data conveniently?).
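As a point of comparison, pure Python already offers an untyped, lock-based channel in queue.Queue. The sketch below shows the producer/consumer pattern a typed Cython channel would presumably serve; produce_and_consume and the _DONE sentinel are made-up names, and nothing here reflects an actual Cython feature:

```python
import queue
import threading

_DONE = object()   # sentinel telling the consumer that the channel is closed

def produce_and_consume(items):
    channel = queue.Queue(maxsize=2)   # bounded, so the producer blocks when full
    received = []

    def producer():
        for item in items:
            channel.put(item)
        channel.put(_DONE)

    def consumer():
        while True:
            item = channel.get()
            if item is _DONE:
                break
            received.append(item * 2)  # some work on each message

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With a single producer and a single consumer the messages arrive in order; the cost in Python is that every put/get goes through the interpreter and a lock, which is the overhead a builtin typed channel would aim to avoid.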

I ultimately feel things like that are more important than 100% coverage of the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.

Dag Sverre
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
