Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread Robert Bradshaw
On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
 wrote:
> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>
>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>
>>> Hey,
>>>
>>> So far people have been enthusiastic about the cython.parallel features,
>>> I think we should introduce some new features.

Excellent. I think this is going to become a killer feature like
buffer support.

>>> I propose the following,
>>
>> Great!!
>>
>> I only have time for a very short feedback now, perhaps more will follow.
>>
>>> assume parallel has been imported from cython:
>>>
>>> with parallel.master():
>>> this is executed in the master thread in a parallel (non-prange)
>>> section
>>>
>>> with parallel.single():
>>> same as master, except any thread may do the execution
>>>
>>> An optional keyword argument 'nowait' specifies whether there will be a
>>> barrier at the end. The default is to wait.
>
> I like
>
> if parallel.is_master():
>    ...
> explicit_barrier_somehow() # see below
>
> better as a Pythonization. One could easily support is_master to be used in
> other contexts as well, simply by assigning a status flag in the master
> block.

+1, the if statement feels a lot more natural.

> Using an if-test flows much better with Python I feel, but that naturally
> lead to making the barrier explicit. But I like the barrier always being
> explicit, rather than having it as a predicate on all the different
> constructs like in OpenMP
>
> I'm less sure about single, since making it a function indicates one could
> use it in other contexts and the whole thing becomes too magic (since it's
> tied to the position of invocation). I'm tempted to suggest
>
> for _ in prange(1):
>    ...
>
> as our syntax for single.

The idea here is that you want a block of code executed once,
presumably by the first thread that gets here? I think this could also
be handled by an if statement, perhaps "if parallel.first()" or
something like that. Is there anything special about this construct
that couldn't simply be done by flushing/checking a variable?
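For illustration, the flag-check idea could be sketched in pure Python with the stdlib threading module (the `RunOnce` name and shape are hypothetical, just an analogue of the OpenMP flag+flush pattern, not a proposed cython.parallel API):

```python
import threading

class RunOnce:
    """First thread to arrive runs the block; the rest skip it."""
    def __init__(self):
        self._lock = threading.Lock()
        self._done = False

    def first(self):
        # Atomically test-and-set the flag; only one caller ever sees True.
        with self._lock:
            if self._done:
                return False
            self._done = True
            return True

once = RunOnce()
results = []

def worker():
    if once.first():
        results.append("ran once")

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# exactly one thread appended to results
```

In OpenMP terms the lock plays the role of the flush plus atomic update on the shared flag.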

>>> with parallel.task():
>>> create a task to be executed by some thread in the team
>>> once a thread takes up the task it shall only be executed by that
>>> thread and no other thread (so the task will be tied to the thread)
>>>
>>> C variables will be firstprivate
>>> Python objects will be shared
>>>
>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>
>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>> Closures are excellent for the notion of a task, so I think something
>> based on the futures API would work better. I realize that makes the
>> mapping to OpenMP and implementation a bit more difficult, but I think
>> it is worth it in the long run.

It's almost as if you're reading my thoughts. There are much more
natural task APIs, e.g. futures or the way the Python
threading/multiprocessing does things.
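As a hedged sketch of what a futures-style task API could look like, using the stdlib concurrent.futures module as the model rather than OpenMP tasks (the `expensive` function is purely illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def expensive(x):
    # stand-in for a task body; in Cython this would be a closure
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # submitting a closure is the analogue of creating a task
    futures = [pool.submit(expensive, i) for i in range(8)]
    # the analogue of taskwait: block until each task completes
    totals = [f.result() for f in futures]
```

The point is that task creation, hand-off of firstprivate-like arguments, and taskwait all fall out of the closure/future model without new syntax.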

>>> with parallel.critical():
>>> this section of code is mutually exclusive with other critical sections
>>> optional keyword argument 'name' specifies a name for the critical
>>> section,
>>> which means all sections with that name will exclude each other,
>>> but not
>>> critical sections with different names
>>>
>>> Note: all threads that encounter the section will execute it, just
>>> not at the same time
>
> Yes, this works well as a with-statement...
>
> ..except that it is slightly magic in that it binds to call position (unlike
> anything in Python). I.e. this would be more "correct", or at least
> Pythonic:
>
> with parallel.critical(__file__, __line__):
>    ...

This feels a lot like a lock, which of course fits well with the with
statement.

>>> with parallel.barrier():
>>> all threads wait until everyone has reached the barrier
>>> either no one or everyone should encounter the barrier
>>> shared variables are flushed
>
> I have problems with requiring a noop with block...
>
> I'd much rather write
>
> parallel.barrier()
>
> However, that ties a function call to the place of invocation, and suggests
> that one could do
>
> if rand() > .5:
>    barrier()
> else:
>    i += 3
>    barrier()
>
> and have the same barrier in each case. Again,
>
> barrier(__file__, __line__)
>
> gets us purity at the cost of practicality. Another way is the pthreads
> approach (although one may have to use pthread rather than OpenMP to get it,
> unless there are named barriers?):
>
> barrier_a = parallel.barrier()
> barrier_b = parallel.barrier()
> with parallel:
>    barrier_a.wait()
>    if rand() > .5:
>        barrier_b.wait()
>    else:
>        i += 3
>        barrier_b.wait()
>
>
> I'm really not sure here.

I agree, the barrier doesn't seem like it belongs in a context. For
example, it's ambiguous whether the block is supposed to precede or
succeed the barrier. I like the named barrier idea, but if that's not
feasible we could perhaps use control flow to disallow conditionally
calling barriers (or require that every path calls the barrier (an
equal number of times?)).

Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread Robert Bradshaw
On Mon, Oct 10, 2011 at 1:12 AM, mark florisson
 wrote:
> On 9 October 2011 22:27, mark florisson  wrote:
>>
>> On 9 October 2011 21:48, Jon Olav Vik  wrote:
>> > On Sun, Oct 9, 2011 at 9:01 PM, mark florisson
>> >  wrote:
>> >> On 9 October 2011 19:54, Jon Olav Vik  wrote:
>> >>> Personally, I think I'd prefer context managers as a very
>> >>> readable way to deal with parallelism
>> >>
>> >> Yeah it makes a lot of sense for mutual exclusion, but 'master' really
>> >> means "only the master thread executes this piece of code, even though
>> >> other threads encounter the same code", which is more akin to 'if'
>> >> than 'with'.
>> >
>> > I see your point. However, another similarity with "with" statements
>> > as an encapsulated "try..finally" is when there's a barrier at the end
>> > of the block. I can live with some magic if it saves me from having a
>> > boilerplate line of "barrier" everywhere 8-)
>> > ___
>> > cython-devel mailing list
>> > cython-devel@python.org
>> > http://mail.python.org/mailman/listinfo/cython-devel
>> >
>>
>> Hm, indeed. I just noticed that unlike single constructs, master
>> constructs don't have barriers. Both are also not allowed to be
>> closely nested in worksharing constructs. I think the single directive
>> is more useful with respect to tasks, e.g. have a single thread
>> generate tasks and have other threads waiting at the barrier execute
>> them. In that sense I suppose 'if parallel.is_master():' makes sense
>> (no barrier, master thread) and 'with single():' (with barrier, any
>> thread).
>>
>> We could still support single in prange though, if we simply have the
>> master thread execute it ('if (omp_get_thread_num() == 0)') and put a
>> barrier after the block. This makes me wonder what the point of master
>> was supposed to be...
>
> Scratch that last part about master/single in parallel sections, it
> doesn't make sense. It only makes sense if you think of those sections
> as tasks you submit that would be immediately taken up by a (certain)
> thread. But that's not quite what it means. I do like 'if is_master()'
> and 'with single', though.
>
> Another thing we could support is arbitrary reductions. In OpenMP 3.1
> you get reduction operators 'and', 'max' and 'min', but it wouldn't be
> hard to support arbitrary user functions. e.g.
>
> @cython.reduction
> cdef int func(int a, int b):
>    ...
>
> for i in prange(...):
>    a = func(a, b)

Interesting idea. An alternative syntax could be

a = cython.parallel.reduce(func, a, b)
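To make the semantics concrete, here is a pure-Python analogue of such an arbitrary reduction (not the proposed Cython syntax): each worker folds its chunk with the user function, and the partial results are folded again, which is why the operator needs to be associative:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def func(a, b):
    # user-supplied reduction operator; must be associative
    return a + b

data = list(range(100))
chunks = [data[i::4] for i in range(4)]  # one strided chunk per worker

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(lambda c: reduce(func, c, 0), chunks))
total = reduce(func, partials, 0)
```

An OpenMP backend would do the per-thread fold in the parallel loop and the final fold at the join point.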

> I'm not sure how common this is though. You probably have your
> reduction data in an array so you're already using numpy so you'll
> likely already have your functionality.


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread Dag Sverre Seljebotn

On 10/12/2011 09:55 AM, Robert Bradshaw wrote:

On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
  wrote:

On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:


On 10/09/2011 02:11 PM, mark florisson wrote:


Hey,

So far people have been enthusiastic about the cython.parallel features,
I think we should introduce some new features.


Excellent. I think this is going to become a killer feature like
buffer support.


I propose the following,


Great!!

I only have time for a very short feedback now, perhaps more will follow.


assume parallel has been imported from cython:

with parallel.master():
this is executed in the master thread in a parallel (non-prange)
section

with parallel.single():
same as master, except any thread may do the execution

An optional keyword argument 'nowait' specifies whether there will be a
barrier at the end. The default is to wait.


I like

if parallel.is_master():
...
explicit_barrier_somehow() # see below

better as a Pythonization. One could easily support is_master to be used in
other contexts as well, simply by assigning a status flag in the master
block.


+1, the if statement feels a lot more natural.


Using an if-test flows much better with Python I feel, but that naturally
lead to making the barrier explicit. But I like the barrier always being
explicit, rather than having it as a predicate on all the different
constructs like in OpenMP

I'm less sure about single, since making it a function indicates one could
use it in other contexts and the whole thing becomes too magic (since it's
tied to the position of invocation). I'm tempted to suggest

for _ in prange(1):
...

as our syntax for single.


Just to be clear: My point was that the above implements single 
behaviour even now, without any extra effort.




The idea here is that you want a block of code executed once,
presumably by the first thread that gets here? I think this could also
be handled by an if statement, perhaps "if parallel.first()" or
something like that. Is there anything special about this construct
that couldn't simply be done by flushing/checking a variable?


Good point. I think there's a problem with OpenMP that it has too many 
primitives for similar things.


I'm -1 on single -- either using a for loop or flag+flush is more to 
type, but more readable to people who don't know cython.parallel (look: 
Python even makes "self." explicit -- the bias in language design is 
clearly on readability rather than writability).


I thought of "if is_first()" as well, but my problem is again that it 
binds to the location of the call.


if foo:
if parallel.is_first():
...
else:
if parallel.is_first():
...

can not be refactored to:

if parallel.is_first():
if foo:
...
else:
...

which I think is highly confusing for people who didn't write the code 
and don't know the details of cython.parallel. (Unlike is_master(), 
which works the same either way).


I think we should aim for something that's as easy to read as possible 
for Python users with no cython.parallel knowledge.





with parallel.task():
create a task to be executed by some thread in the team
once a thread takes up the task it shall only be executed by that
thread and no other thread (so the task will be tied to the thread)

C variables will be firstprivate
Python objects will be shared

parallel.taskwait() # wait on any direct descendent tasks to finish


Regarding tasks, I think this is mapping OpenMP too close to Python.
Closures are excellent for the notion of a task, so I think something
based on the futures API would work better. I realize that makes the
mapping to OpenMP and implementation a bit more difficult, but I think
it is worth it in the long run.


It's almost as if you're reading my thoughts. There are much more
natural task APIs, e.g. futures or the way the Python
threading/multiprocessing does things.


with parallel.critical():
this section of code is mutually exclusive with other critical sections
optional keyword argument 'name' specifies a name for the critical
section,
which means all sections with that name will exclude each other,
but not
critical sections with different names

Note: all threads that encounter the section will execute it, just
not at the same time


Yes, this works well as a with-statement...

..except that it is slightly magic in that it binds to call position (unlike
anything in Python). I.e. this would be more "correct", or at least
Pythonic:

with parallel.critical(__file__, __line__):
...


Mark: I stand corrected on this point. +1 on your critical proposal.


This feels a lot like a lock, which of course fits well with the with
statement.


with parallel.barrier():
all threads wait until everyone has reached the barrier
either no one or everyone should encounter the barrier
shared variables are flushed


I have problems with requiring a noop with block...

I'd much rather write

parallel.barrier()

However, that ties a function call to the place of invocation ...

Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread Dag Sverre Seljebotn

On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote:

On 10/12/2011 09:55 AM, Robert Bradshaw wrote:

On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
 wrote:

On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:


On 10/09/2011 02:11 PM, mark florisson wrote:

with parallel.critical():
this section of code is mutually exclusive with other critical
sections
optional keyword argument 'name' specifies a name for the critical
section,
which means all sections with that name will exclude each other,
but not
critical sections with different names

Note: all threads that encounter the section will execute it, just
not at the same time




On critical sections, I do feel string naming is rather un-Pythonic. I'd 
rather have


lock_a = parallel.Mutex()
lock_b = parallel.Mutex()
with cython.parallel:
with lock_a:
...
with lock_b:
...

This maps well to pthread mutexes, though much harder to map it to OpenMP...

So my proposal is:

 a) parallel.Mutex() can take a string argument and then returns the 
same mutex each time for the same string, meaning you can do


with parallel.Mutex("somename"):

which maps directly to OpenMP.

 b) However, this does not make sense:

with parallel.Mutex():

because each thread would instantiate a *separate* mutex. So raise a 
compiler error ("Redundant code, thread will never block on fresh mutex")


 c) However, one can use a default global Mutex instance:

with parallel.global_mutex:

(mapping to an un-named critical in OpenMP)

This seems to be simple enough to implement, and allows generalizing to 
the advanced case above later (probably using pthreads/Windows directly).


Dag Sverre


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread Robert Bradshaw
On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn
 wrote:
> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>> I'm less sure about single, since making it a function indicates one
>>> could
>>> use it in other contexts and the whole thing becomes too magic (since
>>> it's
>>> tied to the position of invocation). I'm tempted to suggest
>>>
>>> for _ in prange(1):
>>>    ...
>>>
>>> as our syntax for single.
>
> Just to be clear: My point was that the above implements single behaviour
> even now, without any extra effort.
>
>>
>> The idea here is that you want a block of code executed once,
>> presumably by the first thread that gets here? I think this could also
>> be handled by an if statement, perhaps "if parallel.first()" or
>> something like that. Is there anything special about this construct
>> that couldn't simply be done by flushing/checking a variable?
>
> Good point. I think there's a problem with OpenMP that it has too many
> primitives for similar things.
>
> I'm -1 on single -- either using a for loop or flag+flush is more to type,
> but more readable to people who don't know cython.parallel (look: Python
> even makes "self." explicit -- the bias in language design is clearly on
> readability rather than writability).
>
> I thought of "if is_first()" as well, but my problem is again that it binds
> to the location of the call.
>
> if foo:
>    if parallel.is_first():
>        ...
> else:
>    if parallel.is_first():
>        ...
>
> can not be refactored to:
>
> if parallel.is_first():
>    if foo:
>        ...
>    else:
>        ...
>
> which I think is highly confusing for people who didn't write the code and
> don't know the details of cython.parallel. (Unlike is_master(), which works
> the same either way).
>
> I think we should aim for something that's as easy to read as possible for
> Python users with no cython.parallel knowledge.

Exactly. This is what's so beautiful about prange.

> with parallel.barrier():
> all threads wait until everyone has reached the barrier
> either no one or everyone should encounter the barrier
> shared variables are flushed
>>>
>>> I have problems with requiring a noop with block...
>>>
>>> I'd much rather write
>>>
>>> parallel.barrier()
>>>
>>> However, that ties a function call to the place of invocation, and
>>> suggests
>>> that one could do
>>>
>>> if rand()>  .5:
>>>    barrier()
>>> else:
>>>    i += 3
>>>    barrier()
>>>
>>> and have the same barrier in each case. Again,
>>>
>>> barrier(__file__, __line__)
>>>
>>> gets us purity at the cost of practicality. Another way is the pthreads
>>> approach (although one may have to use pthread rather than OpenMP to get
>>> it,
>>> unless there are named barriers?):
>>>
>>> barrier_a = parallel.barrier()
>>> barrier_b = parallel.barrier()
>>> with parallel:
>>>    barrier_a.wait()
>>>    if rand()>  .5:
>>>        barrier_b.wait()
>>>    else:
>>>        i += 3
>>>        barrier_b.wait()
>>>
>>>
>>> I'm really not sure here.
>>
>> I agree, the barrier doesn't seem like it belongs in a context. For
>> example, it's ambiguous whether the block is supposed to precede or
>> succeed the barrier. I like the named barrier idea, but if that's not
>> feasible we could perhaps use control flow to disallow conditionally
>> calling barriers (or that every path calls the barrier (an equal
>> number of times?)).
>
> It is always an option to go beyond OpenMP. Pthread barriers are a lot more
> powerful in this way, and with pthread and Windows covered I think we should
> be good...
>
> IIUC, you can't have different paths calling the barrier the same number of
> times; it's merely
>
> #pragma omp barrier
>
> and a separate barrier statement gets another counter.

Makes sense, but this greatly restricts where we could use the OpenMP version.

> Which is why I think
> it is not powerful enough and we should use pthreads.
>
>> +1. I like the idea of providing more parallelism constructs, but
>> rather than risk fixating on OpenMP's model, perhaps we should look at
>> the problem we're trying to solve (e.g., what can't one do well now)
>> and create (or more likely borrow) the right Pythonic API to do it.
>
> Also, quick and flexible message-passing between threads/processes through
> channels is becoming an increasingly popular concept. Go even has a separate
> syntax for channel communication, and zeromq is becoming popular for
> distributed work.
>
> This is a problem Cython may need to solve here, since one currently has to
> use very low-level C to do it quickly (either zeromq or pthreads in most
> cases -- I guess, an OpenMP critical section would help in implementing a
> queue though).
>
> I wouldn't resist a builtin "channel" type in Cython (since we don't have
> full templating/generics, it would be the only way of sending typed data
> conveniently?).

zeromq seems to be a nice level of abstraction--we could probably get
far with a zeromq "overlay" module that didn't require the GIL. Or is
the C API easy enough to use if we could provide convenient mechanisms
to initialize the tasks/threads. I think perhaps the communication
model could be solved by a library more easily than the threading
model.

Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread Robert Bradshaw
On Wed, Oct 12, 2011 at 1:49 AM, Dag Sverre Seljebotn
 wrote:
> On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote:
>>
>> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>>
>>> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
>>>  wrote:

 On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>
> On 10/09/2011 02:11 PM, mark florisson wrote:
>>
>> with parallel.critical():
>> this section of code is mutually exclusive with other critical
>> sections
>> optional keyword argument 'name' specifies a name for the critical
>> section,
>> which means all sections with that name will exclude each other,
>> but not
>> critical sections with different names
>>
>> Note: all threads that encounter the section will execute it, just
>> not at the same time

>
> On critical sections, I do feel string naming is rather un-Pythonic. I'd
> rather have
>
> lock_a = parallel.Mutex()
> lock_b = parallel.Mutex()
> with cython.parallel:
>    with lock_a:
>        ...
>    with lock_b:
>        ...
>
> This maps well to pthread mutexes, though much harder to map it to OpenMP...

For this low level, perhaps people should just be using the pthreads
library directly? Here I'm showing my ignorance: can that work with
OpenMP spawned threads? (Maybe a compatibility layer is required for
transparent Windows support.) Suppose one could write a context object
that did not require the GIL, then one could do

with MyContext():
   ...

in a nogil block, MyContext could be implemented by whoever on
whatever thread library, no special language support required.
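A minimal sketch of such a context object in pure Python (the name MyContext comes from the message; backing it with a stdlib threading.Lock is an assumption, and a nogil version would wrap a C lock instead):

```python
import threading

class MyContext:
    """Context object guarding a critical section; any lock library
    could sit behind __enter__/__exit__."""
    def __init__(self):
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False  # do not swallow exceptions

ctx = MyContext()
counter = 0

def worker():
    global counter
    for _ in range(1000):
        with ctx:           # the critical section
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since the with statement only needs `__enter__`/`__exit__`, no special language support is required, exactly as Robert suggests.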

> So my proposal is:
>
>  a) parallel.Mutex() can take a string argument and then returns the same
> mutex each time for the same string, meaning you can do
>
> with parallel.Mutex("somename"):
>
> which maps directly to OpenMP.
>
>  b) However, this does not make sense:
>
> with parallel.Mutex():
>
> because each thread would instantiate a *separate* mutex. So raise a
> compiler error ("Redundant code, thread will never block on fresh mutex")
>
>  c) However, one can use a default global Mutex instance:
>
> with parallel.global_mutex:
>
> (mapping to an un-named critical in OpenMP)
>
> This seems to be simple enough to implement, and allows generalizing to the
> advanced case above later (probably using pthreads/Windows directly).

Alternatively, let parallel.Mutex() be the global mutex, with some
other way of getting a new, unique mutex to pass around and use in
multiple places.

- Robert


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread Dag Sverre Seljebotn

On 10/12/2011 11:08 AM, Robert Bradshaw wrote:

On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn

I wouldn't resist a builtin "channel" type in Cython (since we don't have
full templating/generics, it would be the only way of sending typed data
conveniently?).


zeromq seems to be a nice level of abstraction--we could probably get
far with a zeromq "overlay" module that didn't require the GIL. Or is
the C API easy enough to use if we could provide convenient mechanisms
to initialize the tasks/threads. I think perhaps the communication
model could be solved by a library more easily than the threading
model.


Ah, zeromq even has an in-process transport, so should work nicely for 
multithreading as well.


The main problem is that I'd like something like

ctypedef struct Msg:
int what
double when

cdef Msg msg
cdef channel[Msg] mychan = channel[msg](blocking=True, in_process=True)
with cython.parallel:
...
if is_master():
mychan.send(what=1, when=2.3)
else:
msg = mychan.recv()


Which one can't really do without either builtin support or templating 
support. One *could* implement it in C++...


C-level API just sends char* around, e.g.,

int zmq_msg_init_data (zmq_msg_t *msg, void *data, size_t size, 
zmq_free_fn *ffn, void *hint);
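As a rough pure-Python analogue of the channel sketch above, using the stdlib queue.Queue for the in-process transport (the dict stands in for the typed Msg struct; field names mirror Dag's example but are otherwise illustrative):

```python
import queue
import threading

mychan = queue.Queue()   # blocking, in-process channel
received = []

def master():
    # analogue of mychan.send(what=1, when=2.3)
    mychan.put({"what": 1, "when": 2.3})

def worker():
    # analogue of msg = mychan.recv(); blocks until a message arrives
    received.append(mychan.get())

t1 = threading.Thread(target=master)
t2 = threading.Thread(target=worker)
t1.start(); t2.start()
t1.join(); t2.join()
```

What this cannot express, and what builtin or template support would add, is the static typing of the message payload.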


Dag Sverre


Re: [Cython] Utilities, cython.h, libcython

2011-10-12 Thread Stefan Behnel

mark florisson, 06.10.2011 11:45:

On 6 October 2011 01:05, Robert Bradshaw wrote:

I'm not sure what the overhead is, if any, in calling function pointers vs.
actually linking things together at the C level (which is essentially the
same idea, but perhaps addresses are resolved at library load time rather
than requiring a dereference on each call?)


I think there isn't any difference with dynamic linking and having a
pointer. My understanding (of ELF shared libraries) is that the
procedure lookup table will contain the actual address of the symbol
(likely after the first reference to it has been made, it may have a
stub that resolves the symbol and replaces its own address with the
actual address), which to me sounds like the same thing as a pointer.
I think only static linking can prevent this, i.e. directly encode the
static address into the call opcode, but I'm not an expert.


Even if it makes a slight difference that the CPU's branch prediction 
cannot cope with, it's still up to us to decide which code must be inside 
the module for performance reasons and which we can afford to move outside. 
Generally speaking, any code section that is large enough to be worth being 
moved into a separate library shouldn't notice any performance difference 
through an indirect call.


Stefan


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread mark florisson
On 12 October 2011 10:20, Robert Bradshaw  wrote:
> On Wed, Oct 12, 2011 at 1:49 AM, Dag Sverre Seljebotn
>  wrote:
>> On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote:
>>>
>>> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:

 On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
  wrote:
>
> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>
>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>
>>> with parallel.critical():
>>> this section of code is mutually exclusive with other critical
>>> sections
>>> optional keyword argument 'name' specifies a name for the critical
>>> section,
>>> which means all sections with that name will exclude each other,
>>> but not
>>> critical sections with different names
>>>
>>> Note: all threads that encounter the section will execute it, just
>>> not at the same time
>
>>
>> On critical sections, I do feel string naming is rather un-Pythonic. I'd
>> rather have
>>
>> lock_a = parallel.Mutex()
>> lock_b = parallel.Mutex()
>> with cython.parallel:
>>    with lock_a:
>>        ...
>>    with lock_b:
>>        ...
>>
>> This maps well to pthread mutexes, though much harder to map it to OpenMP...
>
> For this low level, perhaps people should just be using the pthreads
> library directly? Here I'm showing my ignorance: can that work with
> OpenMP spawned threads? (Maybe a compatibility layer is required for
> transparent Windows support.) Suppose one could write a context object
> that did not require the GIL, then one could do
>
> with MyContext():
>   ...
>
> in a nogil block, MyContext could be implemented by whoever on
> whatever thread library, no special language support required.

Exactly, that's always possible. I myself very much like how critical
works, but if you want a more Pythonic-looking mutex, it might be
better to make that the user's burden. Otherwise we'd also have to
give it a type, make it compatible with code that doesn't have the
GIL, acquisition count it when passing it around, etc.

If your program doesn't even have other Python threads running, you
could even use 'with gil:' as a global synchronization.

The only good thing about named and unnamed critical sections is
really the convenience of writing it, and the resulting conciseness
(which imho, if you know how critical works, only adds to the code
readability).

However, not providing parallel.Mutex would mean people probably want
to resort to the goodies from the threading module, which would
ironically not be possible because you'd need the GIL to use them :)
But we could recommend the PyThread_*_lock stuff in the documentation.

>> So my proposal is:
>>
>>  a) parallel.Mutex() can take a string argument and then returns the same
>> mutex each time for the same string, meaning you can do
>>
>> with parallel.Mutex("somename"):
>>
>> which maps directly to OpenMP.
>>
>>  b) However, this does not make sense:
>>
>> with parallel.Mutex():
>>
>> because each thread would instantiate a *separate* mutex. So raise a
>> compiler error ("Redundant code, thread will never block on fresh mutex")
>>
>>  c) However, one can use a default global Mutex instance:
>>
>> with parallel.global_mutex:
>>
>> (mapping to an un-named critical in OpenMP)
>>
>> This seems to be simple enough to implement, and allows generalizing to the
>> advanced case above later (probably using pthreads/Windows directly).
>
> Alternatively, let parallel.Mutex() be the global mutex, with some
> other way of getting a new, unique mutex to pass around and use in
> multiple places.
>
> - Robert


Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread mark florisson
On 12 October 2011 09:36, Dag Sverre Seljebotn
 wrote:
> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>
>> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
>>   wrote:
>>>
>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:

 On 10/09/2011 02:11 PM, mark florisson wrote:
>
> Hey,
>
> So far people have been enthusiastic about the cython.parallel
> features,
> I think we should introduce some new features.
>>
>> Excellent. I think this is going to become a killer feature like
>> buffer support.
>>
> I propose the following,

 Great!!

 I only have time for a very short feedback now, perhaps more will
 follow.

> assume parallel has been imported from cython:
>
> with parallel.master():
> this is executed in the master thread in a parallel (non-prange)
> section
>
> with parallel.single():
> same as master, except any thread may do the execution
>
> An optional keyword argument 'nowait' specifies whether there will be a
> barrier at the end. The default is to wait.
>>>
>>> I like
>>>
>>> if parallel.is_master():
>>>    ...
>>> explicit_barrier_somehow() # see below
>>>
>>> better as a Pythonization. One could easily support is_master to be used
>>> in
>>> other contexts as well, simply by assigning a status flag in the master
>>> block.
>>
>> +1, the if statement feels a lot more natural.
>>
>>> Using an if-test flows much better with Python I feel, but that naturally
>>> lead to making the barrier explicit. But I like the barrier always being
>>> explicit, rather than having it as a predicate on all the different
>>> constructs like in OpenMP
>>>
>>> I'm less sure about single, since making it a function indicates one
>>> could
>>> use it in other contexts and the whole thing becomes too magic (since
>>> it's
>>> tied to the position of invocation). I'm tempted to suggest
>>>
>>> for _ in prange(1):
>>>    ...
>>>
>>> as our syntax for single.
>
> Just to be clear: My point was that the above implements single behaviour
> even now, without any extra effort.

Right, I got that. In the same way you could use

for _ in prange(0): pass

to get a barrier. I'm just saying that it looks pretty weird.

>>
>> The idea here is that you want a block of code executed once,
>> presumably by the first thread that gets here? I think this could also
>> be handled by a if statement, perhaps "if parallel.first()" or
>> something like that. Is there anything special about this construct
>> that couldn't simply be done by flushing/checking a variable?
>
> Good point. I think there's a problem with OpenMP that it has too many
> primitives for similar things.

Definitely.

> I'm -1 on single -- either using a for loop or flag+flush is more to type,
> but more readable to people who don't know cython.parallel (look: Python
> even makes "self." explicit -- the bias in language design is clearly on
> readability rather than writability).
>
> I thought of "if is_first()" as well, but my problem is again that it binds
> to the location of the call.
>
> if foo:
>    if parallel.is_first():
>        ...
> else:
>    if parallel.is_first():
>        ...
>
> can not be refactored to:
>
> if parallel.is_first():
>    if foo:
>        ...
>    else:
>        ...
>
> which I think is highly confusing for people who didn't write the code and
> don't know the details of cython.parallel. (Unlike is_master(), which works
> the same either way).
>
> I think we should aim for something that's as easy to read as possible for
> Python users with no cython.parallel knowledge.

That's a good point. I suppose single and master aren't both really needed,
so just master ("is_master") could be sufficient there.
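
For illustration, the proposed is_master-plus-explicit-barrier pattern can be
sketched in plain Python threads. This is only an analogue: `threading.Barrier`
stands in for the OpenMP barrier, and "thread 0" stands in for the master
thread; the cython.parallel names discussed above are still hypothetical.

```python
import threading

N = 4
results = []
barrier = threading.Barrier(N)
lock = threading.Lock()

def worker(tid):
    # analogue of "if parallel.is_master():" -- only thread 0 runs the block
    if tid == 0:
        results.append("master-setup")
    barrier.wait()  # the explicit barrier: nobody proceeds until all arrive
    with lock:
        results.append("work-%d" % tid)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# the master block is guaranteed to have run before any post-barrier work
assert results[0] == "master-setup"
```

Note that the barrier here is an object with a wait() method, not a statement
tied to its source position, which is exactly the refactoring-friendly
property argued for below.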

>>
> with parallel.task():
> create a task to be executed by some thread in the team
> once a thread takes up the task it shall only be executed by that
> thread and no other thread (so the task will be tied to the thread)
>
> C variables will be firstprivate
> Python objects will be shared
>
> parallel.taskwait() # wait on any direct descendent tasks to finish

 Regarding tasks, I think this is mapping OpenMP too close to Python.
 Closures are excellent for the notion of a task, so I think something
 based on the futures API would work better. I realize that makes the
 mapping to OpenMP and implementation a bit more difficult, but I think
 it is worth it in the long run.
>>
>> It's almost as if you're reading my thoughts. There are much more
>> natural task APIs, e.g. futures or the way the Python
>> threading/multiprocessing does things.
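
As a concrete sketch of the futures direction (using the stdlib
`concurrent.futures` API as the model, not any existing cython.parallel
feature): submit() creates a task that some worker thread picks up, and
wait() plays the role of the proposed parallel.taskwait().

```python
from concurrent.futures import ThreadPoolExecutor, wait

def crunch(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # each submit() is a task, bound to whichever thread takes it up
    tasks = [pool.submit(crunch, i) for i in range(8)]
    wait(tasks)  # block until all direct descendant tasks finish
    total = sum(f.result() for f in tasks)

assert total == 140  # 0^2 + 1^2 + ... + 7^2
```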
>>
> with parallel.critical():
> this section of code is mutually exclusive with other critical sections
> optional keyword argument 'name' specifies a name for the critical
> section,
> which means all sections with that name will exclude each other,
> but not
> critical sections with different names
>
> 
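For reference, the named-critical-section semantics proposed above map
naturally onto one lock per name, which can be sketched in pure Python
(the mapping from parallel.critical(name=...) to a lock is an assumption,
not an existing implementation):

```python
import threading
from collections import defaultdict

# Named critical sections: sections sharing a name exclude each other;
# sections with different names may run concurrently.
_locks = defaultdict(threading.Lock)
_locks["counter"]  # touch once up front so threads don't race creating it

counter = [0]

def bump():
    for _ in range(1000):
        with _locks["counter"]:  # like: with parallel.critical(name="counter")
            counter[0] += 1

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter[0] == 4000  # no lost updates: the sections excluded each other
```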

Re: [Cython] cython.parallel tasks, single, master, critical, barriers

2011-10-12 Thread mark florisson
On 12 October 2011 10:08, Robert Bradshaw  wrote:
> On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn
>  wrote:
>> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
 I'm less sure about single, since making it a function indicates one
 could
 use it in other contexts and the whole thing becomes too magic (since
 it's
 tied to the position of invocation). I'm tempted to suggest

 for _ in prange(1):
    ...

 as our syntax for single.
>>
>> Just to be clear: My point was that the above implements single behaviour
>> even now, without any extra effort.
>>
>>>
>>> The idea here is that you want a block of code executed once,
>>> presumably by the first thread that gets here? I think this could also
>>> be handled by a if statement, perhaps "if parallel.first()" or
>>> something like that. Is there anything special about this construct
>>> that couldn't simply be done by flushing/checking a variable?
>>
>> Good point. I think there's a problem with OpenMP that it has too many
>> primitives for similar things.
>>
>> I'm -1 on single -- either using a for loop or flag+flush is more to type,
>> but more readable to people who don't know cython.parallel (look: Python
>> even makes "self." explicit -- the bias in language design is clearly on
>> readability rather than writability).
>>
>> I thought of "if is_first()" as well, but my problem is again that it binds
>> to the location of the call.
>>
>> if foo:
>>    if parallel.is_first():
>>        ...
>> else:
>>    if parallel.is_first():
>>        ...
>>
>> can not be refactored to:
>>
>> if parallel.is_first():
>>    if foo:
>>        ...
>>    else:
>>        ...
>>
>> which I think is highly confusing for people who didn't write the code and
>> don't know the details of cython.parallel. (Unlike is_master(), which works
>> the same either way).
>>
>> I think we should aim for something that's as easy to read as possible for
>> Python users with no cython.parallel knowledge.
>
> Exactly. This is what's so beautiful about prange.
>
>> with parallel.barrier():
>> all threads wait until everyone has reached the barrier
>> either no one or everyone should encounter the barrier
>> shared variables are flushed

 I have problems with requiring a noop with block...

 I'd much rather write

 parallel.barrier()

 However, that ties a function call to the place of invocation, and
 suggests
 that one could do

 if rand() > .5:
    barrier()
 else:
    i += 3
    barrier()

 and have the same barrier in each case. Again,

 barrier(__file__, __line__)

 gets us purity at the cost of practicality. Another way is the pthreads
 approach (although one may have to use pthread rather than OpenMP to get
 it, unless there are named barriers?):

 barrier_a = parallel.barrier()
 barrier_b = parallel.barrier()
 with parallel:
    barrier_a.wait()
    if rand() > .5:
        barrier_b.wait()
    else:
        i += 3
        barrier_b.wait()


 I'm really not sure here.
>>>
>>> I agree, the barrier doesn't seem like it belongs in a context. For
>>> example, it's ambiguous whether the block is supposed to proceed or
>>> succeed the barrier. I like the named barrier idea, but if that's not
>>> feasible we could perhaps use control flow to disallow conditionally
>>> calling barriers (or that every path calls the barrier (an equal
>>> number of times?)).
>>
>> It is always an option to go beyond OpenMP. Pthread barriers are a lot more
>> powerful in this way, and with pthread and Windows covered I think we should
>> be good...
>>
>> IIUC, you can't have different paths calling the barrier the same number of
>> times, it's merely
>>
>> #pragma omp barrier
>>
>> and a separate barrier statement gets another counter.
>
> Makes sense, but this greatly restricts where we could use the OpenMP version.
>
>> Which is why I think
>> it is not powerful enough and we should use pthreads.
>>
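
Dag's barrier_a/barrier_b sketch is directly expressible with pthread-style
barrier objects; Python's `threading.Barrier` has the same shape, which makes
a runnable mock-up easy (the random condition is replaced by a deterministic
one so the example is reproducible):

```python
import threading

# A barrier is a value, so both branches can wait on the *same* barrier_b.
N = 3
barrier_a = threading.Barrier(N)
barrier_b = threading.Barrier(N)
side_work = []
passed = []
lock = threading.Lock()

def body(tid):
    barrier_a.wait()
    if tid % 2:                  # stand-in for "rand() > .5"
        barrier_b.wait()
    else:
        side_work.append(tid)    # the "i += 3" of the original sketch
        barrier_b.wait()
    with lock:
        passed.append(tid)

threads = [threading.Thread(target=body, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every thread got past both barriers, regardless of which branch it took
assert sorted(passed) == list(range(N))
```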
>>> +1. I like the idea of providing more parallelism constructs, but
>>> rather than risk fixating on OpenMP's model, perhaps we should look at
>>> the problem we're trying to solve (e.g., what can't one do well now)
>>> and create (or more likely borrow) the right Pythonic API to do it.
>>
>> Also, quick and flexible message-passing between threads/processes through
>> channels is becoming an increasingly popular concept. Go even has a separate
>> syntax for channel communication, and zeromq is becoming popular for
>> distributed work.
>>
>> There is a problem Cython may need to solve here, since one currently has to
>> use very low-level C to do it quickly (either zeromq or pthreads in most
>> cases -- I guess, an OpenMP critical section would help in implementing a
>> queue though).
>>
>> I wouldn't resist a builtin "channel" type in Cython (since we don't have
>> full templating/generics, it would be the only
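
A Go-style channel between threads can be approximated today with the
stdlib's `queue.Queue` (bounded, so put() blocks when the buffer is full,
like a buffered channel; the None sentinel standing in for closing the
channel is a convention, not part of any API):

```python
import queue
import threading

channel = queue.Queue(maxsize=2)  # buffered channel of capacity 2

def producer():
    for i in range(5):
        channel.put(i)       # blocks while the buffer is full
    channel.put(None)        # sentinel: "channel closed"

received = []

def consumer():
    while True:
        item = channel.get()
        if item is None:
            break
        received.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

assert received == [0, 1, 2, 3, 4]
```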

Re: [Cython] PyCon-DE wrap-up by Kay Hayen

2011-10-12 Thread Stefan Behnel

Robert Bradshaw, 11.10.2011 08:11:

Thanks for the update and link. Sounds like PyCon-DE went well.


More than that - here's my take on it:

http://blog.behnel.de/index.php?p=188

Stefan
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] test failure for cython-devel in Py2.4

2011-10-12 Thread Stefan Behnel

mark florisson, 12.10.2011 23:46:

On 10 October 2011 16:17, Stefan Behnel wrote:

Jenkins currently reports several failures, and this one seems to be
due to your tempita changes:


https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console

Thanks! I'll try to fix that somewhere this week.


We should really get into the habit of not pushing changes to the master
branch that already turned out to be broken in the personal branches. If they
appear to be ok and only turn out to break the master branch *after* pushing
them (which is ok, we have Jenkins to tell us), we should revert them if a
fix cannot be applied shortly, i.e. within a day or two at most.


It's very annoying when the master branch is broken for weeks in a row,
especially since it will keep attracting new failures under the cover of the
already broken tests, which makes it much harder to pinpoint the commits that
triggered them.




Is it me or are other builds broken as well?

I pushed a fix for the tempita thing, but it seems the entire py3k build is
broken:

https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console


It's not only the py3k tests, the build is broken in general. The problem 
here is that it only *shows* in the py3k tests because the Py2 builds do 
not bail out when one of the Cython modules fails to build. That needs 
fixing as well.




I just cannot reproduce that error on my system, let me investigate it
further.


My guess was that it's due to the innocent looking change that Robert did 
to enable type inference for the GeneralCallNode. It seems that there was a 
bit more to do here.


Stefan


Re: [Cython] test failure for cython-devel in Py2.4

2011-10-12 Thread Stefan Behnel

Stefan Behnel, 13.10.2011 07:10:

mark florisson, 12.10.2011 23:46:

Is it me or are other builds broken as well?

I pushed a fix for the tempita thing, but it seems the entire py3k build is
broken:

https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console



It's not only the py3k tests, the build is broken in general. The problem
here is that it only *shows* in the py3k tests because the Py2 builds do
not bail out when one of the Cython modules fails to build. That needs
fixing as well.



I just cannot reproduce that error on my system, let me investigate it
further.


My guess was that it's due to the innocent looking change that Robert did
to enable type inference for the GeneralCallNode. It seems that there was a
bit more to do here.


Now that I think about it - remember that the Jenkins builds use a source 
distribution to build, not a plain checkout. Maybe there's something wrong 
with the sdist? At least, I see several warnings about file patterns in 
MANIFEST.in that are not matched by any files:


"""
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.pyx' under directory 
'Cython/Debugger/Tests'
warning: no files found matching '*.pxd' under directory 
'Cython/Debugger/Tests'

warning: no files found matching '*.h' under directory 'Cython/Debugger/Tests'
warning: no files found matching '*.pxd' under directory 'Cython/Utility'
warning: no files found matching '*.h' under directory 'Cython/Utility'
warning: no files found matching '.cpp' under directory 'Cython/Utility'
"""

https://sage.math.washington.edu:8091/hudson/job/cython-devel-sdist/678/console
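
The warnings suggest those MANIFEST.in patterns simply match nothing, and the
bare '.cpp' line in particular looks like a missing wildcard. A corrected
template might read as follows (hypothetical: it assumes those directories are
meant to ship those file types, and uses the standard recursive-include
command):

```
recursive-include Cython/Debugger/Tests *.pyx *.pxd *.h
recursive-include Cython/Utility *.pxd *.h *.cpp
```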

Also note that the build appears to choke on test utility code:

"""
Error compiling Cython file:

...
cdef extern from *:
cdef object __pyx_test_dep(object)

@cname('__pyx_TestClass')
cdef class TestClass(object):
cdef public int value
   ^


TestClass:9:20: Compiler crash in AnalyseDeclarationsTransform
"""

https://sage.math.washington.edu:8091/hudson/job/cython-devel-build/56/PYVERSION=py3k/console

Mark, didn't you disable the loading of any test code during 'normal' 
builds? Maybe there's something broken on that front?


Stefan


Re: [Cython] test failure for cython-devel in Py2.4

2011-10-12 Thread Vitja Makarov
2011/10/13 Stefan Behnel :
> mark florisson, 12.10.2011 23:46:
>
> On 10 October 2011 16:17, Stefan Behnel wrote:
>>
>> Jenkins currently reports several failures, and this one seems to be
>> due to your tempita changes:
>>
>
> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console
>
> Thanks! I'll try to fix that somewhere this week.
>
> We should really get into the habit of not pushing changes to the master
> branch that already turned out to be broken in the personal branches. If they
> appear to be ok and only turn out to break the master branch *after* pushing
> them (which is ok, we have Jenkins to tell us), we should revert them if a
> fix cannot be applied shortly, i.e. within a day or two at most.
>
> It's very annoying when the master branch is broken for weeks in a row,
> especially since it will keep attracting new failures under the cover of the
> already broken tests, which makes it much harder to pinpoint the commits
> that triggered them.
>

+1

>
>>> Is it me or are other builds broken as well?
>>>
>>> I pushed a fix for the tempita thing, but it seems the entire py3k build
>>> is
>>> broken:
>>>
>>>
>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>
> It's not only the py3k tests, the build is broken in general. The problem
> here is that it only *shows* in the py3k tests because the Py2 builds do not
> bail out when one of the Cython modules fails to build. That needs fixing as
> well.
>
>
>> I just cannot reproduce that error on my system, let me investigate it
>> further.
>
> My guess was that it's due to the innocent looking change that Robert did to
> enable type inference for the GeneralCallNode. It seems that there was a bit
> more to do here.
>

I found that the tempita bug goes away if you change language_level to 2.


-- 
vitja.