On 11 March 2011 08:56, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> On 03/11/2011 08:20 AM, Stefan Behnel wrote:
>>
>> Robert Bradshaw, 11.03.2011 01:46:
>>>
>>> On Tue, Mar 8, 2011 at 11:16 AM, Francesc Alted <fal...@pytables.org> wrote:
>>>>
>>>> On Tuesday, 08 March 2011 18:50:15, Stefan Behnel wrote:
>>>>>
>>>>> mark florisson, 08.03.2011 18:00:
>>>>>>
>>>>>> What I meant was that the
>>>>>> wrapper returned by the decorator would have to call the closure
>>>>>> for every iteration, which introduces function call overhead.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> I guess we just have to establish what we want to do: do we
>>>>>> want to support code with Python objects (and exceptions etc), or
>>>>>> just C code written in Cython?
>>>>>
>>>>> I like the approach that Sturla mentioned: using closures to
>>>>> implement worker threads. I think that's very pythonic. You could do
>>>>> something like this, for example:
>>>>>
>>>>>      def worker():
>>>>>          for item in queue:
>>>>>              with nogil:
>>>>>                  do_stuff(item)
>>>>>
>>>>>      queue.extend(work_items)
>>>>>      start_threads(worker, count)
>>>>>
>>>>> Note that the queue is only needed to tell the thread what to work
>>>>> on. A lot of things can be shared over the closure. So the queue may
>>>>> not even be required in many cases.
>>>>
>>>> I like this approach too.  I suppose that you will need to annotate the
>>>> items so that they are not Python objects, no?  Something like:
>>>>
>>>>     def worker():
>>>>         cdef int item  # tell that item is not a Python object!
>>>>         for item in queue:
>>>>             with nogil:
>>>>                 do_stuff(item)
>>>>
>>>>     queue.extend(work_items)
>>>>     start_threads(worker, count)
>>>
>>> On a slightly higher level, are we just trying to use OpenMP from
>>> Cython, or are we trying to build it into the language? If the former,
>>> it may make sense to keep the API closer to the underlying C than one
>>> might otherwise be tempted, so that the existing documentation can be
>>> leveraged. A library with a more Pythonic interface could perhaps
>>> be written on top of that. Alternatively, if we're building it into
>>> Cython itself, it might be worth modeling it after the
>>> multiprocessing module (though I understand it would be implemented
>>> with threads), which I think is a decent enough model for managing
>>> embarrassingly parallel operations.
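>>>
>>> For reference, a plain multiprocessing version of an embarrassingly
>>> parallel map looks roughly like this (Pool and map() are the actual
>>> stdlib API; square() and work_items are just stand-ins for the real
>>> per-item work and inputs):
>>>
>>>     from multiprocessing import Pool
>>>
>>>     def square(x):
>>>         # stand-in for the real per-item work
>>>         return x * x
>>>
>>>     work_items = range(100)                  # example inputs
>>>     pool = Pool(processes=4)                 # pool of 4 worker processes
>>>     results = pool.map(square, work_items)   # distribute and collect
>>>     pool.close()
>>>     pool.join()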
>>
>> +1
>>
>>
>>> The above code is similar to that,
>>> though I'd prefer the for loop to be implicit rather than written out
>>> in the worker method (or at least passed in as an argument).
>>
>> It provides a simple way to write per-thread initialisation code, though.
>> And it's likely easier to make looping fast than to speed up the call into a
>> closure. However, eventually, both ways will need to be supported anyway.
>>
>>
>>> If we went this route,
>>> what are the advantages of using OpenMP over, say, pthreads in the
>>> background? (And could the latter be done with just a library + some
>>> fancy GIL specifications?)
>>
>> In the above example, basically everything is explicit and nothing more
>> than a simplified threading setup is needed. Even the implementation of
>> "start_threads()" could be done in a couple of lines of Python code,
>> including the collection of results and errors. If someone thinks we need
>> more than that, I'd like to see a couple of concrete use cases and code
>> examples first.
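>>
>> For example, a minimal sketch of such a start_threads() helper, using
>> nothing but the stdlib threading module (collection of results and
>> errors omitted for brevity):
>>
>>     import threading
>>
>>     def start_threads(worker, count):
>>         # run the same closure in 'count' threads ...
>>         threads = [threading.Thread(target=worker) for _ in range(count)]
>>         for t in threads:
>>             t.start()
>>         # ... and wait for all of them to finish
>>         for t in threads:
>>             t.join()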
>>
>>
>>> One thing that's nice about OpenMP as
>>> implemented in C is that the serial code looks almost exactly like the
>>> parallel code; the code at http://wiki.cython.org/enhancements/openmp
>>> has this property too.
>>
>> Writing it with a closure isn't really that much different. You can put
>> the inner function right where it would normally get executed and add a bit
>> of calling/load distributing code below it. Not that bad IMO.
>>
>> It may be worth providing some ready-to-use decorators to do the load
>> balancing, but I don't really like the idea of having a decorator magically
>> invoke the function in-place that it decorates.
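>>
>> For concreteness, such a decorator could look roughly like this
>> (reusing the start_threads()/queue names from the example above); the
>> call inside decorate() is exactly the in-place invocation I mean:
>>
>>     def run_in_threads(count):
>>         # hypothetical decorator factory: the returned decorator starts
>>         # the worker threads as soon as the decorated 'def' is executed
>>         def decorate(worker):
>>             start_threads(worker, count)   # "magic" in-place invocation
>>             return worker
>>         return decorate
>>
>>     queue.extend(work_items)   # must be filled *before* the decorated def
>>
>>     @run_in_threads(4)
>>     def worker():
>>         for item in queue:
>>             with nogil:
>>                 do_stuff(item)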
>>
>>
>>> Also, I like the idea of having the invoking thread hold the GIL and
>>> having the "sharing" threads do the appropriate locking among
>>> themselves when needed, if possible, e.g. for exception raising.
>>
>> I like the explicit "with nogil" block in my example above. It makes it
>> easy to use normal Python setup code, to synchronise based on the GIL if
>> desired (e.g. to use a normal Python queue for communication), and it's
>> simple enough not to get in the way.
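>>
>> For instance, results can travel back over a plain Python Queue while
>> only the C-level work runs without the GIL; this sketch reuses the
>> queue and do_stuff() from the example above and assumes do_stuff() is
>> a nogil function returning a C double:
>>
>>     from Queue import Queue   # Python 2 stdlib queue, thread-safe under the GIL
>>
>>     results = Queue()
>>
>>     def worker():
>>         cdef int item
>>         cdef double r
>>         for item in queue:
>>             with nogil:
>>                 r = do_stuff(item)   # pure C call, GIL released
>>             results.put(r)           # GIL held again: normal Python queue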
>
> I'm supporting Robert here. Basically, I'm +1 to anything that can make me
> pretend the GIL doesn't exist, even if it comes with a 2x performance hit,
> because that will make me write parallel code (which I can't be bothered to
> do in Cython currently), and since I have 4 cores on the laptop I use for
> debugging, I'd still get a 2x speedup.
>
> Perhaps the long-term solution is something like an "autogil" mode, where
> Cython automatically releases the GIL on blocks where it can (such as a
> typed for-loop) and re-acquires it when needed (say, for an
> exception-raising if-block within said for-loop). And when doing
> multi-threading, GIL-requiring calls are dispatched to a master GIL-holding
> thread (which would not be a worker thread, i.e. on 4 cores you'd have 4
> workers + 1 GIL-holding support thread). So the advice for speeding up code
> is simply "make sure your code is all typed", just like before, but people
> can follow that advice without even having to learn about the GIL.
>
> It's all about a) lowering the learning curve for trivial purposes, and b)
> allowing temporary debug print statements that use the GIL to be inserted
> without having to rework the code.

Have we ever thought about supporting 'with gil' as an actual
statement, instead of only as part of a function declaration or
definition? Then, inside a 'with nogil:' block, you could just say
'with gil: print myvar'. On the other hand, we could also convert uses
of 'print' (that is, simple 'print a, b, c'-style printing) in 'nogil'
blocks or functions to C printf calls, where possible. Would that be a
desirable feature?
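
To make that concrete, the first variant would read something like the
sketch below (proposed syntax only; 'with gil:' blocks don't exist in
Cython today, and do_stuff() is the nogil C function from the earlier
examples):

    def worker():
        cdef int item
        for item in queue:
            with nogil:
                do_stuff(item)
                with gil:
                    # proposed: temporarily re-acquire the GIL for a
                    # quick Python-level debug print
                    print "processed item", item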

> As for the discussion we had on using the GIL for locking, I think that
> should be made explicit, even if it is a noop currently. I once wrote code
> relying on the GIL, and really missed something like "cython.gil.lock()" to
> put in there just for better code readability (yes, I used comments,
> but...).
>
>>
>> I think it simplifies things a lot when code can rely on the GIL being
>> held when entering the thread function. Threading is complicated enough
>> that we should keep it as explicit as possible.
>
> That's exactly the thing about OpenMP: it tends to hide the complexity of
> threading and lets you get on with your life. When you say this, it
> sounds a bit like "people who don't want to learn the technical inner
> details of Python should just use a language other than Cython".
>
> If I write code in Fortran it may get parallelized, whereas I almost never
> write parallel code in Cython (well, MPI, but not shared-memory): all the
> "is-the-GIL-held-or-not" bookkeeping is just too much to keep in my head.
>
> Dag Sverre