[Cython] OpenMP thread private variable not recognized (bug report + discussion)

2014-08-11 Thread Leon Bottou

The attached Cython program uses an extension class to represent a unit of 
work. The with parallel block temporarily acquires the GIL to allocate the 
object, then releases the GIL and performs the task with a 
for i in prange(...) statement. I expected w to be recognized as a 
thread-private variable.

with nogil, parallel():
    with gil:
        w = Worker(n)  # should be thread private
    with nogil:
        for i in prange(0,m):
            r += w.run()  # should be reduction
w = None  # is this needed?


Cythonize (0.20.2) works without error but produces an incorrect C file.

hello.c: In function ‘__pyx_pf_5hello_run’:
hello.c:2193:42: error: expected identifier before ‘)’ token
hello.c: At top level:

The erroneous line is:

#pragma omp parallel private() reduction(+:__pyx_v_r) private(__pyx_t_5, 
__pyx_t_4, __pyx_t_3) firstprivate(__pyx_t_1, __pyx_t_2) 
private(__pyx_filename, __pyx_lineno, __pyx_clineno) 
shared(__pyx_parallel_why, __pyx_parallel_exc_type, __pyx_parallel_exc_value, 
__pyx_parallel_exc_tb)

where you can see that the first private() clause has no argument. Nor is 
the variable __pyx_v_w declared private, as I would have expected.

I believe that the problem comes from line 7720 in Cython/Compiler/Nodes.py:

if self.privates:
    privates = [e.cname for e in self.privates
                if not e.type.is_pyobject]
    code.put('private(%s)' % ', '.join(privates))

I further believe that the clause "if not e.type.is_pyobject" was added 
because nothing would decrement the reference count of the thread-private 
worker object when leaving the parallel block. 
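To make the failure mode concrete, here is a minimal sketch in plain Python of how the quoted filter can produce an empty clause. The Type and Entry tuples and both function names are hypothetical stand-ins, not Cython's real classes or API. The guarded variant only shows one way to avoid emitting invalid C. It does not make w private or address the reference-counting concern.

```python
from collections import namedtuple

# Hypothetical stand-ins for the compiler's symbol-table entries; the real
# Cython classes carry far more state than this.
Type = namedtuple("Type", "is_pyobject")
Entry = namedtuple("Entry", "cname type")


def private_clause_original(privates):
    """Mirror the quoted logic: filter out Python objects, always emit."""
    if privates:
        names = [e.cname for e in privates if not e.type.is_pyobject]
        return 'private(%s)' % ', '.join(names)
    return ''


def private_clause_guarded(privates):
    """Same filter, but skip the clause when nothing survives it."""
    names = [e.cname for e in privates if not e.type.is_pyobject]
    if names:
        return 'private(%s)' % ', '.join(names)
    return ''


# The only private candidate is the Python object w, as in the report.
only_w = [Entry("__pyx_v_w", Type(is_pyobject=True))]
print(repr(private_clause_original(only_w)))  # the empty clause GCC rejects
print(repr(private_clause_guarded(only_w)))   # no clause emitted at all
```

With a mix of C variables and Python objects, both variants return the same clause; they differ only when every candidate is a Python object.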

My quick fix would be to remove this clause and make sure that my program 
contains the line "w = None" before leaving the thread. But I realize that 
this is not sufficient for you.

Note that the temporary Python objects generated by the call to the Worker 
constructor are correctly recognized as thread private, and their reference 
counts are correctly decremented when they are no longer needed. The problem 
here is the clash between the Python scoping rules and the semantics of 
thread-private variables. This is one of those cases where I would have liked 
to be able to write

with nogil, parallel():
    with gil:
        cdef w = Worker(n)  # block-scoped cdef
    with nogil:
        for i in prange(0,m):
            r += w.run()

with an understanding that the scope of the cdef variable is limited to the 
block where the cdef appears. But when you try this, cython tells you that 
cdefs are not legal there. 


# -*- Python -*-

import numpy as np
cimport numpy as np
from cython.parallel import parallel, prange


cdef class Worker:
    cdef double[::1] v
    def __init__(self, int n):
        self.v = np.random.randn(n)
    cdef double run(self) nogil:
        cdef int i
        cdef int n = self.v.shape[0]
        cdef double s = 0
        for i in range(0,n):
            s += self.v[i]
        return s / n

def run(int n, int m):
    cdef Worker w
    cdef double r
    cdef int i
    with nogil, parallel():
        with gil:
            w = Worker(n)
        with nogil:
            for i in prange(0,m):
                r += w.run()
    w = None
    return r / m


from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_module = Extension(
    "hello",
    ["hello.pyx"],
    extra_compile_args=['-fopenmp'],
    extra_link_args=['-fopenmp'],
)

setup(
    name = 'Hello',
    cmdclass = {'build_ext': build_ext},
    ext_modules = [ext_module],
)

___
cython-devel mailing list
cython-devel@python.org
https://mail.python.org/mailman/listinfo/cython-devel


[Cython] [Re] OpenMP thread private variable not recognized (bug report + discussion)

2014-08-12 Thread Leon Bottou
On Tue, 12 Aug 2014 14:26:31, Sturla Molden wrote:
> Cython does not do an error here:[...
> - i is recognized as private
> - r is recognized as reduction
> - w is (correctly) recognized as shared

Not according to the documentation.
http://docs.cython.org/src/userguide/parallelism.html documentation for
cython.parallel.parallel says "A contained prange will be a worksharing loop
that is not parallel, so any variable assigned to in the parallel section is
also private to the prange. Variables that are private in the parallel block
are unavailable after the parallel block.".  Variable w is such a variable.

Furthermore, if cython is correct, why does GCC report an error on the
cython generated C code?  

My point here is that there is a bug because (a) cython does not behave as
documented, and (b) it generates invalid C code despite not reporting an
error.

> Personally I prefer to avoid OpenMP and just use Python threads and an
> internal function (closure) or an internal class. If you start to use
> OpenMP, Apple's libdispatch ("GCD"), Intel TBB, or Intel Cilk Plus, you
> will soon discover that they are all variations over the same theme: a
> thread pool and a closure.

I am making heavy use of OpenBLAS, which also uses OpenMP.
Using the same queue manager prevents a lot of CPU provisioning problems.
Using multiple queue managers in the same code does not work as well because
they are not aware of what the others are doing.


- L.






[Cython] [Re] OpenMP thread private variable not recognized (bug report + discussion)

2014-08-13 Thread Leon Bottou
> > I am making heavy use of OpenBLAS, which also uses OpenMP.
> > Using the same queue manager prevents a lot of CPU provisioning problems.
> > Using multiple queue managers in the same code does not work as well
> > because they are not aware of what the others are doing.
> 
> Normally OpenBLAS is built without OpenMP. Also, OpenMP is not fork safe
> (cf. multiprocessing) but OpenBLAS' own threadpool is. So it is
> recommended to build OpenBLAS without OpenMP dependency.
> 
> That is: If you build OpenBLAS with OpenMP, numpy.dot will hang if used
> together with multiprocessing.

I am indeed using a version of OpenBLAS built with OpenMP, because
Debian used to compile OpenBLAS this way. They seem to have reverted now.
Note that I cannot use Python multiprocessing because my threads work on a
very large state vector. My current solution is to use Python threading and
nogil Cython-compiled routines, but this sometimes leads to odd effects when
provisioning threads. 
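As a rough illustration of the threading arrangement described above, the following pure-Python sketch has several threads read one shared vector in place and combine their partial results. All names here (state, partial_sum, parallel_sum) are hypothetical and only stand in for the real workload. In the actual setup the inner loop would be a nogil Cython routine, whereas in pure Python the GIL serializes the loop, so this shows the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared state; in the setting described above this would be a
# very large vector that is too costly to copy into separate processes.
state = [float(i) for i in range(1000)]


def partial_sum(bounds):
    """Sum one slice of the shared vector; locals are private to the call."""
    lo, hi = bounds
    acc = 0.0                      # per-task accumulator
    for i in range(lo, hi):
        acc += state[i]            # reads the shared vector, no copy made
    return acc


def parallel_sum(n_threads=4):
    """Split the index range into chunks and reduce the partial sums."""
    step = (len(state) + n_threads - 1) // n_threads
    chunks = [(lo, min(lo + step, len(state)))
              for lo in range(0, len(state), step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(partial_sum, chunks))


print(parallel_sum() == sum(state))
```

Because every thread lives in the same process, the vector is never duplicated, which is the property that rules out multiprocessing here.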

This is why I wanted to try the pure openmp solution and found the
aforementioned bug in cython.parallel.

Is there somebody actively trying to make cython.parallel work correctly?
- If yes, then my bug report should be of interest to this person.
- If no, then one should avoid (and possibly deprecate) cython.parallel and
find other ways to do things. 

Thanks for the replies.

- L.

