Hi Burlen,

On Jun 13, 2016 5:24 PM, "Burlen Loring" <blor...@lbl.gov> wrote:
>
> Hi All,
>
> I'm working on a threaded pipeline where we want the end user to be able
> to code up Python functions to do numerical work. Threading is all done in
> C++11, and in each thread we acquire the GIL before we invoke the
> user-provided Python callback and release it only when the callback
> returns. We've used SWIG to expose bindings to C++ objects in Python.
>
> When run with more than 1 thread I get inconsistent segv's in various
> numpy routines, and occasionally see a *** Reference count error detected:
> an attempt was made to deallocate 11 (f) ***.
>
> To pass data from C++ to the Python callback as a numpy array we have:
>>
>> // ************************************************************************
>> template <typename NT>
>> PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
>> {
>>     // allocate a buffer
>>     npy_intp n_elem = varrt->size();
>>     size_t n_bytes = n_elem*sizeof(NT);
>>     NT *mem = static_cast<NT*>(malloc(n_bytes));
>>     if (!mem)
>>     {
>>         PyErr_Format(PyExc_RuntimeError,
>>             "failed to allocate %lu bytes", n_bytes);
>>         return nullptr;
>>     }
>>
>>     // copy the data
>>     memcpy(mem, varrt->get(), n_bytes);
>>
>>     // put the buffer into a new numpy object
>>     PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
>>         PyArray_SimpleNewFromData(1, &n_elem, numpy_tt<NT>::code, mem));
>>     PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
>>
>>     return arr;
>> }

This code would probably be much simpler if you let numpy allocate the
buffer with PyArray_SimpleNew and then did the memcpy. I doubt that's your
problem, though. Numpy should be assuming that "owned" data was allocated
using malloc(), and if it were using a different allocator then I think
you'd be seeing crashes much sooner.
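Roughly, an untested sketch of what I mean (reusing the
teca_variant_array_impl and numpy_tt names from your snippet, and assuming
it's only ever called with the GIL held, like the original):

    // let numpy allocate and own the buffer, then copy into it
    template <typename NT>
    PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
    {
        npy_intp n_elem = static_cast<npy_intp>(varrt->size());

        // numpy allocates the buffer with its own allocator and frees it
        // when the array is deallocated -- no OWNDATA flag needed
        PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
            PyArray_SimpleNew(1, &n_elem, numpy_tt<NT>::code));
        if (!arr)
            return nullptr;

        // copy the data into the numpy-owned buffer
        memcpy(PyArray_DATA(arr), varrt->get(),
            static_cast<size_t>(n_elem)*sizeof(NT));

        return arr;
    }

That way there's no question of whether the allocator you used for the
buffer matches the one numpy's OWNDATA path will use to free it.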
> This is the only place we create numpy objects on the C++ side.
>
> In my demo the Python callback is as follows:
>>
>> 33 def get_execute(rank, var_names):
>> 34     def execute(port, data_in, req):
>> 35         sys.stderr.write('descriptive_stats::execute MPI %d\n'%(rank))
>> 36
>> 37         mesh = as_teca_cartesian_mesh(data_in[0])
>> 38
>> 39         table = teca_table.New()
>> 40         table.copy_metadata(mesh)
>> 41
>> 42         table.declare_columns(['step','time'], ['ul','d'])
>> 43         table << mesh.get_time_step() << mesh.get_time()
>> 44
>> 45         for var_name in var_names:
>> 46
>> 47             table.declare_columns(['min '+var_name, 'avg '+var_name, \
>> 48                 'max '+var_name, 'std '+var_name, 'low_q '+var_name, \
>> 49                 'med '+var_name, 'up_q '+var_name], ['d']*7)
>> 50
>> 51             var = mesh.get_point_arrays().get(var_name).as_array()
>> 52
>> 53             table << float(np.min(var)) << float(np.average(var)) \
>> 54                 << float(np.max(var)) << float(np.std(var)) \
>> 55                 << map(float, np.percentile(var, [25.,50.,75.]))
>> 56
>> 57         return table
>> 58     return execute
>
> This callback is the only spot where numpy is used. The as_array call is
> implemented by the new_object template above.
>
> Further, if I remove our use of PyArray_SimpleNewFromData by replacing
> line 51 in the Python code above with var = np.array(range(1, 1100), 'f'),
> the problem disappears. It must have something to do with the use of
> PyArray_SimpleNewFromData.
>
> I'm at a loss to see why things are going south. I'm using the GIL and I
> thought that would serialize the Python code. I suspect that numpy is
> using global or static variables somewhere internally and that it's
> inherently thread-unsafe. Can anyone confirm/deny? Maybe point me in the
> right direction?

Numpy does use global/static variables, and it is unsafe to call into numpy
simultaneously from different threads. But that's ok, because you're not
allowed to call numpy functions simultaneously from different threads -- you
have to hold the GIL first, and that serializes access to all of numpy's
internal state.

Numpy is very commonly used in threaded code and most people aren't seeing
random segfaults, so the problem is most likely in your code. Sorry I can't
help much more than that... I guess I'd start by triple-checking that the
code really truly does hold the GIL every time that it calls into
numpy/python APIs. I'd also try running it under valgrind in case it's some
other random memory corruption that's just showing up in a weird way.

-n

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion