I have been thinking about the multithreading problem in PyPy for a while and I have come up with an idea. I'd like to have feedback from people who know the codebase well.
The first and hardest step is to change the PyPy runtime so that it can run multiple threads at the same time. To simplify matters, we allocate all external resources to one thread to start with. We assume that the other threads don't use them, and that they don't call into extension modules that do messy things. Whenever we spawn a new thread, we give it its own object space and its own instance of the memory manager/garbage collector. Having gotten this far, we would have N threads that could run in parallel. Since they have no interaction with each other and no contention for resources, they would require no locking mechanism. The thread with the external resources would still depend on the GIL, but the other ones wouldn't even see it.

This setup would of course be utterly useless, because all but one of the threads would have no means of communicating their results to the world. So, in a second step, we provide special data types that can be shared between threads. These would typically be allocated in non-movable memory, to avoid the complexity of garbage-collecting memory in shared use. You can make simple FIFO structures for communication between the threads, and complex structures with advanced algorithms for dealing with shared access.

In a third step, you may relax the requirement that the first thread owns all resources. You should be able to hand them out in a controlled manner. For instance, you may want to spawn a thread for each socket connection and have that thread deal with all the communication with the socket.

Now I wonder about the feasibility of the first step. How much global state would have to be wrapped in per-thread objects, and how hard would that be? What other obstacles would there be to making this change? I guess there is a complication with requesting memory from the kernel and returning it, but I think that could be solved in more or less elegant ways.

Jacob Hallén
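The ownership discipline proposed above can be modeled at the Python level. This is a minimal sketch, not a PyPy implementation: `PrivateSpace`, `OwnedResource`, `worker`, `run` and `demo_handoff` are hypothetical names, and since it runs on an interpreter with a GIL it shows only the rules (private per-thread state, communication through a shared FIFO, explicit hand-over of an external resource), not actual parallelism.

```python
import queue
import threading


class PrivateSpace:
    """Hypothetical stand-in for a per-thread object space plus its own
    memory manager. Only the creating thread touches it, so no locks."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # private "heap", never shared with other threads

    def compute(self, n):
        # All work happens on thread-private data.
        self.objects["result"] = sum(i * i for i in range(n))
        return self.objects["result"]


def worker(name, n, results):
    space = PrivateSpace(name)  # step 1: each thread gets its own space
    value = space.compute(n)
    results.put((name, value))  # step 2: the only way out is a shared FIFO


def run(jobs):
    results = queue.Queue()  # thread-safe FIFO used as the shared channel
    threads = [
        threading.Thread(target=worker, args=(f"t{i}", n, results))
        for i, n in enumerate(jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():  # safe: all producers have finished
        name, value = results.get()
        collected[name] = value
    return collected


class OwnedResource:
    """Hypothetical wrapper for step 3: exactly one thread owns an external
    resource (e.g. a socket) and ownership is transferred explicitly."""

    def __init__(self, resource):
        self._resource = resource
        self._owner = threading.get_ident()  # creator owns it initially

    def use(self):
        if threading.get_ident() != self._owner:
            raise RuntimeError("resource used by a thread that does not own it")
        return self._resource

    def hand_over(self, thread_ident):
        self.use()  # only the current owner may transfer ownership
        self._owner = thread_ident


def demo_handoff():
    res = OwnedResource("socket-stand-in")
    ready = threading.Event()
    seen = []

    def handler():
        ready.wait()  # block until ownership has been handed over
        seen.append(res.use())

    t = threading.Thread(target=handler)
    t.start()
    res.hand_over(t.ident)  # the first thread gives the resource away
    ready.set()
    t.join()
    try:
        res.use()  # the first thread no longer owns it
        still_owner = True
    except RuntimeError:
        still_owner = False
    return seen, still_owner
```

For example, `run([10, 100])` collects one result per thread through the queue, and `demo_handoff()` shows that after the hand-over only the handler thread may use the resource.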
_______________________________________________
http://codespeak.net/mailman/listinfo/pypy-dev
