Hi,

On Sun, Mar 09, 2008 at 10:17:05PM -0400, Thomas Bushnell BSG wrote:
> On Mon, 2008-03-10 at 01:19 +0000, Samuel Thibault wrote:
> > This thread is syncing everything, i.e. asking a lot of writes,
> > which triggers the creation of a lot of threads. Unfortunately the
> > superblock was paged out, so they all block on reading it.
> > Unfortunately, since in Debian there is a patch which limits the
> > number of created threads, the read of the superblock doesn't
> > actually create a new thread -- that is delayed. But since none of
> > the existing threads can progress (since they are all waiting for
> > the superblock), things are just deadlocked...
>
> As a general rule, the Hurd always assumes that an RPC can be handled;
> this is quite embedded in the way diskfs works.
>
> A patch which limits the number of threads is inherently buggy in the
> Hurd, and that patch MUST be disabled for anything to work properly.

I'm glad this discussion came up at last: This is a very serious
issue, and your input is necessary.

The real problem here is that the current thread model of the Hurd
servers is fundamentally broken: Creating a new kernel thread for each
incoming RPC just doesn't work. The number of incoming RPCs is
essentially unbounded, while kernel threads are a limited -- and in
fact quite expensive -- resource.

And this is not a theoretical problem, but a very real one: Under
heavy disk load, the filesystem server easily created hundreds, even
thousands of threads. During large compile jobs, for example, the Hurd
crashed regularly with zalloc panics; and the problem became more and
more pressing as machines got faster and disk space usage grew.

Another easy way to reproduce the problem is to create a process with
many children, each opening /dev/null many times (the number of open
files per process is limited, so we need many children), and then to
kill all the children quickly, e.g. by terminating the parent. With
some 30000 or so total open ports, this is practically guaranteed to
result in a zalloc panic: the null server is hammered with dead-name
notifications, again resulting in a thread explosion.
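(For illustration, a minimal sketch of such a stress test might look
like the following. The child and descriptor counts are arbitrary,
merely chosen to yield roughly 30000 open ports, and killing the
children directly stands in for terminating the parent:)

    /* Hypothetical stress test -- not an official Hurd test case.  */
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NCHILDREN 100
    #define NFDS      300   /* per child; stays below the fd limit */

    int
    main (void)
    {
      pid_t pids[NCHILDREN];

      for (int i = 0; i < NCHILDREN; i++)
        {
          pid_t pid = fork ();
          if (pid == 0)
            {
              /* Child: on the Hurd, every open file is a port to the
                 null server.  */
              for (int j = 0; j < NFDS; j++)
                if (open ("/dev/null", O_RDONLY) < 0)
                  {
                    perror ("open");
                    break;
                  }
              pause ();   /* wait to be killed */
              _exit (0);
            }
          pids[i] = pid;
        }

      sleep (5);   /* let the children finish opening */

      /* Kill everything at once: each dying port triggers a dead-name
         notification, hammering the null server.  */
      for (int i = 0; i < NCHILDREN; i++)
        kill (pids[i], SIGKILL);
      for (int i = 0; i < NCHILDREN; i++)
        waitpid (pids[i], NULL, 0);

      return 0;
    }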
When these issues became known, Sergio Lopez implemented a hack that
simply limits the number of threads created by each single server to a
fixed amount; when more RPCs come in, they aren't handled until one of
the existing threads becomes free. While this patch did wonders for
stability under load, and no problems showed up at first, it
immediately struck me as having the potential for causing deadlocks.
When I pointed that out, Sergio made a slight modification: Rather
than completely disabling the creation of new threads once the limit
is reached, each further thread is created only after a two-second
delay -- so additional threads still appear, just very slowly.

I never was happy with that solution, and suggested a more adaptive
approach instead: Keep track of the existing threads, and if none of
them makes progress within a certain amount of time (say 100 ms),
allow creating some more threads. But that was never implemented. It
also might still cause considerable delays in some situations, and I'm
not even sure it would fix all problems. (I haven't fully understood
the problem discussed in this thread, so I don't know whether that
approach would fix it.) And anyway, it would still be just an ugly
workaround.

The real solution here of course is to fix the thread model, using
some kind of continuation mechanism: Have a limited number of threads
(ideally one per CPU) handle incoming requests. Whenever some
operation would require blocking for some event (in the case of
diskfs, waiting for the underlying store to finish reading or
writing), the state is instead saved to some list of outstanding
operations, and the thread goes on handling other requests. When the
event completes, the state is read back, and handling of the original
request continues.
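(A very rough sketch of the idea -- all names are made up, and
locking, error handling, and the actual RPC plumbing are omitted:)

    #include <stdlib.h>

    /* One parked request: what it waits for, and how to go on.  */
    struct continuation
    {
      struct continuation *next;
      const void *event;                /* e.g. a pending store read */
      void (*resume) (void *state);     /* continue the request */
      void *state;                      /* saved request state */
    };

    static struct continuation *outstanding;

    /* Instead of blocking the worker thread, park the request and
       return to the RPC loop, which picks up the next request.  */
    void
    wait_for (const void *event, void (*resume) (void *), void *state)
    {
      struct continuation *c = malloc (sizeof *c);
      c->event = event;
      c->resume = resume;
      c->state = state;
      c->next = outstanding;
      outstanding = c;
    }

    /* Called when an event completes (e.g. the store finished reading
       a block): unpark and continue the requests waiting on it.  */
    void
    event_completed (const void *event)
    {
      struct continuation **cp = &outstanding;
      while (*cp)
        {
          struct continuation *c = *cp;
          if (c->event == event)
            {
              *cp = c->next;
              c->resume (c->state);
              free (c);
            }
          else
            cp = &c->next;
        }
    }

The crucial difference to the current model is that a blocked request
no longer pins down a whole thread with its stack; everything it needs
lives in the explicitly saved continuation.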
Of course, this will be a major change; it requires modifying
considerable parts of the Hurd servers. But it seems to be the only
way to handle this properly.

What do you think? I wonder whether I should add this to the list of
project ideas for GSoC...

-antrik-