I wrote a test program. It calls clone(CLONE_NEWPID | CLONE_VM) and then sleep(); each new task repeats the same actions. In total the program creates 4000 tasks. When I tried to kill all of these processes, the system was inaccessible for several minutes.
The system is inaccessible because each process calls zap_pid_ns_processes(), which tries to kill its subprocesses under tasklist_lock. Most of the time is spent in find_vpid().

I suggest marking sub-namespaces in zap_pid_ns_processes(). For a marked pidns, zap_pid_ns_processes() doesn't kill tasks; it only waits for them.

I am not sure that this idea is correct, but it helps.

Maybe we should restrict the nesting depth of pidns?

Why can't we enumerate task->children instead of using find_vpid()?

Cc: Oleg Nesterov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Vasiliy Kulikov <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Signed-off-by: Andrew Vagin <[email protected]>
---
 include/linux/pid_namespace.h |    1 +
 kernel/pid_namespace.c        |   14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 00474b0..28073a0 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -34,6 +34,7 @@ struct pid_namespace {
 	kgid_t pid_gid;
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
+	atomic_t zapped; /* non zero if all process were killed */
 };
 
 extern struct pid_namespace init_pid_ns;
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index b051fa6..7db7dcd 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -177,21 +177,31 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 	 * 	  maintain a tasklist for each pid namespace.
 	 *
 	 */
+
+	if (atomic_read(&pid_ns->zapped))
+		goto wait; /* All processes were already killed */
+
 	read_lock(&tasklist_lock);
 	nr = next_pidmap(pid_ns, 1);
 	while (nr > 0) {
 		rcu_read_lock();
 
 		task = pid_task(find_vpid(nr), PIDTYPE_PID);
-		if (task && !__fatal_signal_pending(task))
+		if (task && !__fatal_signal_pending(task)) {
+			struct pid_namespace *ns;
+
 			send_sig_info(SIGKILL, SEND_SIG_FORCED, task);
+			ns = task_active_pid_ns(task);
+			if (unlikely(ns->child_reaper == task))
+				atomic_set(&ns->zapped, 1);
+		}
 
 		rcu_read_unlock();
 
 		nr = next_pidmap(pid_ns, nr);
 	}
 	read_unlock(&tasklist_lock);
-
+wait:
 	/* Firstly reap the EXIT_ZOMBIE children we may have. */
 	do {
 		clear_thread_flag(TIF_SIGPENDING);
-- 
1.7.1
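To illustrate why the pidmap walk dominates for a deeply nested chain, here is a rough user-space model of the lookup count. It is only a sketch under stated assumptions, not kernel code: the names model_ns, model_zap() and model_kill_chain() are hypothetical, namespace i is assumed to contain exactly the tasks at depth i and deeper, and the inits are assumed to run their zap in outermost-first order, which the kernel does not guarantee.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical user-space model, not kernel code: ns[i] is the pid
 * namespace at nesting depth i. Its init is task i, and it contains
 * every task at depth >= i.
 */
struct model_ns {
	bool zapped;	/* mirrors the proposed pid_ns->zapped flag */
};

/*
 * Model of one zap_pid_ns_processes() call made by the init of ns[level].
 * Returns the number of per-pid lookups (find_vpid() in the kernel).
 */
static unsigned long model_zap(struct model_ns *ns, size_t depth,
			       size_t level, bool use_flag)
{
	unsigned long lookups = 0;
	size_t t;

	if (use_flag && ns[level].zapped)
		return 0;	/* already killed from an outer namespace */

	/* Walk every other task visible in this namespace. */
	for (t = level + 1; t < depth; t++) {
		lookups++;
		/* Task t is the init of ns[t], so mark that namespace. */
		if (use_flag)
			ns[t].zapped = true;
	}
	return lookups;
}

/*
 * Total lookups when the whole chain is killed, assuming the inits
 * run their zap in outermost-first order.
 */
unsigned long model_kill_chain(size_t depth, bool use_flag)
{
	struct model_ns *ns = calloc(depth, sizeof(*ns));
	unsigned long total = 0;
	size_t level;

	if (!ns)
		return 0;	/* allocation failed or depth == 0 */

	for (level = 0; level < depth; level++)
		total += model_zap(ns, depth, level, use_flag);

	free(ns);
	return total;
}
```

Under these assumptions, model_kill_chain(4000, false) counts 4000 * 3999 / 2 = 7998000 lookups, while model_kill_chain(4000, true) counts 3999, because only the outermost namespace walks its pidmap. The model ignores lock contention and exit ordering, so it only indicates the scaling, not the real timings.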

