Attached are 4 separate patches for each somewhat independent  portion of the 
kernel work related to the follow-fork implementation.


On 01/26/2012 04:23 AM, Kostik Belousov wrote:
On Wed, Jan 25, 2012 at 03:48:04PM -0800, Dmitry Mikulin wrote:
Thanks for taking a look at it. I'll try to explain the changes the best I
can: the work was done nearly 6 months ago...

I would certainly appreciate some more words describing the changes.

What is the goal of introducing the PT_FOLLOW_EXEC ? To not force
the debugger to filter all syscall exits to see exec events ?
PT_FOLLOW_EXEC was added to trace the exec() family of calls. When it's
enabled, the kernel will generate a trap to give debugger a chance to clean
up old process info and initialize its state with the new binary.

It was more a question, why PT_FLAG_EXEC is not enough.


Why did you moved the stopevent/ptracestop for exec events from
syscallret() to kern_execve() ? If moving, why the handling of TDB_EXEC
is not removed from syscallret() ? I do not think that TDB_EXEC can be
seen there after the patch. The same question about TDB_FORK.
The reason I moved the event notifications from syscallret() is because the
debugger is interested successful completion of either fork or exec, not
just the fact that they have been called. If, say, a call to fork() failed,
as far as debugger is concerned, fork() might as well had never been
called. Having a ptracestop in syscallret triggered a trap on every return
from fork without telling the debugger whether a new process had been
created or not. Same goes for exec().
PT_FLAG_EXEC is only set if an exec-kind of syscall succeeded. The same
is true for PT_FLAG_FORKED, the flag is set only if a new child was
successfully created.

Looking at the code now I think I should have removed TDB_EXEC/TDB_FORK
processing from syscallret(). I'll do that and verify that it works as
expected.

If possible, I would greatly prefer to have fork changes separated.
You mean separate fork changes into a separate patch from exec changes?
Yes. Even more, it seems that fork changes should be separated into
bug fixes and new functionality.

I doubt that disallowing RFMEM while tracing is the right change. It may
be to fix some (undescribed) bug, but RFMEM is documented behaviour used
not only for vfork(2), and the change just breaks rfork(2) for debugged
processes.

Even vfork() should not be broken, since I believe there are programs
that rely on the vfork() model, in particular, C shell. It will be
broken if vfork() is substituted by fork(). The fact that it breaks only
under debugger will make it esp. confusing.
I need to dig up the details since I can't recall the exact reason for
forcing fork() in cases of user calls to vfork() under gdb. I believe it
had to do with when child process memory was available for writing in case
of vfork() and it wasn't early enough to complete the switch over from
parent to child in gdb. After consulting with our internal kernel experts
we decided that doing fork() instead of vfork() under gdb is a low impact
change.

Why do we need to have TDB_FORK set for td2 ?
The debugger needs to intercept fork() in both parent and child so it can
detach from the old process and attach to the new one. Maybe it'll make
more sense in the context of gdb changes. Should I send them too? Don't
think Marcel included that patch...
No, I think the kernel changes are enough for now.

Does the orphan list change intended to not lost the child after fork ?
But the child shall be traced, so debugger would get the SIGTRAP on
the attach on fork returning to usermode. I remember that I explicitely
tested this when adding followfork changes.
Yes, the debugger gets SIGTRAPs. The problem arises when the real parent of
the forked process has the code to collect termination status. Since
attaching to a process changes the parent/child relationships, we need to
keep track of the children lost due to re-parenting so we can properly
attribute their exit status to the "real" parent.

I do not quite understand it. From the description it sounds as the problem
that is not related to follow fork changes at all, and shall be presented
regardless of the follow fork. If yes, I think this shall be separated
into a standalone patch.


Index: sys/kern/kern_exec.c
===================================================================
--- sys/kern/kern_exec.c	(revision 230617)
+++ sys/kern/kern_exec.c	(working copy)
@@ -55,6 +55,7 @@
 #include <sys/priv.h>
 #include <sys/proc.h>
 #include <sys/pioctl.h>
+#include <sys/ptrace.h>
 #include <sys/namei.h>
 #include <sys/resourcevar.h>
 #include <sys/sdt.h>
@@ -888,16 +889,22 @@
 	free(imgp->freepath, M_TEMP);
 
 	if (error == 0) {
+		if ((p->p_flag & (P_TRACED | P_FOLLOWEXEC)) == 
+		    (P_TRACED |	P_FOLLOWEXEC)) {
+			PROC_LOCK(p);
+			td->td_dbgflags |= TDB_EXEC;
+			PROC_UNLOCK(p);
+		}
+ 		/*
+ 		 * Stop the process here if its stop event mask has
+ 		 * the S_EXEC bit set.
+ 		 */
+ 		STOPEVENT(p, S_EXEC, 0);
+		PTRACESTOP_SC(p, td, S_PT_EXEC);
 		PROC_LOCK(p);
-		td->td_dbgflags |= TDB_EXEC;
+		td->td_dbgflags &= ~TDB_EXEC;
 		PROC_UNLOCK(p);
-
-		/*
-		 * Stop the process here if its stop event mask has
-		 * the S_EXEC bit set.
-		 */
-		STOPEVENT(p, S_EXEC, 0);
-		goto done2;
+ 		goto done2;
 	}
 
 exec_fail:
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c	(revision 230617)
+++ sys/kern/sys_process.c	(working copy)
@@ -660,6 +660,7 @@
 	case PT_TO_SCX:
 	case PT_SYSCALL:
 	case PT_FOLLOW_FORK:
+	case PT_FOLLOW_EXEC:
 	case PT_DETACH:
 		sx_xlock(&proctree_lock);
 		proctree_locked = 1;
@@ -874,6 +875,17 @@
 			p->p_flag &= ~P_FOLLOWFORK;
 		break;
 
+	case PT_FOLLOW_EXEC:
+		if (data) {
+			p->p_flag |= P_FOLLOWEXEC;
+			p->p_stops |= S_PT_EXEC;
+		}
+		else {
+			p->p_flag &= ~P_FOLLOWEXEC;
+			p->p_stops &= ~S_PT_EXEC;
+		}
+		break;
+
 	case PT_STEP:
 	case PT_CONTINUE:
 	case PT_TO_SCE:
@@ -936,7 +948,8 @@
 					p->p_sigparent = SIGCHLD;
 			}
 			p->p_oppid = 0;
-			p->p_flag &= ~(P_TRACED | P_WAITED | P_FOLLOWFORK);
+			p->p_flag &= ~(P_TRACED | P_WAITED | P_FOLLOWFORK | 
+				       P_FOLLOWEXEC);
 
 			/* should we send SIGCHLD? */
 			/* childproc_continued(p); */
Index: sys/sys/proc.h
===================================================================
--- sys/sys/proc.h	(revision 230617)
+++ sys/sys/proc.h	(working copy)
@@ -617,6 +617,7 @@
 #define	P_INMEM		0x10000000 /* Loaded into memory. */
 #define	P_SWAPPINGOUT	0x20000000 /* Process is being swapped out. */
 #define	P_SWAPPINGIN	0x40000000 /* Process is being swapped in. */
+#define	P_FOLLOWEXEC	0x80000000 /* Notify debugger of exec events. */
 
 #define	P_STOPPED	(P_STOPPED_SIG|P_STOPPED_SINGLE|P_STOPPED_TRACE)
 #define	P_SHOULDSTOP(p)	((p)->p_flag & P_STOPPED)
Index: sys/sys/ptrace.h
===================================================================
--- sys/sys/ptrace.h	(revision 230617)
+++ sys/sys/ptrace.h	(working copy)
@@ -64,6 +64,7 @@
 #define	PT_SYSCALL	22
 
 #define	PT_FOLLOW_FORK	23
+#define	PT_FOLLOW_EXEC  24
 
 #define PT_GETREGS      33	/* get general-purpose registers */
 #define PT_SETREGS      34	/* set general-purpose registers */
@@ -142,6 +143,7 @@
  */
 #define	S_PT_SCE	0x000010000
 #define	S_PT_SCX	0x000020000
+#define	S_PT_EXEC	0x000080000
 
 int	ptrace_set_pc(struct thread *_td, unsigned long _addr);
 int	ptrace_single_step(struct thread *_td);
Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c	(revision 230617)
+++ sys/kern/kern_fork.c	(working copy)
@@ -59,6 +59,7 @@
 #include <sys/proc.h>
 #include <sys/procdesc.h>
 #include <sys/pioctl.h>
+#include <sys/ptrace.h>
 #include <sys/racct.h>
 #include <sys/resourcevar.h>
 #include <sys/sched.h>
@@ -702,7 +703,8 @@
 		 * for runaway child.
 		 */
 		td->td_dbgflags |= TDB_FORK;
-		td->td_dbg_forked = p2->p_pid;
+		td2->td_dbgflags |= TDB_FORK;
+		td->td_dbg_forked = td2->td_dbg_forked = p2->p_pid;
 		td2->td_dbgflags |= TDB_STOPATFORK;
 		_PHOLD(p2);
 		p2_held = 1;
@@ -731,6 +733,13 @@
 	knote_fork(&p1->p_klist, p2->p_pid);
 	SDT_PROBE(proc, kernel, , create, p2, p1, flags, 0, 0);
 
+	if (td->td_dbgflags & TDB_FORK) {
+		PTRACESTOP_SC(p1, td, S_PT_FORK);
+		PROC_LOCK(p1);
+		td->td_dbgflags &= ~TDB_FORK;
+		PROC_UNLOCK(p1);
+	}
+
 	/*
 	 * Wait until debugger is attached to child.
 	 */
@@ -1036,6 +1045,7 @@
 			proc_reparent(p, dbg);
 			sx_xunlock(&proctree_lock);
 			ptracestop(td, SIGSTOP);
+			td->td_dbgflags &= ~TDB_FORK;
 		} else {
 			/*
 			 * ... otherwise clear the request.
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c	(revision 230617)
+++ sys/kern/sys_process.c	(working copy)
@@ -868,10 +868,14 @@
 		break;
 
 	case PT_FOLLOW_FORK:
-		if (data)
+		if (data) {
 			p->p_flag |= P_FOLLOWFORK;
-		else
+			p->p_stops |= S_PT_FORK;
+		}
+		else {
 			p->p_flag &= ~P_FOLLOWFORK;
+			p->p_stops &= ~S_PT_FORK;
+		}
 		break;
 
 	case PT_STEP:
Index: sys/sys/ptrace.h
===================================================================
--- sys/sys/ptrace.h	(revision 230617)
+++ sys/sys/ptrace.h	(working copy)
@@ -142,6 +142,7 @@
  */
 #define	S_PT_SCE	0x000010000
 #define	S_PT_SCX	0x000020000
+#define	S_PT_FORK	0x000040000
 
 int	ptrace_set_pc(struct thread *_td, unsigned long _addr);
 int	ptrace_single_step(struct thread *_td);
Index: sys/kern/kern_exit.c
===================================================================
--- sys/kern/kern_exit.c	(revision 230617)
+++ sys/kern/kern_exit.c	(working copy)
@@ -680,7 +680,7 @@
  */
 void
 proc_reap(struct thread *td, struct proc *p, int *status, int options,
-    struct rusage *rusage)
+    struct rusage *rusage, int child)
 {
 	struct proc *q, *t;
 
@@ -720,7 +720,6 @@
 	if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
 		PROC_LOCK(p);
 		proc_reparent(p, t);
-		p->p_pptr->p_dbg_child--;
 		p->p_oppid = 0;
 		PROC_UNLOCK(p);
 		pksignal(t, SIGCHLD, p->p_ksi);
@@ -738,7 +737,10 @@
 	sx_xlock(&allproc_lock);
 	LIST_REMOVE(p, p_list);	/* off zombproc */
 	sx_xunlock(&allproc_lock);
-	LIST_REMOVE(p, p_sibling);
+	if (child)
+		LIST_REMOVE(p, p_sibling);
+	else
+		LIST_REMOVE(p, p_orphan);
 	leavepgrp(p);
 #ifdef PROCDESC
 	if (p->p_procdesc != NULL)
@@ -859,7 +861,7 @@
 		nfound++;
 		PROC_SLOCK(p);
 		if (p->p_state == PRS_ZOMBIE) {
-			proc_reap(td, p, status, options, rusage);
+			proc_reap(td, p, status, options, rusage, 1);
 			return (0);
 		}
 		if ((p->p_flag & P_STOPPED_SIG) &&
@@ -893,16 +895,47 @@
 
 			if (status)
 				*status = SIGCONT;
+		}
+		PROC_UNLOCK(p);
+	}
+	LIST_FOREACH(p, &q->p_orphans, p_orphan) {
+		PROC_LOCK(p);
+		if (pid != WAIT_ANY &&
+		    p->p_pid != pid && p->p_pgid != -pid) {
+			PROC_UNLOCK(p);
+			continue;
+		}
+		if (p_canwait(td, p)) {
+			PROC_UNLOCK(p);
+			continue;
+		}
+
+		/*
+		 * This special case handles a kthread spawned by linux_clone
+		 * (see linux_misc.c).  The linux_wait4 and linux_waitpid
+		 * functions need to be able to distinguish between waiting
+		 * on a process and waiting on a thread.  It is a thread if
+		 * p_sigparent is not SIGCHLD, and the WLINUXCLONE option
+		 * signifies we want to wait for threads and not processes.
+		 */
+		if ((p->p_sigparent != SIGCHLD) ^
+		    ((options & WLINUXCLONE) != 0)) {
+			PROC_UNLOCK(p);
+			continue;
+		}
+
+		nfound++;
+		PROC_SLOCK(p);
+		if (p->p_state == PRS_ZOMBIE) {
+		  proc_reap(td, p, status, options, rusage, 0);
 			return (0);
 		}
+		PROC_SUNLOCK(p);
 		PROC_UNLOCK(p);
 	}
 	if (nfound == 0) {
 		sx_xunlock(&proctree_lock);
-		if (td->td_proc->p_dbg_child)
-			return (0);
-		else
-			return (ECHILD);
+		return (ECHILD);
 	}
 	if (options & WNOHANG) {
 		sx_xunlock(&proctree_lock);
@@ -929,6 +962,7 @@
 void
 proc_reparent(struct proc *child, struct proc *parent)
 {
+	struct proc *p;
 
 	sx_assert(&proctree_lock, SX_XLOCKED);
 	PROC_LOCK_ASSERT(child, MA_OWNED);
@@ -940,5 +974,17 @@
 	PROC_UNLOCK(child->p_pptr);
 	LIST_REMOVE(child, p_sibling);
 	LIST_INSERT_HEAD(&parent->p_children, child, p_sibling);
+
+	LIST_FOREACH(p, &parent->p_orphans, p_orphan) {
+		if (p == child) {
+			LIST_REMOVE(child, p_orphan);
+			break;
+		}
+	}
+	if (child->p_flag & P_TRACED) {
+		LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan);
+		PROC_UNLOCK(child);
+		PROC_LOCK(child);
+	}
 	child->p_pptr = parent;
 }
Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c	(revision 230617)
+++ sys/kern/kern_fork.c	(working copy)
@@ -590,6 +590,7 @@
 	LIST_INSERT_AFTER(p1, p2, p_pglist);
 	PGRP_UNLOCK(p1->p_pgrp);
 	LIST_INIT(&p2->p_children);
+	LIST_INIT(&p2->p_orphans);
 
 	callout_init(&p2->p_itcallout, CALLOUT_MPSAFE);
 
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c	(revision 230617)
+++ sys/kern/sys_process.c	(working copy)
@@ -841,8 +841,6 @@
 		p->p_flag |= P_TRACED;
 		p->p_oppid = p->p_pptr->p_pid;
 		if (p->p_pptr != td->td_proc) {
-			/* Remember that a child is being debugged(traced). */
-			p->p_pptr->p_dbg_child++;
 			proc_reparent(p, td->td_proc);
 		}
 		data = SIGSTOP;
@@ -931,7 +929,6 @@
 					PROC_UNLOCK(pp);
 				PROC_LOCK(p);
 				proc_reparent(p, pp);
-				p->p_pptr->p_dbg_child--;
 				if (pp == initproc)
 					p->p_sigparent = SIGCHLD;
 			}
Index: sys/kern/sys_procdesc.c
===================================================================
--- sys/kern/sys_procdesc.c	(revision 230617)
+++ sys/kern/sys_procdesc.c	(working copy)
@@ -367,7 +367,7 @@
 		 * procdesc_reap().
 		 */
 		PROC_SLOCK(p);
-		proc_reap(curthread, p, NULL, 0, NULL);
+		proc_reap(curthread, p, NULL, 0, NULL, 0);
 	} else {
 		/*
 		 * If the process is not yet dead, we need to kill it, but we
Index: sys/sys/proc.h
===================================================================
--- sys/sys/proc.h	(revision 230617)
+++ sys/sys/proc.h	(working copy)
@@ -505,8 +505,6 @@
 /* The following fields are all zeroed upon creation in fork. */
 #define	p_startzero	p_oppid
 	pid_t		p_oppid;	/* (c + e) Save ppid in ptrace. XXX */
-	int		p_dbg_child;	/* (c + e) # of debugged children in
-							ptrace. */
 	struct vmspace	*p_vmspace;	/* (b) Address space. */
 	u_int		p_swtick;	/* (c) Tick when swapped in or out. */
 	struct itimerval p_realtimer;	/* (c) Alarm timer. */
@@ -574,6 +572,8 @@
 					   after fork. */
 	uint64_t	p_prev_runtime;	/* (c) Resource usage accounting. */
 	struct racct	*p_racct;	/* (b) Resource accounting. */
+	LIST_ENTRY(proc) p_orphan;	/* (e) List of orphan processes. */
+	LIST_HEAD(, proc) p_orphans;	/* (e) Pointer to list of orphans. */
 };
 
 #define	p_session	p_pgrp->pg_session
@@ -865,7 +865,7 @@
 void	proc_linkup0(struct proc *p, struct thread *td);
 void	proc_linkup(struct proc *p, struct thread *td);
 void	proc_reap(struct thread *td, struct proc *p, int *status, int options,
-	    struct rusage *rusage);
+	    struct rusage *rusage, int child);
 void	proc_reparent(struct proc *child, struct proc *newparent);
 struct	pstats *pstats_alloc(void);
 void	pstats_fork(struct pstats *src, struct pstats *dst);
Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c	(revision 230617)
+++ sys/kern/kern_fork.c	(working copy)
@@ -797,6 +797,10 @@
 
 	p1 = td->td_proc;
 
+	/* Don't do vfork while being traced. */
+	if (p1->p_flag & P_TRACED)
+		flags &= ~(RFPPWAIT | RFMEM);
+
 	/*
 	 * Here we don't create a new process, but we divorce
 	 * certain parts of a process from itself.
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to