On Mon, Jul 20, 2020 at 04:35:12AM +0000, Visa Hankala wrote:
> On Sun, Jul 19, 2020 at 09:47:54PM +0100, Julian Smith wrote:
> > I've been finding egdb and gdb rather easily get stuck in an
> > uninterruptible wait, e.g. when running the 'next' command after
> > hitting a breakpoint.
> > 
> > So it's not possible to kill the debuggee or gdb and the only way to
> > kill the debuggee process and free up its listening sockets seems to be
> > to reboot the entire system.
> > 
> > Perhaps unsurprisingly one cannot attach a second invocation of gdb to
> > the uninterruptible gdb, so i don't know for sure what syscall is being
> > run that is getting stuck.
> > 
> > The debuggee is a local build of the flightgear flight simulator.
> > 
> > Here's the output of ps for the debugger and debuggee:
> > 
> > 12419 p0  D        0:34.37 egdb -ex handle SIGPIPE noprint nostop -ex set 
> > print thread-events off -ex set print pretty on -ex run --args 
> > build-walk/fgfs,clang,debug,opt,co
> > 63921 p0  TX+      0:42.45 
> > /home/jules/flightgear/build-walk/fgfs,clang,debug,opt,compositor,osg.exe 
> > --airport=egtk (fgfs,clang,debug)
> > 
> > I've tried using ktrace on egdb, and the kdump output ends like this:
> > 
> >  53950 egdb     CALL  wait4(WAIT_ANY,0x7f7ffffe8efc,0<>,0)
> >  53950 egdb     RET   wait4 97562/0x17d1a
> >  53950 egdb     CALL  ptrace(PT_GET_PROCESS_STATE,97562,0x7f7ffffe8ef0,12)
> >  53950 egdb     RET   ptrace 0
> >  53950 egdb     CALL  ptrace(PT_GETREGS,161560,0x7f7ffffe8b40,0)
> >  53950 egdb     RET   ptrace 0
> >  53950 egdb     CALL  
> > futex(0x6444e37c490,0x82<FUTEX_WAKE|FUTEX_PRIVATE_FLAG>,1,0,0)
> >  53950 egdb     RET   futex 0
> >  53950 egdb     CALL  
> > futex(0x644bef12740,0x82<FUTEX_WAKE|FUTEX_PRIVATE_FLAG>,1,0,0)
> >  53950 egdb     RET   futex 0
> >  53950 egdb     CALL  ptrace(PT_IO,97562,0x7f7ffffe8a30,0)
> >  53950 egdb     RET   ptrace 0
> >  53950 egdb     CALL  ptrace(PT_IO,97562,0x7f7ffffe8a30,0)
> >  53950 egdb     RET   ptrace 0
> >  53950 egdb     CALL  ptrace(PT_STEP,97562,0x1,0)
> >  53950 egdb     RET   ptrace 0
> >  53950 egdb     CALL  read(6,0x7f7ffffe9187,0x1)
> >  53950 egdb     RET   read -1 errno 35 Resource temporarily unavailable
> >  53950 egdb     CALL  poll(0x6441581e720,3,0)
> >  53950 egdb     STRU  struct pollfd [3] { fd=4, events=0x1<POLLIN>, 
> > revents=0<> } { fd=6, events=0x1<POLLIN>, revents=0<> } { fd=10, 
> > events=0x1<POLLIN>, revents=0<> }
> >  53950 egdb     RET   poll 0
> >  53950 egdb     CALL  wait4(WAIT_ANY,0x7f7ffffe8efc,0<>,0)
> > 
> > Assuming that this is the actual end of the ktrace output and there
> > isn't some missing ktrace output in a buffer somewhere, this looks
> > like egdb is simply blocked in wait4(), which should be harmless and
> > certainly not uninterruptable?
> 
> The single-thread check done by wait4() is non-interruptible.
> When the debugger gets stuck, is it blocked in "suspend" state?
> 
> However, I think there is a bug in the single-thread switch code.
> It looks that ps_singlecount can be decremented too much. This probably
> is a regression of making ps_singlecount unsigned and letting
> single_thread_check() run without the kernel lock.
> 
> The bug might go away if single_thread_check() made sure that
> P_SUSPSINGLE is set before the thread suspends. 

Below is an updated patch for testing. It extends the scope of
SCHED_LOCK() so that there are fewer chances of interleaving of
single_thread_set() and single_thread_check().

One problem is that once single_thread_set() sets ps_single, the other
threads can enter the suspension code in single_thread_check(). The
extended lock scope prevents the threads from taking action before
single_thread_set() has finished with the state updates.

Index: kern/kern_sig.c
===================================================================
RCS file: src/sys/kern/kern_sig.c,v
retrieving revision 1.258
diff -u -p -r1.258 kern_sig.c
--- kern/kern_sig.c     15 Jun 2020 13:18:33 -0000      1.258
+++ kern/kern_sig.c     20 Jul 2020 13:29:50 -0000
@@ -1915,16 +1915,17 @@ single_thread_check(struct proc *p, int 
                                        return (EINTR);
                        }
 
+                       SCHED_LOCK(s);
                        if (atomic_dec_int_nv(&pr->ps_singlecount) == 0)
                                wakeup(&pr->ps_singlecount);
                        if (pr->ps_flags & PS_SINGLEEXIT) {
+                               SCHED_UNLOCK(s);
                                KERNEL_LOCK();
                                exit1(p, 0, 0, EXIT_THREAD_NOCHECK);
-                               KERNEL_UNLOCK();
+                               /* NOTREACHED */
                        }
 
                        /* not exiting and don't need to unwind, so suspend */
-                       SCHED_LOCK(s);
                        p->p_stat = SSTOP;
                        mi_switch();
                        SCHED_UNLOCK(s);
@@ -1950,7 +1951,7 @@ single_thread_set(struct proc *p, enum s
 {
        struct process *pr = p->p_p;
        struct proc *q;
-       int error;
+       int error, s;
 
        KERNEL_ASSERT_LOCKED();
        KASSERT(curproc == p);
@@ -1974,26 +1975,22 @@ single_thread_set(struct proc *p, enum s
                panic("single_thread_mode = %d", mode);
 #endif
        }
+       SCHED_LOCK(s);
        pr->ps_singlecount = 0;
        membar_producer();
        pr->ps_single = p;
        TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
-               int s;
-
                if (q == p)
                        continue;
                if (q->p_flag & P_WEXIT) {
                        if (mode == SINGLE_EXIT) {
-                               SCHED_LOCK(s);
                                if (q->p_stat == SSTOP) {
                                        setrunnable(q);
                                        atomic_inc_int(&pr->ps_singlecount);
                                }
-                               SCHED_UNLOCK(s);
                        }
                        continue;
                }
-               SCHED_LOCK(s);
                atomic_setbits_int(&q->p_flag, P_SUSPSINGLE);
                switch (q->p_stat) {
                case SIDL:
@@ -2027,8 +2024,8 @@ single_thread_set(struct proc *p, enum s
                        signotify(q);
                        break;
                }
-               SCHED_UNLOCK(s);
        }
+       SCHED_UNLOCK(s);
 
        if (mode != SINGLE_PTRACE)
                single_thread_wait(pr, 1);

Reply via email to