On Mon, Sep 24, 2018 at 02:11:38PM +0200, Christian Brauner wrote:
> On Mon, Sep 24, 2018, 14:03 Kees Bakker <[email protected]> wrote:
>
> > Same question again: what is the best approach to recover
> > from a container in an ERROR state?
So another thing I would like to see is the current stack of the hung
monitor process. Could you please paste (or send privately) the output of:

cat /proc/<pid-of-hung-monitor-process>/stack

Also, in what state is the monitor hung? Again in D state?

Christian

> Please show me the dmesg output. If it is a kernel bug you're hitting
> there's nothing that LXD can do to help you.
>
> > This time it happened with Ubuntu 18.04 and LVM storage.
> >
> > The steps leading to this were as follows. It's just an FYI, I don't think it
> > really matters, except for the stop and start.
> >
> > lvextend -L 20G local/containers_xyz
> > resize2fs /dev/local/containers_xyz
> > lxc stop xyz
> > e2fsck -f /dev/local/containers_
> > lxc start xyz
> >
> > ... the start command hung.
> >
> > Some output of ps auxfwww
> >
> > root  6224  0.0  0.0   22912    4096 pts/1 S   sep06   0:00  |  \_ -bash
> > root 20900  0.0  0.0 1136140   12092 pts/1 Sl+ 12:19   0:00  |      \_ lxc start xyz
> > --
> > root 18157  3.5  4.2 5581444 1398904 ?     Ssl sep12 611:36 /usr/lib/lxd/lxd --group lxd --logfile=/var/log/lxd/lxd.log
> > root 20918  0.0  0.0  521720   19780 ?     Sl  12:19   0:00  \_ /usr/lib/lxd/lxd forkstart xyz /var/lib/lxd/containers /var/log/lxd/xyz/lxc.conf
> > root 20925  0.0  0.0       0       0 ?     Z   12:19   0:00      \_ [lxd] <defunct>
> > --
> > root 20926  0.0  0.0  530432    7280 ?     Ss  12:19   0:00 [lxc monitor] /var/lib/lxd/containers xyz
> > root 20943  0.0  0.0  530432    3484 ?     D   12:19   0:00  \_ [lxc monitor] /var/lib/lxd/containers xyz
> >
> > On 11-09-18 15:13, Kees Bakker wrote:
> > > Hey,
> > >
> > > Every now and then we have one or more containers in state ERROR.
> > > Is there a clever method to recover from that, other than
> > > rebooting the LXD server?
> > >
> > > Killing the monitor and the forkstart does help. And also a kworker
> > > process (kworker/u16:0) is eating up one of the CPUs with 100% load.
> > > lxc info gives "error: Monitor is hung"
> > >
> > > I'm running Ubuntu 16.04 with BTRFS. The kernel is 4.15.0-33-generic
> >
> > _______________________________________________
> > lxc-users mailing list
> > [email protected]
> > http://lists.linuxcontainers.org/listinfo/lxc-users
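For collecting the diagnostics asked for above (the state of the hung
monitor and its /proc/<pid>/stack), something along these lines might save
some typing. It is only a rough sketch, not anything LXD ships: the function
names d_state_pids and dump_hung are made up for illustration, and it assumes
a procps-style ps, a kernel that exposes /proc/<pid>/stack, and root
privileges to read it.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch, not part of LXD/LXC).

# Read "PID STAT COMMAND..." lines on stdin and print the PIDs whose
# state starts with D (uninterruptible sleep).
d_state_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# For every process currently in D state, print its name, state and
# kernel stack. Reading /proc/<pid>/stack normally requires root.
dump_hung() {
    for pid in $(ps -e -o pid= -o stat= -o args= | d_state_pids); do
        echo "== PID $pid =="
        grep -E '^(Name|State):' "/proc/$pid/status" 2>/dev/null
        cat "/proc/$pid/stack" 2>/dev/null \
            || echo "(stack not readable; are you root?)"
    done
}
```

Sourcing the file and running dump_hung as root while the container is stuck,
followed by dmesg, should give the stack, the state, and the kernel log in
one go.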
