Hello!

On Fri, Oct 02, 2009 at 10:01:22AM -0600, Eric Blake wrote:
> However, I'm wondering if I just tickled a bug that hung flubber;
> I was running the gnulib test for rename(2), and lost all response from
> the machine.  After killing that make process, even simple commands like
> "ls" failed to do anything for several seconds; when I hit Ctrl-C, I got:
>
> $ ls
> ^Cls: cannot open directory .: Interrupted system call
>
> and after I exited my ssh session, I can't seem to get back on.

Yes, such things still happen on Hurd systems :-)...  If this turns out to
be reproducible, we'll obviously have a bigger chance of getting an
understanding of this failure, and fixing it.

Indeed, I also found the machine to be unresponsive, even on the Xen
console, so I restarted the domU, and got this:

    /dev/hd2 was not cleanly unmounted, check forced.
    /dev/hd2: Invalid inode number for '.' in directory inode 804372.
    /dev/hd2: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

... which also indicates that it is a filesystem-related issue: hd2 is
where /home is on, and where your rename() was acting, I guess.  Usually,
from previous experience, when the system crashes (due to resource
exhaustion in the kernel, for example), /home is fine, but the root
filesystem has some minor inconsistencies (due to open files, etc., as I
understand it).

For reference, here's the complete fsck run; indeed indicating that inode
804372 is related to your gnulib test:

    tschwi...@zenhost:~ $ sudo e2fsck -f /dev/zenhost/flubber-data
    e2fsck 1.41.9 (22-Aug-2009)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Invalid inode number for '.' in directory inode 804372.
    Fix<y>? yes
    Pass 3: Checking directory connectivity
    '..' in /home/ericb/gnulib/testdir1215/build/gltests/test-rename.tdir2 (804373) is /home/ericb/gnulib/testdir1215/build/gltests/test-rename.tdir (804372), should be /home/ericb/gnulib/testdir1215/build/gltests (639644).
    Fix<y>? yes
    Pass 3A: Optimizing directories
    Pass 4: Checking reference counts
    Inode 639644 ref count is 9, should be 8.  Fix<y>? yes
    Inode 639925 ref count is 2, should be 1.  Fix<y>? yes
    Inode 804373 ref count is 3, should be 2.  Fix<y>? yes
    Pass 5: Checking group summary information
    Block bitmap differences:  +(1286463--1286464) +(1617297--1617298) +(1617300--1617301)
    Fix<y>? yes
    Free blocks count wrong for group #39 (28393, counted=28391).
    Fix<y>? yes
    Free blocks count wrong for group #49 (28617, counted=28613).
    Fix<y>? yes
    Free blocks count wrong (904952, counted=904946).
    Fix<y>? yes
    Inode bitmap differences:  +(639924--639925) +(804370--804373)
    Fix<y>? yes
    Free inodes count wrong for group #39 (15157, counted=15155).
    Fix<y>? yes
    Free inodes count wrong for group #49 (14830, counted=14826).
    Fix<y>? yes
    Directories count wrong for group #49 (142, counted=146).
    Fix<y>? yes
    Free inodes count wrong (672146, counted=672140).
    Fix<y>? yes

    /dev/zenhost/flubber-data: ***** FILE SYSTEM WAS MODIFIED *****
    /dev/zenhost/flubber-data: 245364/917504 files (5.1% non-contiguous), 930062/1835008 blocks

flubber is up again.

As that system is primarily used for non-system-destabilizing things
(apart from encountering them for the first time, obviously), I tend to
use a different system for further exploring such issues: grubber (also
listed on <http://www.gnu.org/software/hurd/public_hurd_boxen.html>).  If
you want to explore this rename() issue further (first try to reproduce
it), would it be fine for you to move this testing to grubber (you already
got an account on there), which is otherwise unused at the moment?  Also,
if you want, I wouldn't hesitate to give you root access, and access to
the dom0 so that you can restart hanging / crashed domUs.

Unfortunately, at the moment I don't have time to address the other topics
you and Samuel came up with, but I'll keep the emails around.  Thanks for
helping!

Regards,
 Thomas
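
For anyone trying to reproduce this on grubber: the Pass 3 message above says that the '..' entry of test-rename.tdir2 still pointed at test-rename.tdir instead of at gltests. The following is only a minimal sketch of a user-space check for that invariant, not the gnulib test itself. It assumes (this is a guess, not something the fsck output confirms) that the problem involves a directory being moved to a new parent with rename(); the two test-rename.* names are taken from the fsck output, while the function name and the "sub" directory are hypothetical. Note that it only checks what the running system resolves '..' to, which may or may not reflect the on-disk entry; e2fsck remains the authoritative check.

```python
import os
import tempfile


def dotdot_matches_parent(base):
    """Move a subdirectory up one level with rename() and report whether
    the moved directory's '..' resolves to its new parent directory."""
    outer = os.path.join(base, "test-rename.tdir")
    inner = os.path.join(outer, "sub")            # hypothetical name
    moved = os.path.join(base, "test-rename.tdir2")

    os.makedirs(inner)
    os.rename(inner, moved)                       # directory changes parent

    parent = os.stat(base)
    dotdot = os.stat(os.path.join(moved, ".."))
    # Same device and inode number means '..' refers to the new parent.
    return (dotdot.st_dev, dotdot.st_ino) == (parent.st_dev, parent.st_ino)


if __name__ == "__main__":
    # Run inside a scratch directory on the filesystem under test.
    with tempfile.TemporaryDirectory() as scratch:
        print("'..' consistent after rename:", dotdot_matches_parent(scratch))
```

By default the scratch directory is created under the system temporary directory; to exercise a specific filesystem such as /home, pass `dir=` to `tempfile.TemporaryDirectory`.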