On Fri, Feb 27, 2004 at 04:50:58PM -0500, Roland McGrath wrote:
> Firstly, you are using the second patch I posted, not the first, right?

Yes.

> > 1.) Quite often (seemingly random) a bogus gnu.author gets displayed by
> > getfattr:
> >
> > blackbird/mnt$ getfattr -Rh -d -m '.*' gnu/servers
> > # file: gnu/servers/socket
> > gnu.author=0shgAAAA==
>
> Is it really random?

Random in a sense that I get different values for different files and
different rounds of unpacking.

> Anyway, hgAAAA== in base64 is little-endian 134.  Is that a uid you
> might be using?

No, and it is not present on my system.

This might illustrate the 'randomness':

blackbird/gnu/home/mbanck/gnu$ for i in `seq 1 20`; do touch $i; setfattr -n "gnu.translator" -v "/hurd/foo\0" $i; done
blackbird/gnu/home/mbanck/gnu$ getfattr -e hex -Rh -d -m '.*' [0-9]* | grep author | grep -v '0x00'
gnu.author=0x28000000
gnu.author=0x30990408
blackbird/gnu/home/mbanck/gnu$

> > 2.) Sometimes, I/O errors get reported. They seem to be coupled to star,
> > as every invocation of getfattr yields the same error for a set of
> > extracted translators.
> [...]
> > Such an I/O error gets marked in the syslog as
> > Feb 27 21:51:17 blackbird kernel: hda5: rw=0, want=2061425788, limit=196497
> > Feb 27 21:51:48 blackbird kernel: attempt to access beyond end of device
>
> That sure looks like a bug in my code.  Please use strace on star to
> ascertain the sequence of *xattr calls and other operations on those files
> that results in this lossage.

This is strace on star:

lstat64("gnu/servers/socket/2", 0xbfffd4f0) = -1 ENOENT (No such file or directory)
open("gnu/servers/socket/2", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4
fcntl64(4, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40976000
_llseek(4, 0, [0], SEEK_CUR) = 0
munmap(0x40976000, 4096) = 0
fcntl64(4, F_GETFD) = 0
fcntl64(4, F_GETFD) = 0
fsync(4) = 0
close(4) = 0
utimes("gnu/servers/socket/2", {1057965533, 0}) = 0
chmod("gnu/servers/socket/2", 0666) = 0
setxattr("gnu/servers/socket/2", "gnu.translator", 0x808c110, 72, ) = 0
lchown32("gnu/servers/socket/2", 0, 0) = 0
utimes("gnu/servers/socket/2", {1057965533, 0}) = 0
[...]
lstat64("gnu/servers/exec", 0xbfffd4f0) = -1 ENOENT (No such file or directory)
open("gnu/servers/exec", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4
fcntl64(4, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40976000
_llseek(4, 0, [0], SEEK_CUR) = 0
munmap(0x40976000, 4096) = 0
fcntl64(4, F_GETFD) = 0
fcntl64(4, F_GETFD) = 0
fsync(4) = 0
close(4) = 0
utimes("gnu/servers/exec", {1057965161, 0}) = 0
chmod("gnu/servers/exec", 0644) = 0
setxattr("gnu/servers/exec", "gnu.translator", 0x808c110, 11, <unfinished ...>

> Then try to make a simple script using just the *fattr commands and
> the necessary simple file commands to elicit the bug.

Well, this works for me:

blackbird/mnt$ su -c 'for i in `seq 1 999`; do echo $i; touch $i; setfattr -n "gnu.translator" -v "/hurd/foo\0" $i; getfattr -Rh -d -m "gnu.translator" $i > /dev/null || echo "$i: $?"; done'
Password:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
...

here it hangs and I get those

block=3221224218, b_blocknr=18446744072635808536
b_state=0x00000010, b_size=1024

on syslog again (the actual numbers are different, but /var/log is full so
I only get those on a virtual console I have syslog on).

(The previous run went until 10, but I also managed to {get,set}fattr more
than 100 times before an error occurred.)

> Your output here tells us that getxattr tried to read this bogus block
> number, presumably because that bogus value was recorded as the translator
> block.
> > Feb 27 12:00:56 blackbird kernel: buffer layer error at fs/buffer.c:430
> [...]
> > Feb 27 12:00:56 blackbird kernel: block=3221224218, b_blocknr=18446744072635808536
> > Feb 27 12:00:56 blackbird kernel: b_state=0x00000010, b_size=1024
>
> I believe this is just more fallout from the attempt to read a bogus block
> number previously.  Does the bogus number 3221224218 appear in other errors?

It does not appear so. That number/that message gets written just before
the call traces and from then on ad infinitum.

> > I hope this information is somewhat useful, just tell me what I
> > should do else.
>
> After you have any such errors, unmount the filesystem

Unmounting sometimes does not work; I then get:

blackbird/$ sudo umount mnt
umount: /mnt: device is busy
umount: /mnt: device is busy

(The message is really printed out twice)

> (or reboot if you are screwed, whatever).  Then run e2fsck on the disk
> (make sure this doesn't happen automatically at boot from /etc/fstab).
> First use e2fsck -n and save the output to post here.

e2fsck 1.35-WIP (31-Jan-2004)
Pass 1: Checking inodes, blocks, and sizes
Inode 18446 has illegal block(s).  Clear? no

Illegal block #-4 (708457779) in inode 18446.  IGNORED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/hda5: ********** WARNING: Filesystem still has errors **********

/dev/hda5: 27/24576 files (0.0% non-contiguous), 3134/98248 blocks

debugfs:  ncheck 18446
Inode   Pathname
18446   /test/gnu/servers/crash-dump-core

(which was the offending translator at that time)

Michael

_______________________________________________
Bug-hurd mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-hurd
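
For reference, the raw numbers quoted in this message can be re-read with a few lines of Python. This is only a sketch for checking the arithmetic, not part of the patch or of any tool used above. The helper names (decode_getfattr_value, decode_le_u32, as_signed_64) are made up for illustration, and it assumes gnu.author holds a 4-byte little-endian integer, as the "little-endian 134" reading of hgAAAA== suggests. getfattr marks base64 values with a "0s" prefix and hex values with "0x".

```python
import base64
import struct


def decode_getfattr_value(text: str) -> bytes:
    """Turn a getfattr-encoded value ("0s" = base64, "0x" = hex) into raw bytes."""
    if text.startswith("0s"):
        return base64.b64decode(text[2:])
    if text.startswith(("0x", "0X")):
        return bytes.fromhex(text[2:])
    raise ValueError(f"unrecognised getfattr encoding: {text!r}")


def decode_le_u32(raw: bytes) -> int:
    """Interpret exactly four bytes as an unsigned little-endian 32-bit integer."""
    (value,) = struct.unpack("<I", raw)
    return value


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit number as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


# gnu.author values exactly as printed by getfattr in this thread.
for text in ["0shgAAAA==", "0x28000000", "0x30990408"]:
    print(text, "->", decode_le_u32(decode_getfattr_value(text)))

# Block numbers from the "buffer layer error" syslog lines.
print(hex(3221224218))                     # the "block=" value in hex
print(as_signed_64(18446744072635808536))  # the "b_blocknr=" value as signed 64-bit
```

Under that assumption the three gnu.author values come out as 134, 40 and 134519088, block=3221224218 is 0xbffffb1a, and b_blocknr reads as -1073743080 when treated as a signed 64-bit quantity.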