On 18-07-2010 15:02, Gavin Atkinson wrote:
On Sat, 17 Jul 2010, Gavin Atkinson wrote:
Semi-regularly (every two-three days) I'm seeing what appears to be some
sort of filesystem wedge. I usually see it initially with web browsers,
but it's possible that's only because it's what produces most disk
activity on this machine. I've seen it with both Opera and Firefox.
I've been seeing this too. It still happens with kernel r211000.
What happens is that the process will just wedge. A "procstat -kk" on it
shows the following stack backtrace:
9012 100243 firefox-bin initial thread mi_switch+0x21d
sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x357 getdirtybuf+0x21e
flush_deplist+0x6f softdep_sync_metadata+0x153 ffs_syncvnode+0x213
ffs_fsync+0x43 fsync+0x148 syscallenter+0x1b5 syscall+0x4c
Xfast_syscall+0xe2
Firefox is the usual first sign:
acer % ps ax|grep firefox
82117 v0 T 16:24,08 /usr/local/lib/firefox3/firefox-bin
13416 3 S+ 0:00,00 grep firefox
acer % procstat -kk 82117
PID TID COMM TDNAME KSTACK
82117 100195 firefox-bin - mi_switch+0x219
thread_suspend_switch+0x103 thread_single+0x25c exit1+0x81 sigexit+0x84
cursig+0 ast+0x1aa doreti_ast+0x1f
82117 100221 firefox-bin initial thread mi_switch+0x219
sleepq_switch+0xfa sleepq_wait+0x46 _sleep+0x256 getdirtybuf+0x1af
flush_deplist+0x6a softdep_sync_metadata+0x153 ffs_syncvnode+0x22d
ffs_fsync+0x43 fsync+0x13d syscallenter+0x194 syscall+0x41
Xfast_syscall+0xe2
acer %
A bit more detail: it does look like whatever is supposed to periodically
flush the journal just stops doing it's job. Presumably this is also the
root cause of the "softdep: Out of journal space!" messages I have been
seeing in the past, which I had assumed may have been fixed by r209717.
I haven't seen any "softdep: Out of journal space!" messages since June
24, but I've indeed seen it once before (somewhere after June 11).
(I'm running r209723 at the moment)
While processes are starting to hang, "sh ffs" from ddb shows:
db> sh ffs
mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 su_wl 0
su_wl_in 0 su_deps 0 su_req 0
mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64800 su_wl
0 su_wl_in 0 su_deps 0 su_req 0
mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66000 su_wl
0 su_wl_in 0 su_deps 17345 su_req 0
mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66800 su_wl
0 su_wl_in 0 su_deps 55 su_req 0
Leaving it another couple of hours, I then see:
db> sh ffs
mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 su_wl 0
su_wl_in 0 su_deps 0 su_req 0
mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64800 su_wl
0 su_wl_in 0 su_deps 36 su_req 0
mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66000 su_wl
0 su_wl_in 0 su_deps 31899 su_req 0
mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66800 su_wl
0 su_wl_in 0 su_deps 95 su_req 0
so, su_deps is increasing significantly.
During reboot, vnlru failed to stop within 60 seconds, and gave up on
syncing 125 vnodes and 140 buffers (no idea if these are related). On
reboot, SU+J fsck shows for /usr:
** SU+J Recovering /dev/ad4s1f
** Reading 33554432 byte journal from inode 150.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
** 405991 journal records in 18194944 bytes for 71.40% utilization
** Freed 3872 inodes (0 dirs) 48157 blocks, and 8744 frags.
Similar here.
So it seems clear that somehow the journal is filling up, and never being
written.
Any other suggestions as to where I should go from here?
Disabling the journal would be a "solution", but not desirable.
Maybe any lock order reversals to look out for (most are ufs-related) ?
I don't know if it is related, but yesterday a full fsck on /usr cleared
up two unallocated files in /usr/ports/editors/openoffice-3/work/ (they
were in userland as having a bad file descriptor), which the journal
didn't catch.
Regards,
Rene
--
http://www.rene-ladan.nl/
GPG fingerprint = ADBC ECCD EB5F A6B4 549F 600D 8C9E 647A E564 2BFC
(subkeys.pgp.net)
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"