Control: tags -1 + moreinfo On Thu, Sep 15, 2016 at 01:12:42PM +0200, Sergio Gelato wrote: > Package: linux-image-3.16.0-4-amd64 > Version: 3.16.36-1+deb8u1 > > One of our systems is suddenly unable to stat() a particular file > (cookies.sqlite-wal in a user's Firefox profile). Any attempt to > do so hangs in the system call, as shown by strace. The file resides > on an NFSv4 share (sec=krb5p). Other files in the same directory on > the same share remain accessible. The affected file is normally accessible > on the NFS server and from other NFS clients running the same kernel. > > The user has reported a similar incident yesterday on some directories > on a different NFS share (also sec=krb5p, but hosted on a different > server). He rebooted to clear up the problem. > > I'd like advice on how to troubleshoot this effectively. I've tried > rpcdebug -m {nfs,rpc,nlm} -s all but didn't see any smoking gun; maybe > some information cached by the kernel is suppressing NFS activity > associated with the stat() calls. The log entries I do see say > NFS: nfs_lookup_revalidate(cookies.sqlite-wal) is valid > > Some related kernel traces from this system's logs (in chronological order, > with an intervening reboot; the first trace is associated with yesterday's > incident, the second trace is 2-3 minutes newer than the timestamp on > cookies.sqlite-wal): > > [97483.663949] INFO: task ls:23767 blocked for more than 120 seconds. > [97483.663951] Tainted: P W O 3.16.0-4-amd64 #1 > [97483.663952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [97483.663954] ls D ffff8800afd3b808 0 23767 1 > 0x00000004 > [97483.663957] ffff8800afd3b3b0 0000000000000086 0000000000012f40 > ffff88039a057fd8 > [97483.663960] 0000000000012f40 ffff8800afd3b3b0 ffff88041ea137f0 > ffff88041edcd128 > [97483.663962] 0000000000000002 ffffffff8113eb50 ffff88039a057d60 > ffff88039a057e40 > [97483.663965] Call Trace: > [97483.663968] [<ffffffff8113eb50>] ? wait_on_page_read+0x60/0x60 > [97483.663971] [<ffffffff81515069>] ? io_schedule+0x99/0x120 > [97483.663974] [<ffffffff8113eb5a>] ? sleep_on_page+0xa/0x10 > [97483.663977] [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90 > [97483.663980] [<ffffffff8113e956>] ? wait_on_page_bit+0xc6/0xd0 > [97483.663983] [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30 > [97483.663986] [<ffffffff8114c22d>] ? pagevec_lookup_tag+0x1d/0x30 > [97483.663989] [<ffffffff8113ea30>] ? filemap_fdatawait_range+0xd0/0x160 > [97483.663993] [<ffffffff81140656>] ? filemap_write_and_wait+0x36/0x50 > [97483.664002] [<ffffffffa06c4c08>] ? nfs_getattr+0x108/0x220 [nfs] > [97483.664005] [<ffffffff811af057>] ? vfs_fstatat+0x57/0x90 > [97483.664009] [<ffffffff811af51d>] ? SYSC_newlstat+0x1d/0x40 > [97483.664013] [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15 > > [ 9724.415533] INFO: task mozStorage #5:2748 blocked for more than 120 > seconds. > [ 9724.415537] Tainted: P W O 3.16.0-4-amd64 #1 > [ 9724.415538] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 9724.415539] mozStorage #5 D ffff8803ee153a88 0 2748 2323 > 0x00000000 > [ 9724.415542] ffff8803ee153630 0000000000000082 0000000000012f40 > ffff8803ee2cffd8 > [ 9724.415544] 0000000000012f40 ffff8803ee153630 ffff88041ea537f0 > ffff88041edaf7a0 > [ 9724.415545] 0000000000000002 ffffffffa0669800 ffff8803ee2cfc30 > ffff88040b1152c0 > [ 9724.415547] Call Trace: > [ 9724.415557] [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs] > [ 9724.415560] [<ffffffff81515069>] ? io_schedule+0x99/0x120 > [ 9724.415563] [<ffffffffa066980a>] ? nfs_wait_bit_uninterruptible+0xa/0x10 > [nfs] > [ 9724.415566] [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90 > [ 9724.415568] [<ffffffff810738ba>] ? internal_add_timer+0x2a/0x70 > [ 9724.415571] [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs] > [ 9724.415573] [<ffffffff81515497>] ? out_of_line_wait_on_bit+0x77/0x90 > [ 9724.415575] [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30 > [ 9724.415578] [<ffffffffa066ef2e>] ? nfs_updatepage+0x15e/0x830 [nfs] > [ 9724.415582] [<ffffffffa06609a7>] ? nfs_write_end+0x57/0x320 [nfs] > [ 9724.415585] [<ffffffff81164d25>] ? > iov_iter_copy_from_user_atomic+0x75/0x190 > [ 9724.415588] [<ffffffff8113f08b>] ? generic_perform_write+0x11b/0x1c0 > [ 9724.415590] [<ffffffff81140f98>] ? __generic_file_write_iter+0x158/0x340 > [ 9724.415592] [<ffffffff811411b9>] ? generic_file_write_iter+0x39/0xa0 > [ 9724.415595] [<ffffffffa065fdb3>] ? nfs_file_write+0x83/0x1a0 [nfs] > [ 9724.415598] [<ffffffff811aa134>] ? new_sync_write+0x74/0xa0 > [ 9724.415600] [<ffffffff811aa872>] ? vfs_write+0xb2/0x1f0 > [ 9724.415601] [<ffffffff811ab3b2>] ? SyS_write+0x42/0xa0 > [ 9724.415603] [<ffffffff811aa393>] ? SyS_lseek+0x43/0xa0 > [ 9724.415605] [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15
Do you still can reproduce the issue with a recent kernel? Regards, Salvatore