Control: tags -1 + moreinfo

On Thu, Sep 15, 2016 at 01:12:42PM +0200, Sergio Gelato wrote:
> Package: linux-image-3.16.0-4-amd64
> Version: 3.16.36-1+deb8u1
> 
> One of our systems is suddenly unable to stat() a particular file
> (cookies.sqlite-wal in a user's Firefox profile). Any attempt to
> do so hangs in the system call, as shown by strace. The file resides
> on an NFSv4 share (sec=krb5p). Other files in the same directory on
> the same share remain accessible. The affected file is normally accessible
> on the NFS server and from other NFS clients running the same kernel.
> 
> The user has reported a similar incident yesterday on some directories
> on a different NFS share (also sec=krb5p, but hosted on a different
> server). He rebooted to clear up the problem.
> 
> I'd like advice on how to troubleshoot this effectively. I've tried
> rpcdebug -m {nfs,rpc,nlm} -s all but didn't see any smoking gun; maybe
> some information cached by the kernel is suppressing NFS activity
> associated with the stat() calls. The log entries I do see say
> NFS: nfs_lookup_revalidate(cookies.sqlite-wal) is valid
> 
> Some related kernel traces from this system's logs (in chronological order,
> with an intervening reboot; the first trace is associated with yesterday's
> incident, the second trace is 2-3 minutes newer than the timestamp on
> cookies.sqlite-wal):
> 
> [97483.663949] INFO: task ls:23767 blocked for more than 120 seconds.
> [97483.663951]       Tainted: P        W  O  3.16.0-4-amd64 #1
> [97483.663952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [97483.663954] ls              D ffff8800afd3b808     0 23767      1 
> 0x00000004
> [97483.663957]  ffff8800afd3b3b0 0000000000000086 0000000000012f40 
> ffff88039a057fd8
> [97483.663960]  0000000000012f40 ffff8800afd3b3b0 ffff88041ea137f0 
> ffff88041edcd128
> [97483.663962]  0000000000000002 ffffffff8113eb50 ffff88039a057d60 
> ffff88039a057e40
> [97483.663965] Call Trace:
> [97483.663968]  [<ffffffff8113eb50>] ? wait_on_page_read+0x60/0x60
> [97483.663971]  [<ffffffff81515069>] ? io_schedule+0x99/0x120
> [97483.663974]  [<ffffffff8113eb5a>] ? sleep_on_page+0xa/0x10
> [97483.663977]  [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90
> [97483.663980]  [<ffffffff8113e956>] ? wait_on_page_bit+0xc6/0xd0
> [97483.663983]  [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30
> [97483.663986]  [<ffffffff8114c22d>] ? pagevec_lookup_tag+0x1d/0x30
> [97483.663989]  [<ffffffff8113ea30>] ? filemap_fdatawait_range+0xd0/0x160
> [97483.663993]  [<ffffffff81140656>] ? filemap_write_and_wait+0x36/0x50
> [97483.664002]  [<ffffffffa06c4c08>] ? nfs_getattr+0x108/0x220 [nfs]
> [97483.664005]  [<ffffffff811af057>] ? vfs_fstatat+0x57/0x90
> [97483.664009]  [<ffffffff811af51d>] ? SYSC_newlstat+0x1d/0x40
> [97483.664013]  [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15
> 
> [ 9724.415533] INFO: task mozStorage #5:2748 blocked for more than 120 
> seconds.
> [ 9724.415537]       Tainted: P        W  O 3.16.0-4-amd64 #1
> [ 9724.415538] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [ 9724.415539] mozStorage #5   D ffff8803ee153a88     0  2748   2323 
> 0x00000000
> [ 9724.415542]  ffff8803ee153630 0000000000000082 0000000000012f40 
> ffff8803ee2cffd8
> [ 9724.415544]  0000000000012f40 ffff8803ee153630 ffff88041ea537f0 
> ffff88041edaf7a0
> [ 9724.415545]  0000000000000002 ffffffffa0669800 ffff8803ee2cfc30 
> ffff88040b1152c0
> [ 9724.415547] Call Trace:
> [ 9724.415557]  [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs]
> [ 9724.415560]  [<ffffffff81515069>] ? io_schedule+0x99/0x120
> [ 9724.415563]  [<ffffffffa066980a>] ? nfs_wait_bit_uninterruptible+0xa/0x10 
> [nfs]
> [ 9724.415566]  [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90
> [ 9724.415568]  [<ffffffff810738ba>] ? internal_add_timer+0x2a/0x70
> [ 9724.415571]  [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs]
> [ 9724.415573]  [<ffffffff81515497>] ? out_of_line_wait_on_bit+0x77/0x90
> [ 9724.415575]  [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30
> [ 9724.415578]  [<ffffffffa066ef2e>] ? nfs_updatepage+0x15e/0x830 [nfs]
> [ 9724.415582]  [<ffffffffa06609a7>] ? nfs_write_end+0x57/0x320 [nfs]
> [ 9724.415585]  [<ffffffff81164d25>] ? 
> iov_iter_copy_from_user_atomic+0x75/0x190
> [ 9724.415588]  [<ffffffff8113f08b>] ? generic_perform_write+0x11b/0x1c0
> [ 9724.415590]  [<ffffffff81140f98>] ? __generic_file_write_iter+0x158/0x340
> [ 9724.415592]  [<ffffffff811411b9>] ? generic_file_write_iter+0x39/0xa0
> [ 9724.415595]  [<ffffffffa065fdb3>] ? nfs_file_write+0x83/0x1a0 [nfs]
> [ 9724.415598]  [<ffffffff811aa134>] ? new_sync_write+0x74/0xa0
> [ 9724.415600]  [<ffffffff811aa872>] ? vfs_write+0xb2/0x1f0
> [ 9724.415601]  [<ffffffff811ab3b2>] ? SyS_write+0x42/0xa0
> [ 9724.415603]  [<ffffffff811aa393>] ? SyS_lseek+0x43/0xa0
> [ 9724.415605]  [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15

Do you still can reproduce the issue with a recent kernel?

Regards,
Salvatore

Reply via email to