We're still seeing this, but with different workloads than before (i.e., the workload which was affected in our earlier reports was fixed by http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=54;bug=517449, but there are other workloads under which we're now observing this).
The common thread seems to be a single task suddenly performing lots of I/O on an otherwise very lightly loaded machine. updatedb.mlocate has triggered this a few times; other times it's been a single KVM virtual machine on an otherwise idle physical machine. Sometimes it's that particular task that blocks and triggers this warning; other times it's unrelated tasks (sshd, for example). This seems to happen most readily with NFS I/O when network latency is relatively high (we have a few cross-country NFS mounts that aren't the fastest thing on four wheels), but we've also seen this with predominantly local disk I/O. FWIW, this is with kernels that have the performance regression in the RPC auth code (#524199) fixed. The machines don't seem to lock up as was the case before, but the affected tasks are obviously unresponsive. john [697382.442558] INFO: task sshd:27512 blocked for more than 120 seconds. [697382.442558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [697382.442558] sshd D ffff810001031250 0 27512 3826 [697382.442558] ffff8105bc8d9938 0000000000000086 0000000000000010 ffff81082ccdd000 [697382.442558] ffff81082b13ead0 ffff8108299a2910 ffff81082b13ed58 0000000400000000 [697382.442558] 0f480000bc8d9928 ffffffff80295f4d 0000000000011200 0000000000000292 [697382.442558] Call Trace: [697382.442558] [<ffffffff80295f4d>] cache_alloc_refill+0x111/0x20c [697382.442558] [<ffffffffa02893ca>] :sunrpc:rpc_wait_bit_killable+0x0/0x31 [697382.442558] [<ffffffffa02893f4>] :sunrpc:rpc_wait_bit_killable+0x2a/0x31 [697382.442558] [<ffffffff80429332>] __wait_on_bit+0x40/0x6e [697382.442558] [<ffffffffa02893ca>] :sunrpc:rpc_wait_bit_killable+0x0/0x31 [697382.442558] [<ffffffff804293cc>] out_of_line_wait_on_bit+0x6c/0x78 [697382.442558] [<ffffffff8024622f>] wake_bit_function+0x0/0x23 [697382.442558] [<ffffffffa02862e9>] :sunrpc:xprt_connect+0x89/0x123 [697382.442558] [<ffffffffa028998f>] :sunrpc:__rpc_execute+0xe6/0x223 [697382.442558] [<ffffffffa0283bcb>] :sunrpc:rpc_run_task+0x4f/0x56 [697382.442558] [<ffffffffa0283c67>] :sunrpc:rpc_call_sync+0x3e/0x5b [697382.442558] [<ffffffffa02dbefe>] :nfs:nfs3_rpc_wrapper+0x19/0x50 [697382.442558] [<ffffffffa02dc5cf>] :nfs:nfs3_proc_access+0x125/0x191 [697382.442558] [<ffffffff8023cfeb>] try_to_del_timer_sync+0x51/0x5a [697382.442558] [<ffffffff8023d000>] del_timer_sync+0xc/0x16 [697382.442558] [<ffffffff80246388>] remove_wait_queue+0x12/0x45 [697382.442558] [<ffffffff802a6eeb>] free_poll_entry+0x11/0x1a [697382.442558] [<ffffffff802a6f1d>] poll_freewait+0x29/0x6a [697382.442558] [<ffffffffa02cd1e8>] :nfs:nfs_do_access+0x163/0x30c [697382.442558] [<ffffffffa02cd481>] :nfs:nfs_permission+0xf0/0x15f [697382.442558] [<ffffffff802a21f7>] permission+0xb5/0x118 [697382.442558] [<ffffffff802a377f>] __link_path_walk+0x150/0xd05 [697382.442558] [<ffffffff8021a4ee>] apic_wait_icr_idle+0xe/0x15 [697382.442558] [<ffffffff8021a250>] native_flush_tlb_others+0x5d/0x83 [697382.442558] [<ffffffff802a437a>] path_walk+0x46/0x8b [697382.442558] [<ffffffff802a46a6>] do_path_lookup+0x158/0x1cf [697382.442558] [<ffffffff802a51ea>] __path_lookup_intent_open+0x56/0x96 [697382.442558] [<ffffffff802a531f>] do_filp_open+0x9c/0x7c4 [697382.442558] [<ffffffff8023e186>] lock_task_sighand+0x2c/0x52 [697382.442558] [<ffffffff80221fbc>] do_page_fault+0x5d8/0x9c8 [697382.442558] [<ffffffff802995a0>] get_unused_fd_flags+0x71/0x115 [697382.442558] [<ffffffff8029968a>] do_sys_open+0x46/0xc3 [697382.442558] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f [1726155.948043] INFO: task bzip2:456 blocked for more than 120 seconds. [1726155.949118] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1726155.950281] bzip2 D ffff8100010190f0 0 456 450 [1726155.950285] ffff810008209c78 0000000000000046 ffff8101001334c8 ffffffffa01d2b18 [1726155.950289] ffff81011fa90080 ffff81011fba5120 ffff81011fa90308 00000000001334c8 [1726155.950291] ffff81011d1a4800 ffffffffa01cecff 0000000000000001 0000000000000000 [1726155.950294] Call Trace: [1726155.950363] [<ffffffffa01d2b18>] :sunrpc:rpc_sleep_on+0x21/0x2a3 [1726155.950376] [<ffffffffa01cecff>] :sunrpc:xprt_reserve+0x14b/0x15a [1726155.950429] [<ffffffffa021fa73>] :nfs:nfs_wait_bit_killable+0x0/0x30 [1726155.950440] [<ffffffffa021fa9d>] :nfs:nfs_wait_bit_killable+0x2a/0x30 [1726155.950466] [<ffffffff80429312>] __wait_on_bit+0x40/0x6e [1726155.950477] [<ffffffffa021fa73>] :nfs:nfs_wait_bit_killable+0x0/0x30 [1726155.950481] [<ffffffff804293ac>] out_of_line_wait_on_bit+0x6c/0x78 [1726155.950495] [<ffffffff80246273>] wake_bit_function+0x0/0x23 [1726155.950508] [<ffffffffa0223565>] :nfs:nfs_sync_mapping_wait+0xed/0x298 [1726155.950521] [<ffffffffa02238f2>] :nfs:__nfs_write_mapping+0x2c/0x4f [1726155.950533] [<ffffffffa0223958>] :nfs:nfs_write_mapping+0x43/0x65 [1726155.950544] [<ffffffffa0218604>] :nfs:nfs_do_fsync+0x19/0x34 [1726155.950553] [<ffffffffa0218a61>] :nfs:nfs_file_flush+0x84/0xac [1726155.950568] [<ffffffff80299549>] filp_close+0x37/0x65 [1726155.950583] [<ffffffff80236776>] put_files_struct+0x66/0xc4 [1726155.950586] [<ffffffff80237a03>] do_exit+0x274/0x6a6 [1726155.950603] [<ffffffff8031e0ef>] __up_write+0x21/0x10e [1726155.950607] [<ffffffff80237ea2>] do_group_exit+0x6d/0x9d [1726155.950609] [<ffffffff80237ee4>] sys_exit_group+0x12/0x16 [1726155.950615] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f [22652485.379503] INFO: task kvm:5135 blocked for more than 120 seconds. [22652485.379503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [22652485.379503] kvm D 0000000000000000 0 5135 1 [22652485.379503] ffff81042401fbe8 0000000000000082 0000000000000000 ffff810008a652f0 [22652485.379503] ffff81042bbdf7b0 ffffffff804f8480 ffff81042bbdfa38 00000000b6acf690 [22652485.379503] 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [22652485.379503] Call Trace: [22652485.379503] [<ffffffff80271066>] sync_page+0x0/0x41 [22652485.379503] [<ffffffff804284e7>] io_schedule+0x5c/0x9e [22652485.379503] [<ffffffff802710a2>] sync_page+0x3c/0x41 [22652485.379503] [<ffffffff80428742>] __wait_on_bit+0x40/0x6e [22652485.379503] [<ffffffff802712ed>] wait_on_page_bit+0x6b/0x71 [22652485.379503] [<ffffffff80246213>] wake_bit_function+0x0/0x23 [22652485.379503] [<ffffffff80278dca>] pagevec_lookup_tag+0x1a/0x21 [22652485.379503] [<ffffffff80271804>] wait_on_page_writeback_range+0x66/0x113 [22652485.379503] [<ffffffff80277732>] do_writepages+0x27/0x2d [22652485.379503] [<ffffffff802b66ee>] generic_osync_inode+0x5e/0xcf [22652485.379503] [<ffffffff80272771>] generic_file_aio_write+0xa6/0xc1 [22652485.379503] [<ffffffffa018025a>] :ext3:ext3_file_write+0x16/0x94 [22652485.379503] [<ffffffff8029ac3b>] do_sync_write+0xc9/0x10c [22652485.379503] [<ffffffff802461e5>] autoremove_wake_function+0x0/0x2e [22652485.379503] [<ffffffffa02aa36e>] :kvm:kvm_vm_ioctl+0x19c/0x1b5 [22652485.379503] [<ffffffff8023fc52>] sys_rt_sigtimedwait+0xf1/0x25f [22652485.379503] [<ffffffff8029b3e5>] vfs_write+0xad/0x156 [22652485.379503] [<ffffffff8029b987>] sys_write+0x45/0x6e [22652485.379503] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f -- John Morrissey _o /\ ---- __o j...@horde.net _-< \_ / \ ---- < \, www.horde.net/ __(_)/_(_)________/ \_______(_) /_(_)__ -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org