On Thu, Jul 13, 2017 at 12:54 PM, Ilya Dryomov <[email protected]> wrote:
> On Wed, Jul 12, 2017 at 7:15 PM, Nick Fisk <[email protected]> wrote:
>>> Hi Ilya,
>>>
>>> I have managed today to capture the kernel logs with debugging turned on
>>> and the ms+osd debug logs from the mentioned OSD.
>>> However, this is from a few minutes after the stall starts, not before. The
>>> very random nature of the stalls has made it difficult to
>>> have debug logging on for extended periods of time. I hope there is
>>> something in here which might give a clue to what is happening,
>>> otherwise I will continue to try and capture debugs from before the stall
>>> occurs.
>>>
>>> The kernel logs and OSD logs are attached as a zip.
>>>
>>> Sample of the hung requests
>>> Wed 12 Jul 11:28:01 BST 2017
>>> REQUESTS 8 homeless 0
>>> 11457738 osd37 17.be732844 [37,74,14]/37 [37,74,14]/37
>>> rbd_data.15d8670238e1f29.00000000001a2068 0x400024 1
>>> 0'0 set-alloc-hint,write
>>> 11457759 osd37 17.9e3d4404 [37,74,14]/37 [37,74,14]/37
>>> rbd_data.15d8670238e1f29.0000000000118149 0x400024 1
>>> 0'0 set-alloc-hint,write
>>> 11457770 osd37 17.86ec0d14 [37,72,74]/37 [37,72,74]/37
>>> rbd_data.15d8670238e1f29.000000000006665c 0x400024 1
>>> 0'0 set-alloc-hint,write
>>> 11457818 osd37 17.e80ed1c0 [37,75,14]/37 [37,75,14]/37
>>> rbd_data.158f204238e1f29.00000000000d7f1e 0x400014 1
>>> 0'0 read
>>> 11457822 osd37 17.9db0a684 [37,74,14]/37 [37,74,14]/37
>>> rbd_data.15d8670238e1f29.000000000000211c 0x400024 1
>>> 0'0 set-alloc-hint,write
>>> 11457916 osd37 17.1848293b [37,3,73]/37 [37,3,73]/37
>>> rbd_data.158f204238e1f29.00000000000d7f8e 0x400024 1 0'0
>>> set-alloc-hint,write
>>> 11457967 osd37 17.56b0f4c0 [37,75,14]/37 [37,75,14]/37
>>> rbd_data.158f204238e1f29.0000000000047ffc 0x400014 1 0'0
>>> read
>>> 11457970 osd37 17.65ad6d40 [37,75,14]/37 [37,75,14]/37
>>> rbd_data.15d8670238e1f29.0000000000060318 0x400024 1
>>> 0'0 set-alloc-hint,write
>>>
>>> Also a nice hung_task_timeout
>>> [57154.424300] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>>> this message.
>>> [57154.424331] nfsd D 0 1650 2 0x00000000
>>> [57154.424334] Call Trace:
>>> [57154.424341] __schedule+0x3c6/0x8c0
>>> [57154.424344] schedule+0x36/0x80
>>> [57154.424346] rwsem_down_write_failed+0x230/0x3a0
>>> [57154.424389] ? xfs_file_buffered_aio_write+0x68/0x270 [xfs]
>>> [57154.424392] call_rwsem_down_write_failed+0x17/0x30
>>> [57154.424394] ? call_rwsem_down_write_failed+0x17/0x30
>>> [57154.424396] down_write+0x2d/0x40
>>> [57154.424422] xfs_ilock+0xa7/0x110 [xfs]
>>> [57154.424446] xfs_file_buffered_aio_write+0x68/0x270 [xfs]
>>> [57154.424449] ? iput+0x8a/0x230
>>> [57154.424450] ? __check_object_size+0x100/0x19d
>>> [57154.424474] xfs_file_write_iter+0x103/0x150 [xfs]
>>> [57154.424482] __do_readv_writev+0x2fb/0x390
>>> [57154.424484] do_readv_writev+0x8d/0xe0
>>> [57154.424487] ? security_file_open+0x8a/0x90
>>> [57154.424488] ? do_dentry_open+0x27a/0x310
>>> [57154.424511] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
>>> [57154.424513] vfs_writev+0x3c/0x50
>>> [57154.424514] ? vfs_writev+0x3c/0x50
>>> [57154.424525] nfsd_vfs_write+0xc6/0x3a0 [nfsd]
>>> [57154.424531] nfsd_write+0x144/0x200 [nfsd]
>>> [57154.424538] nfsd3_proc_write+0xaa/0x140 [nfsd]
>>> [57154.424549] nfsd_dispatch+0xc8/0x260 [nfsd]
>>> [57154.424566] svc_process_common+0x374/0x6b0 [sunrpc]
>>> [57154.424575] svc_process+0xfe/0x1b0 [sunrpc]
>>> [57154.424581] nfsd+0xe9/0x150 [nfsd]
>>> [57154.424583] kthread+0x109/0x140
>>> [57154.424588] ? nfsd_destroy+0x60/0x60 [nfsd]
>>> [57154.424590] ? kthread_create_on_node+0x70/0x70
>>> [57154.424592] ret_from_fork+0x2c/0x40
>>>
>>> Regards,
>>> Nick
>>>
>>>
>>
>> Just looking into this some more, I have found laggy requests to osd37 in
>> the kernel logs:
>>
>> Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685559] libceph: req
>> ffff9adad97a1450 tid 11457738 on osd37 is laggy
>> Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685560] libceph: req
>> ffff9adad0df9040 tid 11457759 on osd37 is laggy
>> Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685561] libceph: req
>> ffff9adad97a5d70 tid 11457770 on osd37 is laggy
>> Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685562] libceph: req
>> ffff9adad9036590 tid 11457818 on osd37 is laggy
>> Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685563] libceph: req
>> ffff9adad9035960 tid 11457822 on osd37 is laggy
>> Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685564] libceph: req
>> ffff9adad9036db0 tid 11457916 on osd37 is laggy
>> Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685565] libceph: req
>> ffff9adad90330c0 tid 11457967 on osd37 is laggy
>> Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685565] libceph: req
>> ffff9adad97a5550 tid 11457970 on osd37 is laggy
>
> Hi Nick,
>
> I took a look, but didn't see anything useful. There are gaps in these
> excerpts, like:
>
> Jul 12 11:26:10 MS-CEPH-Proxy1 kernel: [66716.981713] libceph:
> try_write start ffff9adaebf00288 state 5
> Jul 12 11:26:10 MS-CEPH-Proxy1 kernel: [66716.981714] libceph:
> try_write out_kvec_bytes 0
> Jul 12 11:26:10 MS-CEPH-Proxy1 kernel: [66716.981716] libceph:
> try_write nothing else to write.
> Jul 12 11:26:10 MS-CEPH-Proxy1 kernel: [66716.981716] libceph:
> try_write done on ffff9adaebf00288 ret 0
> Jul 12 11:26:12 MS-CEPH-Proxy1 systemd[1]: Started Session 2398 of
> user sysprosupport.
> Jul 12 11:26:16 MS-CEPH-Proxy1 postfix/pickup[13798]: 042FC385:
> uid=1000 from=<sysprosupport>
> Jul 12 11:26:16 MS-CEPH-Proxy1 postfix/cleanup[28873]: 042FC385:
> message-id=<20170712102616.042FC385@MS-CEPH-Proxy1>
> Jul 12 11:26:16 MS-CEPH-Proxy1 postfix/qmgr[2151]: 042FC385:
> from=<sysprosupport@MS-CEPH-Proxy1>, size=487, nrcpt=1 (queue active)
> Jul 12 11:26:16 MS-CEPH-Proxy1 postfix/local[28875]: 042FC385:
> to=<root@MS-CEPH-Proxy1>, orig_to=<root>, relay=local, delay=0.24,
> delays=0.11/0.11/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
> Jul 12 11:26:16 MS-CEPH-Proxy1 postfix/qmgr[2151]: 042FC385: removed
> Jul 12 11:26:18 MS-CEPH-Proxy1 systemd[1]: Started Session c2 of user root.
> Jul 12 11:27:01 MS-CEPH-Proxy1 CRON[29822]: (root) CMD ( date >>
> /home/sysprosupport/osdc.txt;cat /sys/kernel/debug/ceph/*/osdc >>
> /home/sysprosupport/osdc.txt)
> Jul 12 11:27:01 MS-CEPH-Proxy1 CRON[29823]: (root) CMD (date >>
> /home/sysprosupport/buddy.txt;cat /proc/buddyinfo >>
> /home/sysprosupport/buddy.txt)
> Jul 12 11:28:01 MS-CEPH-Proxy1 CRON[31057]: (root) CMD (date >>
> /home/sysprosupport/buddy.txt;cat /proc/buddyinfo >>
> /home/sysprosupport/buddy.txt)
> Jul 12 11:28:01 MS-CEPH-Proxy1 CRON[31058]: (root) CMD ( date >>
> /home/sysprosupport/osdc.txt;cat /sys/kernel/debug/ceph/*/osdc >>
> /home/sysprosupport/osdc.txt)
> Jul 12 11:29:01 MS-CEPH-Proxy1 CRON[32293]: (root) CMD (date >>
> /home/sysprosupport/buddy.txt;cat /proc/buddyinfo >>
> /home/sysprosupport/buddy.txt)
> Jul 12 11:29:01 MS-CEPH-Proxy1 CRON[32294]: (root) CMD ( date >>
> /home/sysprosupport/osdc.txt;cat /sys/kernel/debug/ceph/*/osdc >>
> /home/sysprosupport/osdc.txt)
>
> There is no way the kernel client would suddenly stay silent for
> minutes. Looks like your osdc cron job was running - could you attach
> all output between 10:00 and 12:00? I'd like to see when the first of
> these 8 requests showed up and when they cleared up. It looks like
> they cleared up at around 11:35, but I want to confirm.
>
>>
>> And have found in the kernel client where it displays that message.
>>
>> https://github.com/ceph/ceph-client/blob/master/net/ceph/osd_client.c#L3073-L3098
>>
>> Does that suggest that for some reason the TCP connection is getting broken
>> and causing the hang, and that for whatever reason it can't re-establish, or
>> takes longer than it should, thus causing the outage?
>
> It could be. When the kernel client detects laggy requests, it sends
> the keepalive tag -- so in this case it should have been sending it to
> osd37 every 5 seconds. If those keepalives (along with the original
> OSD requests) were going nowhere until the networking stack closed the
> socket
>
> Jul 12 11:35:34 MS-CEPH-Proxy1 kernel: [67280.665519] libceph: osd37
> 10.3.31.14:6829 socket closed (con state OPEN)
>
> thus triggering the resend of the OSD requests, that would certainly do
> it ;) Who closed the osd37 socket at 11:35 (the OSD or the kernel) was one
> of the things I was going to check, but the OSD log ends at 11:28.
>
> Try capturing netstat output along with osdc and osdmap from debugfs
> every 5-10 seconds on the kernel client boxes. That's not a lot of
> output and correlating send/recv queue sizes for IPs from osdmap to
> osdc might help.
Hi Nick,
Any progress on this?
Thanks,
Ilya
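For anyone following along, the capture loop Ilya suggests (netstat plus osdc/osdmap from debugfs every 5-10 seconds) could be scripted roughly like this. This is only a sketch, modeled on the osdc cron job already shown in the thread: the output path, snapshot count, and 1-second test interval are placeholders, not anything from this discussion, and the fallbacks let it run even where debugfs or net-tools is missing.

```shell
# Hypothetical capture loop (output path, count, and interval are
# assumptions): append a timestamp, per-socket Send-Q/Recv-Q sizes, and
# the in-flight OSD request (osdc) and osdmap state from debugfs, so
# laggy tids can later be matched to socket backlogs by OSD address.
OUT="${OUT:-/tmp/ceph_capture.log}"
SNAPSHOTS="${SNAPSHOTS:-2}"     # run indefinitely in practice
i=0
while [ "$i" -lt "$SNAPSHOTS" ]; do
    {
        date
        # TCP queue sizes; fall back to ss if net-tools is absent.
        netstat -tn 2>/dev/null || ss -tn 2>/dev/null || echo "no netstat/ss"
        # Kernel client state; requires debugfs mounted and a mapped image.
        cat /sys/kernel/debug/ceph/*/osdc /sys/kernel/debug/ceph/*/osdmap \
            2>/dev/null || echo "ceph debugfs not available"
    } >> "$OUT"
    i=$((i + 1))
    sleep 1                     # Ilya suggested every 5-10 seconds
done
echo "captured $SNAPSHOTS snapshots to $OUT"
```

With snapshots in hand, a stuck request in osdc can be traced to its OSD's IP via osdmap, and that IP's Send-Q in the netstat output shows whether data is sitting unacknowledged on the wire.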
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com