I have a small ceph cluster (3 nodes, 5 osds each; journals are all just
partitions on the same spinner disks), and I have noticed that when I hit it
with a bunch of rados bench clients all doing large-object writes (40 MB
objects) with --no-cleanup, the rados bench commands themselves finish OK,
but I often see health warnings like the following (from ceph health detail):
HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests
3 ops are blocked > 32.768 sec on osd.9
1 ops are blocked > 32.768 sec on osd.10
2 osds have slow requests
After a couple of minutes, health goes to HEALTH_OK.
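For reference, each bench client was invoked roughly like this (pool name,
run length, and concurrency here are just placeholders; -b 41943040 gives
the 40 MB object size):

   rados bench -p testpool 300 write -b 41943040 -t 16 --no-cleanup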
But if I then go to the node containing osd.10, for example, and run
dump_historic_ops, I see plenty of ops with durations of around 20 sec,
but nothing over 32 sec.
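In case it matters, I'm pulling those via the admin socket, something like
this (socket path may differ on other installs):

   ceph --admin-daemon /var/run/ceph/ceph-osd.10.asok dump_historic_ops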
These ~20-sec ops are always "ack+ondisk+write+known_if_redirected"
with type_data = "commit sent: apply or cleanup",
and the event timings below are typical (gap is milliseconds since the
previous event):
initiated:                        14:06:58.205937
reached_pg:                       14:07:01.823288, gap=  3617.351
started:                          14:07:01.823359, gap=     0.071
waiting for subops from 3:        14:07:01.855259, gap=    31.900
commit_queued_for_journal_write:  14:07:03.132697, gap=  1277.438
write_thread_in_journal_buffer:   14:07:03.143356, gap=    10.659
journaled_completion_queued:      14:07:04.175863, gap=  1032.507
op_commit:                        14:07:04.585040, gap=   409.177
op_applied:                       14:07:04.589751, gap=     4.711
sub_op_commit_rec from 3:         14:07:14.682925, gap= 10093.174
commit_sent:                      14:07:14.683081, gap=     0.156
done:                             14:07:14.683119, gap=     0.038
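For what it's worth, the gaps above were computed with a quick script over
the dump_historic_ops JSON; a minimal sketch is below. The field names
("Ops", "description", "type_data", and the event list being the last
element of type_data) are what my version emits; other releases may lay the
JSON out differently, so treat this as an illustration only:

   #!/usr/bin/env python
   # Quick-and-dirty gap calculator for dump_historic_ops output.
   # Reads the JSON dump on stdin, e.g.:
   #   ceph --admin-daemon /var/run/ceph/ceph-osd.10.asok \
   #       dump_historic_ops | python gaps.py
   import json
   import sys
   from datetime import datetime

   def parse_ts(s):
       # Timestamps in the dump look like "2015-06-09 14:06:58.205937".
       return datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f")

   dump = json.load(sys.stdin)
   for op in dump["Ops"]:
       print(op["description"])
       prev = None
       # On my version the trailing element of type_data is the event list.
       for ev in op["type_data"][-1]:
           t = parse_ts(ev["time"])
           gap_ms = (t - prev).total_seconds() * 1000.0 if prev else 0.0
           print("  %-33s %s, gap= %9.3f" % (ev["event"] + ":",
                                             ev["time"], gap_ms))
           prev = t

Incidentally, in the sample above the single biggest chunk is the ~10 sec
spent waiting for sub_op_commit_rec from osd.3, and the whole op only spans
about 16.5 sec (14:06:58.2 to 14:07:14.68).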
Should I expect to see a historic op with duration greater than 32 sec?
-- Tom Deneau