> I finally got around to looking at the dump_historic_ops output for the
> 1-client and 2-client cases.
> As you recall these are all read-ops. so the events in the dump are
> initiated
> reached_pg
> started
> done
>
> The pattern I see for most of the slow ops recorded in the dump is:
>
> * In the 1-client case the typical slow op has duration between 50-65 ms
> and usually most of this is the interval between reached_pg and started.
>
> * In the 2-client case the typical slow op has duration between 95-120 ms
> and again usually most of this is the interval between reached_pg
> and started.
>
> Could someone describe what the interval between reached_pg and started
> means?
I think the slow part is probably find_object_context() (although to be
fair tons of stuff happens here, see do_op()). You could test this
theory or otherwise narrow this down with additional event
markers lke
diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index d6f3084..6faccc2 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -1691,10 +1691,12 @@ void ReplicatedPG::do_op(OpRequestRef& op)
return;
}
+ op->mark_event("about to find");
int r = find_object_context(
oid, &obc, can_create,
m->has_flag(CEPH_OSD_FLAG_MAP_SNAP_CLONE),
&missing_oid);
+ op->mark_event("found");
if (r == -EAGAIN) {
// If we're not the primary of this OSD, and we have
sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html