chewbranca commented on code in PR #5213:
URL: https://github.com/apache/couchdb/pull/5213#discussion_r1807067405
##########
src/couch/priv/stats_descriptions.cfg:
##########
@@ -418,10 +422,47 @@
{type, counter},
{desc, <<"number of other requests">>}
]}.
+{[couchdb, query_server, js_filter], [
+ {type, counter},
+ {desc, <<"number of JS filter invocations">>}
+]}.
+{[couchdb, query_server, js_filtered_docs], [
+ {type, counter},
+ {desc, <<"number of docs filtered through JS invocations">>}
+]}.
+{[couchdb, query_server, js_filter_error], [
+ {type, counter},
+ {desc, <<"number of JS filter invocation errors">>}
+]}.
{[couchdb, legacy_checksums], [
{type, counter},
{desc, <<"number of legacy checksums found in couch_file instances">>}
]}.
+{[couchdb, btree, folds], [
+ {type, counter},
+ {desc, <<"number of couch btree kv fold callback invocations">>}
+]}.
+{[couchdb, btree, get_node, kp_node], [
+ {type, counter},
+ {desc, <<"number of couch btree kp_nodes read">>}
+]}.
+{[couchdb, btree, get_node, kv_node], [
+ {type, counter},
+ {desc, <<"number of couch btree kv_nodes read">>}
+]}.
Review Comment:
> It's not clear which btrees the metrics refer to.
So that's the thing: we currently have no visibility into the volume of
`couch_btree:get_node` calls _or_ into what types of operations are inducing
them. These could come from normal db doc reads, all_docs folds, compaction
folds, or index build folds. All of these systems induce heavy couch_btree
operations, and we don't know where, when, or why.
The idea here is to start tracking the number of core btree operations
performed by the workers, allowing us to track it at the process level like
we're doing with the rest of the stats in this PR. Then we'll at least be able
to see the total volume of couch_btree operations induced, and we can use the
reports to quantify the requests/dbs/etc that performed those operations.
A key aspect of the tracking here is that `couch_btree:*` calls still run
inside the _caller_ processes, e.g. the workers themselves, so we can do the
process-local tracking without having to extend beyond process boundaries. This
CSRT engine does not yet handle nested work tracking, so for instance we can't
funnel information from the couch_file pid back through IOQ. Tracking the data
here gives us a solid boundary of per-request work being induced that we can
track, while also providing much-needed insight into overall database
operations.
As for tracking `kv_node` separately from `kp_node`, I think this is
important because we're in a btree and lookup cost is proportional to the log
of the size of the btree, and we currently don't have any metrics illuminating
the impact of btree size on read throughput. In my experience it's fairly
common for users to forget that while `logN` in a B+tree is _really_
good, it's _not_ a constant value, and once you get into databases with
billions of documents it takes longer to get through the btree on every
request. My hope here is that by tracking `kp_node` reads separately from
`kv_node` reads we can more easily understand and quantify the workloads
induced by larger databases, something that I personally find challenging to
do today.
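
As a rough worked example of the `logN` point, the Python sketch below estimates how many `kp_node` and `kv_node` reads a single key lookup needs under a simplified model. The fixed fan-out of 100 entries per node is an assumption chosen for round numbers; real node sizes vary, so treat the results as an illustration of the trend rather than as measurements.

```python
def btree_depth(num_docs, fanout):
    """Levels needed to index num_docs keys if every node holds `fanout`
    entries. Integer arithmetic avoids floating-point log rounding."""
    depth = 1
    capacity = fanout
    while capacity < num_docs:
        capacity *= fanout
        depth += 1
    return depth


def node_reads_per_lookup(num_docs, fanout):
    """A lookup reads one kv_node (the leaf) plus one kp_node per
    interior level above it."""
    depth = btree_depth(num_docs, fanout)
    return {"kp_node": depth - 1, "kv_node": 1}


# Assumed fan-out of 100, purely illustrative.
print(node_reads_per_lookup(10**6, 100))  # {'kp_node': 2, 'kv_node': 1}
print(node_reads_per_lookup(10**9, 100))  # {'kp_node': 4, 'kv_node': 1}
```

Under this model the `kv_node` count per lookup stays at one while the `kp_node` count doubles going from a million to a billion documents, which is the kind of difference separate counters would make visible.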
I thought about trying to funnel an indicator of the datatype down into
`couch_btree`, so that for instance we could track changes btree
operations independently of compaction folds, but that'd require either passing
a more informative context into `couch_btree:*` or some ugly pdict
hacks. I think this approach of getting as close to the caller boundary as
we can before we funnel off to another process gives us low-level insight into
`couch_btree` operations while also allowing us to track them at a
request/worker level, so that retroactively we'll be able to go look at the
report logs and quantify which requests were responsible for the induced btree
operations. It's important to note that this approach will also extend
naturally to background worker CSRT tracking, e.g. we can easily extend
compaction, replication, and indexing to create a CSRT context, and then we'll
be able to quantify the work induced by those background jobs too.
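
For a sense of what the retroactive quantification could look like, here is a small Python sketch that ranks report entries by total btree node reads. The report shape (dicts with `dbname`, `kp_node`, and `kv_node` keys) and the sample values are made up for illustration; the actual report log format is not specified in this comment.

```python
from collections import defaultdict


def btree_ops_by(reports, key):
    """Sum kp_node + kv_node reads per distinct value of `key`,
    returned as (value, total) pairs sorted from heaviest to lightest."""
    totals = defaultdict(int)
    for report in reports:
        totals[report[key]] += report.get("kp_node", 0) + report.get("kv_node", 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical report entries, not real data.
sample_reports = [
    {"dbname": "db1", "kp_node": 4, "kv_node": 1},
    {"dbname": "db2", "kp_node": 2, "kv_node": 1},
    {"dbname": "db1", "kp_node": 8, "kv_node": 2},
]

print(btree_ops_by(sample_reports, "dbname"))  # [('db1', 15), ('db2', 3)]
```

The same grouping would work for any other field a report carries, such as a request type or a background job name, once those contexts exist.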