Re: [PR] Couch stats resource tracker v2 [couchdb]

via GitHub Fri, 18 Oct 2024 15:29:03 -0700


chewbranca commented on code in PR #5213:
URL: https://github.com/apache/couchdb/pull/5213#discussion_r1807067405



##########
src/couch/priv/stats_descriptions.cfg:
##########
@@ -418,10 +422,47 @@
     {type, counter},
     {desc, <<"number of other requests">>}
 ]}.
+{[couchdb, query_server, js_filter], [
+    {type, counter},
+    {desc, <<"number of JS filter invocations">>}
+]}.
+{[couchdb, query_server, js_filtered_docs], [
+    {type, counter},
+    {desc, <<"number of docs filtered through JS invocations">>}
+]}.
+{[couchdb, query_server, js_filter_error], [
+    {type, counter},
+    {desc, <<"number of JS filter invocation errors">>}
+]}.
 {[couchdb, legacy_checksums], [
     {type, counter},
     {desc, <<"number of legacy checksums found in couch_file instances">>}
 ]}.
+{[couchdb, btree, folds], [
+    {type, counter},
+    {desc, <<"number of couch btree kv fold callback invocations">>}
+]}.
+{[couchdb, btree, get_node, kp_node], [
+    {type, counter},
+    {desc, <<"number of couch btree kp_nodes read">>}
+]}.
+{[couchdb, btree, get_node, kv_node], [
+    {type, counter},
+    {desc, <<"number of couch btree kv_nodes read">>}
+]}.

Review Comment:
   > It's not clear which btrees the metrics refer to.
   
   So that's the thing: we currently have no visibility into the volume of 
`couch_btree:get_node` calls _or_ what type of operations are inducing them. 
These could come from normal db doc reads, all_docs folds, compaction folds, 
or index build folds. All these different systems induce heavy couch_btree 
operations and we don't know where/when/why.
   
   The idea here is to start tracking the number of core btree operations 
performed by the workers, at a process level like we're doing with the rest 
of the stats in this PR. Then we'll at least be able to see the total volume 
of couch_btree operations induced, and use the reports to quantify the 
requests/dbs/etc that performed those operations.
   
   A key aspect of the tracking here is that `couch_btree:*` calls still run 
inside the _caller_ processes, e.g. the workers themselves, so we can do the 
process-local tracking without having to extend beyond process boundaries. 
This CSRT engine does not yet handle nested work tracking, so for instance we 
can't funnel information from the couch_file pid back through IOQ. Tracking 
the data here gives us a solid boundary of per-request work that we can 
track, while also providing much-needed insight into overall database 
operations.
   
   As for tracking `kv_node` separately from `kp_node`, I think this is 
important because we're in a btree, where lookup cost is proportional to the 
log of the size of the btree, and we currently don't have any metrics 
illuminating the impact of btree size on read throughput. In my experience 
it's fairly common for users to forget that while `logN` in a B+tree is 
_really_ good, it's _not_ a constant value, and once you get into databases 
with billions of documents it takes longer to get through the btree, on every 
request. My hope here is that by tracking `kp_node` reads separately from 
`kv_node` reads we can more easily understand and quantify the workloads 
induced by larger databases, something that I personally find challenging to 
do today.
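   
   A rough sketch of the `logN` point, assuming an idealized B+tree with a 
fixed fanout. The fanout value and the helper names `btree_depth` and 
`point_lookup_reads` are hypothetical and are not part of `couch_btree`; real 
node fanout depends on key and value sizes, so treat the numbers as 
illustrative only. It estimates how many `kp_node` and `kv_node` reads a 
single key lookup would need as the document count grows:

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large counts."""
    return -(-a // b)


def btree_depth(num_docs: int, fanout: int) -> int:
    """Levels in an idealized B+tree (assumed fixed fanout, hypothetical).

    The bottom level holds kv_nodes; every level above it holds kp_nodes.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    nodes = max(1, ceil_div(num_docs, fanout))  # number of leaf kv_nodes
    depth = 1
    while nodes > 1:
        nodes = ceil_div(nodes, fanout)  # kp_nodes needed one level up
        depth += 1
    return depth


def point_lookup_reads(num_docs: int, fanout: int) -> tuple[int, int]:
    """Return (kp_node reads, kv_node reads) for one root-to-leaf lookup."""
    depth = btree_depth(num_docs, fanout)
    return depth - 1, 1


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 1_000_000_000):
        kp, kv = point_lookup_reads(n, fanout=100)
        print(f"{n:>13,} docs: {kp} kp_node reads + {kv} kv_node read")
```

   Under the assumed fanout of 100, a million documents cost 2 `kp_node` 
reads per lookup and a billion cost 4, while the `kv_node` read stays at 1. 
That growth in `kp_node` reads is what separate counters would make visible.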
   
   I thought about trying to funnel an indicator variable of the datatype down 
into `couch_btree`, so for instance we could track changes btree operations 
independently of compaction folds, but that'd require either passing a more 
informative context through into `couch_btree:*` or some ugly pdict hacks. I 
think this approach of getting as close to the caller boundary as we can 
before we funnel off to another process gives us low-level insight into 
`couch_btree` operations while also allowing us to track it at a 
request/worker level, so that retroactively we'll be able to go look at the 
report logs and quantify which requests were responsible for the induced btree 
operations. It's important to note that this approach will also extend 
naturally to background worker CSRT tracking, e.g. we can easily extend 
compaction, replication, and indexing to create a CSRT context and then we'll 
be able to quantify the work induced by those background jobs too.



