chewbranca commented on code in PR #5213:
URL: https://github.com/apache/couchdb/pull/5213#discussion_r1807052709


##########
src/fabric/priv/stats_descriptions.cfg:
##########
@@ -26,3 +26,53 @@
     {type, counter},
     {desc, <<"number of write quorum errors">>}
 ]}.
+
+
+%% fabric_rpc worker stats
+%% TODO: decide on which naming scheme:
+%% {[fabric_rpc, get_all_security, spawned], [
+%% {[fabric_rpc, spawned, get_all_security], [
+{[fabric_rpc, get_all_security, spawned], [

Review Comment:
   It's `spawned` because it directly corresponds to `rexi_server` spawning the 
RPC worker processes by way of `rexi_server:init_p` here 
https://github.com/apache/couchdb/blob/couch-stats-resource-tracker-v2/src/rexi/src/rexi_server.erl#L143
 and we handle that here 
https://github.com/apache/couchdb/blob/couch-stats-resource-tracker-v2/src/couch_stats/src/couch_stats.erl#L126-L131.
   
   The core goal here is to track the volume of RPC work being spawned on the 
system, the amount of data they're processing, and the amount of data returned, 
as appropriate. All RPC workers are spawned processes, and tracking the rate of 
arrival is an important metric we're missing, so we track `spawned` for the 
core API operations (the `should_track` logic keeps it to a targeted subset of 
API operations to focus on commonly useful metrics, but it's arguable we should 
extend this tracking to _all_ induced RPC work). Many of the API operations 
stream out many rows, so we want to track the volume of `returned` rows/data. 
Similarly, some of the API operations filter rows and return only a subset of 
the data, so the amount `processed` can differ from the amount `returned`.
   
   Currently it's way too difficult to correlate a CouchDB node's IOQ usage 
with the underlying database operations being performed. Seeing the rate of 
incoming work, the amount of data processed, and the amount returned opens up a 
lot of visibility on that front, especially for filtered operations, where we 
have had no metrics for the skipped work and it is tricky to introspect.
   
   If your comment is more about the naming hierarchy: my approach was to 
have a consistent naming hierarchy with `[fabric_rpc, $call, spawned]` for all 
the tracked RPC workers, and then similarly with `processed` or `returned`. I 
figured this naming hierarchy works well with various metrics stacks for doing 
things like `graph([fabric_rpc, *, spawned])` to easily see the rate of induced 
workers, like I did in the Graphite links in the top description of the PR.
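
   To make the `[fabric_rpc, $call, $event]` scheme concrete, here is a rough 
sketch of what a few entries could look like in `stats_descriptions.cfg`. Only 
`[fabric_rpc, get_all_security, spawned]` appears in the diff above; the 
`changes` call name, the choice of `counter` as the type, and every `desc` 
string are illustrative assumptions, not the exact contents of the PR.

```erlang
%% Sketch only: entries following the [fabric_rpc, $call, $event] hierarchy.
%% The key on the next line is from the diff; its type and desc are assumed.
{[fabric_rpc, get_all_security, spawned], [
    {type, counter},
    {desc, <<"number of fabric_rpc get_all_security workers spawned">>}
]}.

%% Hypothetical call name `changes`, shown with all three event suffixes.
{[fabric_rpc, changes, spawned], [
    {type, counter},
    {desc, <<"number of fabric_rpc changes workers spawned">>}
]}.
{[fabric_rpc, changes, processed], [
    {type, counter},
    {desc, <<"number of rows processed by fabric_rpc changes workers">>}
]}.
{[fabric_rpc, changes, returned], [
    {type, counter},
    {desc, <<"number of rows returned by fabric_rpc changes workers">>}
]}.
```

   With the event name in the last position, a wildcard on the middle element 
(`[fabric_rpc, *, spawned]`) selects the same event across every call, while 
fixing the middle element groups all events for a single call.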



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
