bobhan1 commented on code in PR #63832:
URL: https://github.com/apache/doris/pull/63832#discussion_r3331443024


##########
be/src/cloud/cloud_internal_service.cpp:
##########
@@ -407,10 +414,103 @@ bvar::Adder<uint64_t> g_file_cache_warm_up_rowset_wait_for_compaction_num(
 bvar::Adder<uint64_t> g_file_cache_warm_up_rowset_wait_for_compaction_timeout_num(
         "file_cache_warm_up_rowset_wait_for_compaction_timeout_num");
 
+// Per-job windowed metrics for target BE
+// bvar::Window enforces MAX_SECONDS_LIMIT = 3600, so the longest window is 1h.
+static constexpr int WINDOW_5M = 300;
+static constexpr int WINDOW_30M = 1800;
+static constexpr int WINDOW_1H = 3600;
+
+MBvarWindowedAdder g_warmup_ed_finish_segment_num("warmup_ed_finish_segment_num", {"job_id"},

Review Comment:
   I checked the bvar implementation again.
   
   `bvar::Window` does not record every update written to the `Adder`. For `bvar::Adder`, the underlying sampler samples the cumulative adder value roughly once per second, and the window value is calculated from the difference between the latest sampled cumulative value and the oldest sampled cumulative value in the requested window.
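   The sampling idea can be illustrated with a minimal, single-threaded sketch. It assumes the caller drives one sample per second and uses hypothetical names (`CumulativeSampler`, `take_sample`, `window_value`); it is not bvar's actual code. It shows that events only bump a cumulative counter, and a window value is the difference of two stored samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, single-threaded model of a windowed Adder. It is NOT
// bvar's implementation; it only illustrates the sampling idea.
struct Sample {
    int64_t data;     // cumulative adder value when the sample was taken
    int64_t time_us;  // sampling timestamp in microseconds
};

class CumulativeSampler {
public:
    explicit CumulativeSampler(std::size_t window_seconds)
            : capacity_(window_seconds + 1) {}

    // Every event only bumps the cumulative value; nothing is queued.
    void add(int64_t delta) { cumulative_ += delta; }

    // Called about once per second. Keeps at most capacity_ samples,
    // dropping the oldest one when the queue is full.
    void take_sample(int64_t now_us) {
        samples_.push_back(Sample{cumulative_, now_us});
        if (samples_.size() > capacity_) {
            samples_.pop_front();
        }
    }

    // Sum of updates over the last `window_seconds` ticks: the latest
    // cumulative sample minus the oldest sample inside that window.
    int64_t window_value(std::size_t window_seconds) const {
        if (samples_.size() < 2) {
            return 0;
        }
        std::size_t n = std::min(window_seconds + 1, samples_.size());
        return samples_.back().data - samples_[samples_.size() - n].data;
    }

    std::size_t stored_samples() const { return samples_.size(); }

private:
    std::size_t capacity_;
    int64_t cumulative_ = 0;
    std::deque<Sample> samples_;
};
```

   With 1000 events per second for 400 seconds, this sketch stores only 301 samples, yet `window_value(300)` still returns the 300,000 events of the last 300 seconds.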
   
   The 5m/30m/1h windows created for the same `Adder` also share the same underlying sampler. The sampler queue is sized by the largest window, so here it keeps about `3600 + 1` samples, not `300 + 1800 + 3600` samples and not one sample per warm-up event.
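   The sizing rule can be modeled in a few lines, assuming each window registration only raises the shared sampler's window size to the largest value requested so far. `SharedSampler`, `register_window` and `queue_capacity` are hypothetical names for illustration, not bvar classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of the shared sampler: every window registered on the
// same Adder can only grow the sampler's window size, so one queue holds
// max(window) + 1 samples instead of one queue per window.
class SharedSampler {
public:
    void register_window(std::size_t window_seconds) {
        window_size_ = std::max(window_size_, window_seconds);
    }

    // Number of cumulative samples kept for all registered windows together.
    std::size_t queue_capacity() const { return window_size_ + 1; }

private:
    std::size_t window_size_ = 0;
};
```

   Registering the 5m, 30m and 1h windows leaves the capacity at 3601, and registering a shorter window again does not change it.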
   
   Rough estimate:
   - One `Sample<int64_t>` stores `data` and `time_us`, so it is about 16 bytes.
   - The largest window is 1h, so one sampler queue is about `(3600 + 1) * 16 ~= 56KB`.
   - Source-side stats have 4 windowed adders, about `4 * 56KB ~= 224KB/job` for sampler queues.
   - Target-side stats have 8 windowed adders, about `8 * 56KB ~= 448KB/job` for sampler queues.
   - If the same BE process observes both sides, the sampler queue storage is roughly `(4 + 8) * 56KB ~= 672KB/job`, plus small object/map/string overhead.
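   The estimate above can be reproduced as plain arithmetic. `SampleLayout` is a hypothetical stand-in for the two-field sample described in the first bullet, and the helpers count only queue element storage, ignoring allocator, map and string overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the sample layout described above (one cumulative value
// plus one timestamp). Not bvar's actual Sample<T> type.
struct SampleLayout {
    int64_t data;
    int64_t time_us;
};

constexpr std::size_t kLargestWindowSeconds = 3600;  // WINDOW_1H
constexpr std::size_t kSamplesPerQueue = kLargestWindowSeconds + 1;

// Bytes held by one sampler queue (element storage only).
constexpr std::size_t queue_bytes() {
    return kSamplesPerQueue * sizeof(SampleLayout);
}

// Estimated sampler-queue bytes per job_id for a given number of
// windowed adders; ignores map, string and object overhead.
constexpr std::size_t per_job_bytes(std::size_t windowed_adders) {
    return windowed_adders * queue_bytes();
}
```

   Under these assumptions one queue is 57,616 bytes (about 56 KiB), and 12 adders come to 691,392 bytes (about 675 KiB) per job_id, consistent with the rough `672KB/job` figure, which rounds each queue down to 56KB first.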
   
   So this memory grows with the number of distinct job_id values a BE process has seen, not with the number of rowsets, segments, or events. For the expected number of warm-up jobs, the overall memory usage should be small. This state is also local to the BE process: it is not persisted and is released when the BE restarts.


