Hi Aditya,

AK1_5: Change paused-partitions-count to paused-partitions.

Thanks for your suggestions.

Kind Regards,
PoAn

> On Apr 10, 2026, at 12:00 AM, Aditya Kousik <[email protected]> wrote:
> 
> Thanks, overall LGTM.
> 
> AK1_5: I see paused-partitions-count and paused-partitions both representing 
> int gauges. Can we unify into simply paused-partitions? *-Count can also 
> point users towards windowed/cumulative count which isn’t the case here.
> 
> -Aditya
> 
> On 2026/04/09 12:43:10 PoAn Yang wrote:
>> Hi Chia-Ping, Aditya, and Sahil,
>> 
>> Thanks for your suggestion.
>> 
>> chia_00, AK1, SD2: Aligning the naming is better. Changed all metrics to
>> use the prefix `paused-partitions`.
>> 
>> chia_01: After checking the code again, it's better to follow the current
>> per-partition metrics like records-lag.
>> I changed the per-partition paused-partitions* metrics to use the INFO level.
>> 
>> AK2: Moved paused-partitions-count to the consumer-coordinator-metrics group.
>> 
>> AK3: Add paused-partitions-rate/paused-partitions-total to both consumer
>> and per-partition levels.
>> Since existing per-partition metrics use the INFO level, I changed these to
>> use INFO as well.
>> 
>> AK4: Add a note about cardinality to consumer-fetch-manager-metrics
>> paragraph.
>> 
>> SD1: Changed to use -1 as the default value
>> for paused-partitions-duration-seconds.
>> 
>> SD3: Mention per-partition metrics are reset on partition reassignment in
>> Proposed Changes.
>> 
>> Kind regards,
>> PoAn
>> 
>> On Tue, Apr 7, 2026 at 12:17 PM, Sahil Devgon <[email protected]> wrote:
>> 
>>> Hello PoAn,
>>> Thanks for the KIP, I have a few comments that we may consider adding to
>>> the KIP:
>>> 1. One thing I noticed is for partition-paused-time-ms, returning 0 when a
>>> partition is not paused could be slightly ambiguous since it's the same
>>> value a freshly paused partition would return. Would you consider returning
>>> -1 to indicate "not paused" (consistent with how partition-paused uses
>>> 0/1)? Or if 0 is preferred, a clear doc note would go a long way in
>>> preventing false positives in monitoring setups.
>>> 
>>> 2. Adding to Chia-Ping and Aditya's naming suggestions,
>>> partition-paused-time-ms reads as "time in milliseconds" but semantically
>>> it measures elapsed duration since pause. A name like
>>> paused-partition-duration-ms/paused-partition-duration-seconds would better
>>> communicate intent and align with naming conventions used in other Kafka
>>> duration metrics (e.g., records-lag, fetch-latency-avg).
>>> 
>>> 3. The test plan mentions verifying that the pause timestamp is "reset on
>>> partition reassignment". It would be helpful to also describe this
>>> behavior explicitly in the Proposed Changes section, not just the test
>>> plan. For example, calling out that the pause state is cleared on
>>> reassignment regardless of prior pause status would make the spec feel
>>> complete. This is especially relevant for rebalance-heavy workloads where
>>> partitions move around frequently.
>>> 
>>> Best,
>>> Sahil Devgon
>>> 
>>> On Mon, Apr 6, 2026 at 4:34 PM PoAn Yang <[email protected]> wrote:
>>> 
>>>> Hello everyone,
>>>> 
>>>> I would like to start a discussion thread on KIP-1304. In this KIP, we
>>>> plan to add new consumer metrics about paused partitions.
>>>> 
>>>> 
>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1304%3A+Add+consumer+metric+about+paused+partitions
>>>> 
>>>> Please take a look and feel free to share any thoughts.
>>>> 
>>>> Thanks,
>>>> PoAn
>>> 
>> 
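
For readers following the thread, the semantics discussed under SD1 and SD3 (a duration gauge that reports -1 when a partition is not paused, and pause state that is cleared on reassignment) might be sketched roughly as below. This is only an illustration under stated assumptions: the class `PausedDurationTracker`, its method names, and the injected nanosecond clock are hypothetical and are not taken from the KIP or from the Kafka code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration only; not part of Kafka or KIP-1304.
class PausedDurationTracker {
    // Sentinel reported when a partition is not paused (SD1 in the thread).
    static final double NOT_PAUSED = -1.0;

    // Partition name -> clock reading (nanoseconds) taken when it was paused.
    private final Map<String, Long> pausedAtNanos = new HashMap<>();
    private final LongSupplier nanoClock;

    PausedDurationTracker(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Record the pause time; pausing an already paused partition keeps the
    // original timestamp.
    void pause(String partition) {
        pausedAtNanos.putIfAbsent(partition, nanoClock.getAsLong());
    }

    void resume(String partition) {
        pausedAtNanos.remove(partition);
    }

    // Assumed behaviour for SD3: all pause state is dropped on reassignment.
    void onReassignment() {
        pausedAtNanos.clear();
    }

    // Gauge value: elapsed seconds since pause, or -1 when not paused.
    double pausedDurationSeconds(String partition) {
        Long start = pausedAtNanos.get(partition);
        if (start == null) {
            return NOT_PAUSED;
        }
        return (nanoClock.getAsLong() - start) / 1_000_000_000.0;
    }

    // Instantaneous gauge: number of currently paused partitions.
    int pausedPartitions() {
        return pausedAtNanos.size();
    }
}
```

With this shape, a freshly paused partition reports a duration of 0 while an unpaused one reports -1, which is the ambiguity Sahil's first comment was trying to avoid.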
