[ 
https://issues.apache.org/jira/browse/KAFKA-18344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907963#comment-17907963
 ] 

TengYao Chi commented on KAFKA-18344:
-------------------------------------

Hi [~mjsax] 

I would like to take over this issue. 

Thank you 😀

> Consider to distinguish between multiple "positions"
> ----------------------------------------------------
>
>                 Key: KAFKA-18344
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18344
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, consumer
>            Reporter: Matthias J. Sax
>            Assignee: TengYao Chi
>            Priority: Major
>              Labels: needs-kip
>
> KafkaConsumer currently maintains a "position", which is the max offset of 
> records returned via `poll()`.
> This "position" is used to compute the consumer "lag metrics". This implies 
> that lag is computed slightly differently on the consumer compared to other 
> tools, which use `endOffset - committedOffset`, because "position" does not 
> reflect the latest _processed_ record but might be ahead of what the 
> application code has processed. If lag is computed as `endOffset - 
> committedOffset`, lag is always behind, i.e., at least as large as the real 
> lag, which might actually provide better semantics. It seems undesirable that 
> the consumer lag metric could be smaller than the actual lag.
> We should consider updating the position of the consumer differently:
>  # A simple change could be to update the position to the offset of the 
> first/oldest record in a `poll()` call (instead of the latest/newest, as we do 
> right now), so that the position does not get ahead and make lag "too small"
>  # We could also try to hook into the returned `ConsumerRecords` iterator to 
> track the position at a finer granularity, on a per-record basis
>  # We could track multiple positions, like "processed position" and "fetched 
> position" (note that "fetched position" might be even further ahead than the 
> current position because, based on `max.poll.records`, not all fetched 
> records might be returned from `poll()`)
>  
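
To make the difference between the positions discussed above concrete, here is a minimal sketch in plain Java. It is only a toy model of a single partition and does not use or mirror the Kafka client API: the class `PartitionPositions` and all of its methods and fields are hypothetical names chosen for illustration. It assumes every position is stored as "next offset", so each lag is simply `endOffset` minus that position.

```java
/** Hypothetical toy model of one partition; not part of the Kafka client API. */
final class PartitionPositions {
    private final long endOffset;    // log end offset of the partition
    private long fetchedPosition;    // next offset to fetch from the broker
    private long position;           // next offset poll() would hand out
    private long processedPosition;  // next offset the application has yet to process
    private long committedOffset;    // last committed offset

    PartitionPositions(long endOffset) {
        this.endOffset = endOffset;
    }

    /** Simulates a fetch that buffers up to {@code count} records. */
    void fetch(int count) {
        fetchedPosition = Math.min(endOffset, fetchedPosition + count);
    }

    /** Simulates poll(): returns at most maxPollRecords buffered records. */
    int poll(int maxPollRecords) {
        int returned = (int) Math.min(maxPollRecords, fetchedPosition - position);
        position += returned;
        return returned;
    }

    /** Simulates the application finishing {@code count} returned records. */
    void process(int count) {
        processedPosition = Math.min(position, processedPosition + count);
    }

    /** Simulates committing everything processed so far. */
    void commit() {
        committedOffset = processedPosition;
    }

    long lagByFetched()   { return endOffset - fetchedPosition; }
    long lagByPosition()  { return endOffset - position; }
    long lagByProcessed() { return endOffset - processedPosition; }
    long lagByCommitted() { return endOffset - committedOffset; }
}
```

With an end offset of 100, calling `fetch(50)`, `poll(20)`, `process(5)`, `commit()`, and `process(5)` in that order leaves the four lags at 50 (fetched), 80 (position), 90 (processed), and 95 (committed). In this model the position-based lag is smaller than the lag the application actually experiences, while the committed-offset lag is larger, which is the gap the description points at.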



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
