[ https://issues.apache.org/jira/browse/KAFKA-18344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
TengYao Chi reassigned KAFKA-18344:
-----------------------------------
Assignee: TengYao Chi
> Consider to distinguish between multiple "positions"
> ----------------------------------------------------
>
> Key: KAFKA-18344
> URL: https://issues.apache.org/jira/browse/KAFKA-18344
> Project: Kafka
> Issue Type: Improvement
> Components: clients, consumer
> Reporter: Matthias J. Sax
> Assignee: TengYao Chi
> Priority: Major
> Labels: needs-kip
>
> KafkaConsumer currently maintains a "position", which is the offset right
> after the max offset of the records returned via `poll()` (i.e., the offset
> of the next record to fetch).
> This "position" is used to compute the consumer "lag metrics". This implies
> that lag is computed slightly differently on the consumer than by other
> tools, which use `endOffset - committedOffset`: the "position" does not
> reflect the latest _processed_ record and might be ahead of what the
> application code has actually processed. If lag is computed as
> `endOffset - committedOffset`, the committed offset always trails what has
> been processed, so the lag is never smaller than the real lag, which might
> actually provide better semantics. It seems undesirable that the consumer
> lag metric could be smaller than the actual lag.
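To make the difference concrete, the sketch below compares the lag definitions for one partition under an assumed scenario. The class and method names are hypothetical (this is not Kafka client code), `endOffset` is used as in the issue text, and offsets follow Kafka's convention that both the position and the committed offset point at the next record to read.

```java
/**
 * Hypothetical illustration (not Kafka client code) of the lag definitions
 * discussed in the issue, for a single partition.
 */
class LagComparison {

    /** Position-based lag: end offset minus the consumer's position. */
    static long positionLag(long endOffset, long position) {
        return endOffset - position;
    }

    /** Lag as external tools compute it: end offset minus the committed offset. */
    static long committedLag(long endOffset, long committedOffset) {
        return endOffset - committedOffset;
    }

    /** "Real" lag: end offset minus the offset of the first record not yet processed. */
    static long processedLag(long endOffset, long nextUnprocessedOffset) {
        return endOffset - nextUnprocessedOffset;
    }

    public static void main(String[] args) {
        // Assumed scenario for one partition:
        //  - the end offset is 1000 and the last commit was at offset 400,
        //  - poll() just returned records 400..899, so the position is now 900,
        //  - the application has only finished records 400..499 so far.
        long endOffset = 1000;
        long committedOffset = 400;
        long position = 900;
        long nextUnprocessedOffset = 500;

        System.out.println("position-based lag:  " + positionLag(endOffset, position));               // 100
        System.out.println("processed-based lag: " + processedLag(endOffset, nextUnprocessedOffset)); // 500
        System.out.println("committed-based lag: " + committedLag(endOffset, committedOffset));       // 600
    }
}
```

In this scenario the position-based value (100) understates the real lag (500), while the committed-offset value (600) overstates it, which is the asymmetry the issue describes.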
> We should consider updating the consumer's position differently:
> # A simple change could be to update the position to the offset of the
> first/oldest record returned by a `poll()` call (instead of the latest/newest,
> as we do right now), so that the position does not get ahead and the lag is
> not "too small".
> # We could also try to hook into the returned `ConsumerRecords` iterator to
> track the position at a finer granularity, on a per-record basis.
> # We could track multiple positions, like a "processed position" and a
> "fetched position" (note that the "fetched position" might be even further
> ahead than the current position because, depending on `max.poll.records`,
> not all fetched records might be returned from a single `poll()` call).
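The three options above can be sketched together in a toy model where offsets stand in for records. `PositionTracker` is a hypothetical class (not part of the KafkaConsumer API) that buffers fetched records and hands out at most `maxPollRecords` of them per `poll()`. It keeps a fetched, a returned, and a processed position, and advances the processed position through a wrapped iterator, assuming that asking for the next record means the previous one has been processed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch (not the KafkaConsumer API) of tracking three positions
 * for a single partition. Offsets stand in for records.
 */
class PositionTracker {
    private final List<Long> buffer = new ArrayList<>(); // fetched, not yet returned by poll()
    private final int maxPollRecords;
    private long fetchedPosition;   // next offset to fetch from the broker
    private long returnedPosition;  // offset after the last record returned by poll()
    private long processedPosition; // offset of the first record not yet processed

    PositionTracker(long startOffset, int maxPollRecords) {
        this.fetchedPosition = startOffset;
        this.returnedPosition = startOffset;
        this.processedPosition = startOffset;
        this.maxPollRecords = maxPollRecords;
    }

    /** Simulates a fetch response that buffers {@code count} consecutive offsets. */
    void fetch(int count) {
        for (int i = 0; i < count; i++) {
            buffer.add(fetchedPosition++);
        }
    }

    /** Simulates poll(): hands out at most maxPollRecords buffered records. */
    Iterable<Long> poll() {
        // Assumption: calling poll() again means the previous batch was fully processed.
        processedPosition = returnedPosition;
        int n = Math.min(maxPollRecords, buffer.size());
        List<Long> batch = new ArrayList<>(buffer.subList(0, n));
        buffer.subList(0, n).clear();
        if (!batch.isEmpty()) {
            returnedPosition = batch.get(batch.size() - 1) + 1;
        }
        // Per-record tracking: asking for the next record implies the previous one is done,
        // so the processed position becomes the offset of the record being handed out.
        return () -> new Iterator<Long>() {
            private final Iterator<Long> inner = batch.iterator();

            @Override
            public boolean hasNext() {
                return inner.hasNext();
            }

            @Override
            public Long next() {
                Long offset = inner.next();
                processedPosition = offset;
                return offset;
            }
        };
    }

    long fetchedPosition() { return fetchedPosition; }
    long returnedPosition() { return returnedPosition; }
    long processedPosition() { return processedPosition; }

    public static void main(String[] args) {
        PositionTracker t = new PositionTracker(0, 3); // maxPollRecords = 3
        t.fetch(10);                                    // offsets 0..9 buffered
        Iterator<Long> records = t.poll().iterator();   // returns offsets 0..2
        records.next();                                 // application gets offset 0
        records.next();                                 // offset 0 done, now working on 1
        long endOffset = 10;                            // assumed: no new records produced
        System.out.println("fetched   lag = " + (endOffset - t.fetchedPosition()));   // 0
        System.out.println("returned  lag = " + (endOffset - t.returnedPosition()));  // 7
        System.out.println("processed lag = " + (endOffset - t.processedPosition())); // 9
    }
}
```

In this model, right after `poll()` returns, the processed position equals the offset of the first returned record, which is what option 1 would report; per-record iteration (option 2) then refines it, and the fetched position shows how far ahead of the returned position the buffer can run because of `max.poll.records`. The sketch ignores commits, rebalances, multiple partitions, and iterators that are still used after a later `poll()`.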
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)