[
https://issues.apache.org/jira/browse/KAFKA-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax updated KAFKA-7608:
-----------------------------------
Issue Type: Improvement (was: Bug)
> A Kafka Streams DSL transform or process call should potentially trigger a
> repartition
> --------------------------------------------------------------------------------------
>
> Key: KAFKA-7608
> URL: https://issues.apache.org/jira/browse/KAFKA-7608
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 2.0.0
> Reporter: Andy Bryant
> Priority: Major
>
> Currently in Kafka Streams, if any DSL operation occurs that may modify the
> keys of the record stream, the stream is flagged for repartitioning.
> Currently this flag is checked prior to a stream join or an aggregation and
> if set the stream is piped through a transient repartition topic. This
> ensures messages with the same key are always co-located in the same
> partition and hence same stream task and state store.
> The same mechanism should be used to trigger repartitioning prior to stream
> {{transform}}, {{transformValues}} and {{process}} calls that specify one or
> more state stores.
> Currently without the forced repartitioning, for streams where the key has
> been modified, there is no guarantee the same keys will be processed by the
> same task which would be what you expect when using a state store. Given that
> aggregations and joins already automatically make this guarantee it seems
> inconsistent that {{transform}} and {{process}} do not provide the same
> guarantees.
> To achieve the same guarantees currently, developers must manually pipe the
> stream through a topic to force the repartitioning. This works, but is
> sub-optimal since you don't get the handy optimisation where the repartition
> topic contents is purged after use.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)