[ https://issues.apache.org/jira/browse/KAFKA-13501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867880#comment-17867880 ]
Matthias J. Sax commented on KAFKA-13501:
-----------------------------------------
I don't think the state-updater solves this? Also, I'm not sure why it's labeled
"new-streams-runtime-should-fix"; I don't see how it would fix it.
If a task fails locally and we restart it locally, we rebuild its state from
scratch. The same happens with the state-updater. The "only" difference with
the state-updater (and it can be significant) is that we would no longer block
all tasks from processing; instead, we keep processing all other tasks while
the state-updater does the restore.
However, the failed task itself still has offline time. The idea of this
ticket was: if we have two instances A and B, the local failure happens on A,
and B has a standby, let's trigger a rebalance and move the failed task to B,
avoiding offline time for the failed task altogether. On instance A, we might
still rebuild the state using the state-updater, but B would take over
processing in the meantime. After A is done restoring, we could do another
rebalance to move the active task back from B to A (and still keep a standby
on B).
Does this make sense? (Maybe the ticket description was too brief?)
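To make the proposed two-rebalance flow concrete, here is a minimal Java sketch that only models task placement. It is not the Kafka Streams API or its assignor: the class `FailoverSketch`, the methods `onLocalStateWipe` and `onRestoreComplete`, and the task id "0_1" are hypothetical, and each method stands in for one rebalance.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the KAFKA-13501 idea; NOT the Kafka Streams API.
// It only tracks which instance hosts the active and the standby copies of each task.
public class FailoverSketch {

    // taskId -> instance running the active task
    static final Map<String, String> active = new HashMap<>();
    // taskId -> instances hosting a standby for the task
    static final Map<String, Set<String>> standbys = new HashMap<>();

    // Called when `failedHost` had to wipe the local state of `taskId`.
    // Returns the instance that processes the task while `failedHost` restores.
    static String onLocalStateWipe(String taskId, String failedHost) {
        Set<String> hosts = standbys.getOrDefault(taskId, new LinkedHashSet<>());
        Optional<String> standbyHost = hosts.stream()
                .filter(h -> !h.equals(failedHost))
                .findFirst();
        if (standbyHost.isEmpty()) {
            // No standby elsewhere: today's behavior, the task stays offline on failedHost until restored.
            return failedHost;
        }
        // Simulated rebalance #1: promote the standby; failedHost rebuilds its state as a standby.
        String newActive = standbyHost.get();
        active.put(taskId, newActive);
        hosts.remove(newActive);
        hosts.add(failedHost);
        return newActive;
    }

    // Called once `restoredHost` has finished rebuilding state for `taskId`.
    // Simulated rebalance #2: move the active back and keep a standby on the interim host.
    static void onRestoreComplete(String taskId, String restoredHost) {
        String interim = active.get(taskId);
        if (interim == null || interim.equals(restoredHost)) {
            return; // the task never moved, nothing to hand back
        }
        Set<String> hosts = standbys.get(taskId);
        hosts.remove(restoredHost);
        hosts.add(interim);
        active.put(taskId, restoredHost);
    }

    public static void main(String[] args) {
        active.put("0_1", "A");
        standbys.put("0_1", new LinkedHashSet<>(Set.of("B")));

        String interim = onLocalStateWipe("0_1", "A");
        System.out.println("while A restores: active=" + interim + ", standbys=" + standbys.get("0_1"));

        onRestoreComplete("0_1", "A");
        System.out.println("after restore: active=" + active.get("0_1") + ", standbys=" + standbys.get("0_1"));
    }
}
```

The point of the model is that the failed task is only offline for the duration of a rebalance instead of a full restore; A still pays the restore cost, but as a standby.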
> Avoid state restore via rebalance if standbys are enabled
> ---------------------------------------------------------
>
> Key: KAFKA-13501
> URL: https://issues.apache.org/jira/browse/KAFKA-13501
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Priority: Major
> Labels: new-streams-runtime-should-fix
>
> There are certain scenarios in which Kafka Streams wipes out local state and
> rebuilds it from scratch. This is a thread-local cleanup, i.e., no rebalance
> is triggered, and we end up with an offline task until state restoration has
> finished.
> If standby tasks are enabled, it might actually make sense to trigger a
> rebalance instead, to get the task reassigned to the instance hosting the
> standby so that the task becomes active again quickly.
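The proposal only applies when standby tasks are enabled, which in Kafka Streams is controlled by the `num.standby.replicas` config (the constant `StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG`, default 0). A minimal sketch using plain `java.util.Properties` so it runs without the Kafka dependency; the application id and bootstrap address are placeholders.

```java
import java.util.Properties;

public class StandbyConfigSketch {
    // Builds a minimal Kafka Streams configuration with one standby replica per stateful task.
    // The keys are Kafka Streams config names; the values for id and servers are placeholders.
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");    // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Same key as StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG; default is 0 (no standbys).
        props.put("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("num.standby.replicas=" + streamsProps().getProperty("num.standby.replicas"));
    }
}
```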
--
This message was sent by Atlassian Jira
(v8.20.10#820010)