[
https://issues.apache.org/jira/browse/KAFKA-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Edoardo Comar updated KAFKA-16931:
----------------------------------
Summary: Transient REST failures to forward fenceZombie requests leave
Connect Tasks in FAILED state (was: A transient REST failure to forward
fenceZombie request leaves Connect Task in FAILED state)
> Transient REST failures to forward fenceZombie requests leave Connect Tasks
> in FAILED state
> -------------------------------------------------------------------------------------------
>
> Key: KAFKA-16931
> URL: https://issues.apache.org/jira/browse/KAFKA-16931
> Project: Kafka
> Issue Type: Bug
> Components: connect
> Reporter: Edoardo Comar
> Priority: Major
>
> When Kafka Connect runs in exactly_once mode, a task restart will fence
> possible zombie tasks.
> This is achieved by forwarding the request to the leader worker using the REST
> protocol.
> At scale, in distributed mode, an HTTPS request may occasionally fail because
> of a networking glitch, a reconfiguration, etc.
> Currently there is no attempt to retry the REST request; the task is left in
> a FAILED state and requires an external restart (with the REST API).
> Would this issue require a small KIP to introduce configuration entries to
> limit the number of retries, backoff times, etc.?
>
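For illustration only, the kind of bounded retry with backoff that the description asks about could look roughly like the sketch below. This is not Kafka Connect code: the class FenceRetrySketch, the method callWithRetries and the parameters maxRetries, initialBackoffMs and maxBackoffMs are all hypothetical names, and treating only IOException as transient is an assumption made for the example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Kafka Connect.
final class FenceRetrySketch {

    private FenceRetrySketch() {
    }

    /**
     * Runs the request, retrying up to maxRetries times when it fails with an
     * IOException (assumed here to mean a transient network failure).
     * The wait doubles after each failed attempt, capped at maxBackoffMs.
     * Any other exception is rethrown immediately without a retry.
     */
    static <T> T callWithRetries(Callable<T> request,
                                 int maxRetries,
                                 long initialBackoffMs,
                                 long maxBackoffMs) throws Exception {
        long backoffMs = initialBackoffMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    // Retries exhausted: surface the last failure to the caller.
                    throw e;
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }
}
```

With maxRetries set to 3, a request that fails twice and then succeeds returns normally on the third call; a request that keeps failing is attempted four times in total before the exception propagates, at which point the caller would decide whether to mark the task as FAILED.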