[
https://issues.apache.org/jira/browse/KAFKA-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jennifer Thompson updated KAFKA-7845:
-------------------------------------
Summary: Kafka clients do not re-resolve ips when a broker is replaced.
(was: NotLeaderForPartitionException error when publishing after a broker has
died)
> Kafka clients do not re-resolve ips when a broker is replaced.
> --------------------------------------------------------------
>
> Key: KAFKA-7845
> URL: https://issues.apache.org/jira/browse/KAFKA-7845
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Affects Versions: 2.1.0
> Reporter: Jennifer Thompson
> Priority: Major
>
> When one of our Kafka brokers dies and a new one replaces it (via an AWS
> ASG), the clients that publish to Kafka still try to publish to the old
> brokers.
> We see errors like
> {code:java}
> 2019-01-18 20:16:16 WARN NetworkClient:721 - [Producer clientId=producer-1]
> Connection to node 2 (/10.130.98.111:9092) could not be established. Broker
> may not be available.
> 2019-01-18 20:19:09 WARN Sender:596 - [Producer clientId=producer-1] Got
> error produce response with correlation id 3414 on topic-partition aa.pga-2,
> retrying (4 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2019-01-18 20:19:09 WARN Sender:641 - [Producer clientId=producer-1] Received
> invalid metadata error in produce request on partition aa.pga-2 due to
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is
> not the leader for that topic-partition.. Going to request metadata update now
> 2019-01-18 20:21:19 WARN NetworkClient:721 - [Producer clientId=producer-1]
> Connection to node 2 (/10.130.98.111:9092) could not be established. Broker
> may not be available.
> 2019-01-18 20:21:50 ERROR ProducerBatch:233 - Error executing user-provided
> callback on message for topic-partition 'aa.test-liz-0'
> {code}
> and
> {code:java}
> [2019-01-18 20:28:47,732] ERROR WorkerSourceTask{id=rabbit-vpc-2-kafka-1}
> Failed to flush, timed out while waiting for producer to flush outstanding 27
> messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
> [2019-01-18 20:28:47,732] ERROR WorkerSourceTask{id=rabbit-vpc-2-kafka-1}
> Failed to commit offsets
> (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
> {code}
> The IP address referenced is for the broker that died. We have Kafka Manager
> running as well, and that picks up the new broker.
> This started happening after we upgraded to 2.1. When we had Kafka 1.1,
> brokers could fail over without a problem.
> One thing that might be considered unusual about our deployment is that we
> reuse the same broker id and EBS volume for the new broker, so that
> partitions do not have to be reassigned.