cwildman opened a new pull request, #15441: URL: https://github.com/apache/kafka/pull/15441
…ient

## Description

Brokers can respond to metadata requests with uninitialized metadata while they are starting up. The [NetworkClient](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1187-L1191) detects this scenario and ignores the empty metadata so that the request can be retried later. Unfortunately, the KafkaAdminClient only detects empty metadata for [listConsumerGroups](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L3369-L3371). As a result, other metadata requests fail with incorrect errors when they handle an uninitialized metadata response. For example, `describeTopics` will throw an UnknownTopicOrPartitionException for topics that do exist in the cluster.

This PR changes the KafkaAdminClient to detect any MetadataResponse that contains an empty broker set and throw a StaleMetadataException, which enables the call to be retried automatically. For example, `describeTopics`, `listTopics`, `describeCluster` and the client's own metadata fetches will now be retried if the returned broker set is empty. The same applies to any calls that rely on metadata through the AllBrokerStrategy or the PartitionLeaderStrategy. `listConsumerGroups` was already retrying and will continue to do so.

## Discussion

I think the better long-term solution here is to have the brokers respond with a specific error when their metadata is uninitialized. This would be a clearer signal to all clients than relying on the obscure empty-brokers condition. That would be a larger change, so I'd like some feedback on whether that's the direction we want to go.

I don't think StaleMetadataException is the perfect exception for the uninitialized metadata scenario, but there was precedent for it in `listConsumerGroups`, so I went with that. I'm open to creating a new exception type if people would like, or just making its message clearer.
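To illustrate the empty-broker check from the Description, here is a minimal, self-contained sketch of the idea. It is not the code in this PR. `EmptyMetadataRetrySketch`, `requireBrokers`, `fetchWithRetry` and the nested `MetadataResponse` and `StaleMetadataException` types are hypothetical stand-ins written for this example, and the retry loop is a simplification with no backoff.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Sketch only: the nested types are simplified stand-ins, not the Kafka classes. */
final class EmptyMetadataRetrySketch {

    /** Hypothetical stand-in for a retriable "metadata not usable yet" signal. */
    static final class StaleMetadataException extends RuntimeException {
        StaleMetadataException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in holding only the response fields this sketch needs. */
    static final class MetadataResponse {
        final List<String> brokers;
        final List<String> topics;

        MetadataResponse(List<String> brokers, List<String> topics) {
            this.brokers = brokers;
            this.topics = topics;
        }
    }

    /** Rejects a response that lists no brokers, as a starting broker may send. */
    static MetadataResponse requireBrokers(MetadataResponse response) {
        if (response.brokers.isEmpty()) {
            throw new StaleMetadataException("Metadata response contained no brokers");
        }
        return response;
    }

    /** Calls the fetcher until a response passes the check or attempts run out. */
    static MetadataResponse fetchWithRetry(Supplier<MetadataResponse> fetcher, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        StaleMetadataException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return requireBrokers(fetcher.get());
            } catch (StaleMetadataException e) {
                last = e; // retriable: ask again instead of reporting "unknown topic"
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // The first response mimics a broker that is still starting up.
        Iterator<MetadataResponse> responses = List.of(
            new MetadataResponse(List.of(), List.of()),
            new MetadataResponse(List.of("broker-0"), List.of("my-topic"))
        ).iterator();
        MetadataResponse result = fetchWithRetry(responses::next, 3);
        System.out.println(result.topics); // prints [my-topic]
    }
}
```

In this sketch, the empty first response is treated as "try again" rather than as proof that `my-topic` does not exist, which is the behavior change described above.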
## Testing

I wrote a unit test that proves `describeTopics` will now retry when it receives an empty metadata response. The same test fails with an UnknownTopicOrPartitionException without my change. I also patched one of the other test scenarios (describeProducers) to include brokers in its mock metadata response, because that test would otherwise fail with this change.

### Committer Checklist (excluded from commit message)

- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
