showuon commented on PR #16118: URL: https://github.com/apache/kafka/pull/16118#issuecomment-2146749642
I was trying to find the root cause of this problem: why does it fail after the upgrade, but not without the upgrade? My understanding is that before the upgrade, the topic image doesn't have a dirID for the assignment, whereas after the upgrade the assignment has the dirID. So in `ReplicaManager#applyDelta`, we'll have directoryId changes in `localChanges`, which will invoke `AssignmentEvent` [here](https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L2748). With that, we get the unexpected `NOT_LEADER_OR_FOLLOWER` error.

I also confirmed that this issue exists even without your change in this PR. The steps to reproduce are:

1. Launch a 3.6.0 controller and a 3.6.0 broker (Broker A) in KRaft mode;
2. Create a topic with 1 partition;
3. ~~Launch a 3.6.0 broker (Broker B) in KRaft mode and reassign the step 2 partition to Broker B;~~
4. Upgrade Broker B to 3.7.0;
5. Upgrade Broker A and the controllers to 3.7.0;
6. Upgrade the MV to 3.7: `./bin/kafka-features.sh --bootstrap-server localhost:9092 upgrade --metadata 3.7`
7. Reassign the step 2 partition to Broker A (or B).

So I think we need to find a good solution that fixes this at the root. I will create another ticket to track this issue.

That said, I think this PR already fixes the issue in the JIRA. Let's complete it! :)
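
To make the suspected mechanism concrete, here is a toy Python model of it. This is only a sketch of my understanding, not Kafka's actual code: `PartitionAssignment`, `directory_id_changed`, and `local_changes` are hypothetical names, and the real logic in `ReplicaManager#applyDelta` is considerably more involved. It only illustrates how a partition whose replica set is unchanged could still show up as a local change once the new image carries a directory ID that the old image lacked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


@dataclass(frozen=True)
class PartitionAssignment:
    """Hypothetical, simplified view of one partition in the topic image."""
    replicas: Tuple[int, ...]
    # Broker id -> log directory ID. Empty when the image was written
    # before directory assignments existed (the pre-upgrade case).
    directory_ids: Dict[int, str] = field(default_factory=dict)


def directory_id_changed(broker_id: int,
                         old: PartitionAssignment,
                         new: PartitionAssignment) -> bool:
    """True if the directory ID recorded for this broker differs."""
    return old.directory_ids.get(broker_id) != new.directory_ids.get(broker_id)


def local_changes(broker_id: int,
                  before: Dict[TopicPartition, PartitionAssignment],
                  after: Dict[TopicPartition, PartitionAssignment]
                  ) -> List[TopicPartition]:
    """Partitions hosted on broker_id whose replicas or directory ID changed."""
    changed = []
    for tp, new in after.items():
        if broker_id not in new.replicas:
            continue  # not hosted on this broker, so not a local change
        old = before.get(tp)
        if (old is None
                or old.replicas != new.replicas
                or directory_id_changed(broker_id, old, new)):
            changed.append(tp)
    return changed


# Before the upgrade: replica on broker 1, no directory ID recorded.
before = {("foo", 0): PartitionAssignment(replicas=(1,))}
# After the upgrade: same replica set, but a directory ID is now present.
# "dir-A" is a placeholder, not a real directory UUID.
after = {("foo", 0): PartitionAssignment(replicas=(1,), directory_ids={1: "dir-A"})}

print(local_changes(1, before, after))
print(local_changes(1, after, after))
```

In this model the first call reports `("foo", 0)` as changed purely because of the new directory ID, while comparing the post-upgrade image with itself reports nothing. That matches the observation that the failure appears only after the upgrade.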
