dongxiaoman opened a new pull request #7064: URL: https://github.com/apache/incubator-pinot/pull/7064
### Summary

**Allow us to update the Helix `hostname` property so controllers and brokers can be replaced easily.**

Right now the Pinot server can already update its hostname in Helix, so we can re-use an instanceId with a different FQDN hostname, but the Pinot broker and controller are still bound by Helix's naming-convention limitation. This PR introduces a few configuration properties so that we can update the controller's and broker's hostnames as well. It was tested successfully on our AWS fleet.

### Background

Pinot/Helix has a limitation: the Helix `"instanceId"` has to follow a naming format like "Controller_hostname4_9000", where the first part is the host "type", the second part is the hostname, and the third part is the port. This caused trouble for us in the early days, whenever EC2 hosts were replaced with new hosts and the hostnames changed.

With my company's cloud infra in EC2, we replace hosts frequently, meaning hosts are taken down and new hosts join the cluster very often. Every time a new host is swapped in, it shows up in Pinot as a brand new "participant" in the cluster, and the old hosts stay "dead" forever. This caused a lot of operational pain.

To work around this issue, when a host is replaced, we keep track of the old "instanceId" and register the new hostname in the Helix cluster under the same ID.

### Technical details

The limitation is caused by Helix `0.9.8`'s assumption that the hostname can be inferred from an instanceId with a fixed format, around https://github.com/apache/helix/blob/helix-0.9.300-release/helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManager.java#L146

### Changes

For Pinot cluster

<img width="752" alt="Screen Shot 2021-06-15 at 4 18 33 PM" src="https://user-images.githubusercontent.com/11821736/122143928-4e60ed00-ce07-11eb-86d5-06c39d1c6161.png">

### Motivation

Reduce one dependency, and also make Pinot better.
Pinot controller Helix hostname cannot be customized.

### Testing

Tried out in our own fleet.

## Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

* [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

* [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

- New configuration options
- Deprecation of configurations
- Signature changes to public methods/interfaces
- New plugins added or old plugins removed

* [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)

## Release Notes

Adds

## Documentation