thelabdude opened a new pull request #2114: URL: https://github.com/apache/lucene-solr/pull/2114
# Description This is the backport to 8x from master, original PR was #2010 See JIRA for description of the issue: https://issues.apache.org/jira/browse/SOLR-12182 # Solution This PR computes the `base_url` for a Replica using the stored `node_name` and a global `urlScheme` rather than storing the `base_url` in `state.json`. This avoids storing an incorrect URL scheme for replicas in persistent storage. The `base_url` is computed when read back from ZK and dropped when marshaling the Replica state to JSON. This also means we don't need a migration tool as stored state is "healed" on-the-fly when read back from ZK. The unfortunate aspect of this PR is we need to keep the URL scheme for the cluster in a global variable (so that it is available when reading from ZK). The global `urlScheme` still comes from the cluster property but is then stored in a global singleton, see: `org.apache.solr.common.cloud.UrlScheme`. Alternatively, we could just keep the `urlScheme` in a static in ZkStateReader, I felt the global singleton `UrlScheme.INSTANCE` made it clearer that this was a global thing but it also made more sense with my first implementation that tried to make rolling restart upgrades to TLS less chaotic. It's a trivial change to move all this over to ZkStateReader and remove UrlScheme. I initially tried setting a `ThreadLocal` that gives access to the `urlScheme` whenever we need to read these props from ZK. However, that ended up being problematic because we tend to read ZkNodeProps from ZK in many places. In reality, the `urlScheme` really is an immutable global variable that should be set once during initialization by reading from the cluster property stored in ZK. So I felt trying to treat this global as something that was highly dynamic made the code overly cumbersome. Put simply, we shouldn't support `urlScheme` changing in a live node after initialization, it's bad for business. I also tried to get rid of the `urlScheme` cluster property (re: https://issues.apache.org/jira/browse/SOLR-10202) but I'm not sure how SolrCloud client applications can resolve the correct `urlScheme` for the cluster without this property? On the server-side, sure we can just get the `urlScheme` from a Java System Property, but that won't be set for remote client applications that initialize via a connection to ZooKeeper. So I'm keeping the cluster property `urlScheme` for now. We also need to consider how to enable TLS on an existing cluster (with active collections) using a rolling restart process. The current `org.apache.solr.cloud.SSLMigrationTest` just stopped all test nodes at once and then brought them back with TLS enabled. Based on feedback, I've since removed the option to pull the active urlScheme from live nodes as we're not able to ensure zero-downtime when moving from `http` -> `https` for clusters with existing collections and live traffic. Put simply, the feature was a bit trappy in that it tried to reduce chaos when doing a rolling restart to enable TLS, but it made no guarantees. Thus, users just need to be sure to enable TLS before building production clusters! Lastly, I've tried to clean-up some of the places that access the baseUrl on replicas to be more consistent, so you'll see some of that in this PR as well. # Tests Many existing tests cover regression caused by these code changes. Added simple unit test for UrlScheme. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `./gradlew check`. - [x] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
