daniellavoie opened a new pull request #5608: URL: https://github.com/apache/incubator-pinot/pull/5608
## Description **DISCLAIMER** : This PR looks scarier than it is :) This change introduces configuration externalization to work with different environments. Properties can now be loaded from properties files, YAML files, environment variables, and command-line arguments to externalize configuration independently. Properties are now accessible from a `PinotConfiguration` class which act has an adapter to Apache Commons Configuration. This initial phase does not change how services are started and how they are configured. A future commit will enable properties loaded from `PINOT_X` environment variables. ## Motivation In order to ease cloud native deployments of Pinot, it is a general best practice to allow software to be configured with environment variables or CLI arguments. Config file based configuration is not considered best practice as it requires additional effort to mount volumes and maintain the property files. CLI arguments and environment variables are easily referred in an helm chart, a docker-compose manifest or simple docker run commands. ## Features: ### Configuration overwrite priority: Constructing a `PinotConfiguration` instance offers the opportunity to provide base properties, environment variables. Properties will also be loaded from any config file referenced with a `config.paths` property found in the base properties or environment variables. The priority is established as follow: `Base properties > Env variables > Config Files` As explained previously, this initial phase only loads properties from existing CLI commands and config files (`Base properties > Config Files`). ### Relaxed property binding from environment variables The `PinotConfiguration` class introduces a relaxed property binding mechanism to help, as much as possible, to follow operating system strict rules around environment variables naming concentions. For example, Linux shell variables can contain only letters (a to z or A to Z), numbers (0 to 9) or the underscore character (_). By convention, Unix shell variables will also have their names in UPPERCASE. To convert a property name in the canonical-form to an environment variable name you can follow these rules: * Prefix with `PINOT_` to avoid conflict with other services. * Replace dots (.) with underscores (_). * Remove any dashes (-). * Convert to uppercase. As an example, the `PINOT_CONTROLLER_PORT` would be interpreted as the `controller.port` property. ## Side effects of the change: This first phase of the refactoring only introduces the new `PinotConfiguration` abstraction. The dependency `org.apache.commons.configuration.Configuration` used to be to be referred in a big portion of the Pinot codebase. It has been replaced by the Pinot specific `PinotConfiguration` class. This change will simplify the a future upgrade of Apache Commons Configuration from 1.x to 2.x that new package namespaces and breaking changes. ## Breaking change and upgrade notes No impact or configuration changes on standard deployments. With Apache Commons Configuration, duplicate properties may be defined and will be made available as string array values. Prior to this change, retrieving a property with `getString` would return the first occurrences of the duplicate property. Using `getStringArray` would return all occurrences. With this new change, duplicate properties defined from multiple sources (eg: env variables and properties files) will be overwritten using the overwrite priority described earlier. Array properties now needs to be defined as a comma separated string from a single properties. Duplicate properties from difference sources will not be merged to an array property. Only the property highest priority source will be available from `PinotConfiguration.getProperty`. ## Notes for reviewers: The change impacts a large portion of the codebase. While it can be scary, the changes represents mostly a replacement of `org.apache.commons.configuration.Configuration` to `org.apache.pinot.spi.env.PinotConfiguration`. The following classes should be the main area of attention for the review: * `org.apache.pinot.spi.env.PinotConfiguration` * `org.apache.pinot.controller.ControllerConf` * `org.apache.pinot.tools.utils.PinotConfigUtils` * `org.apache.pinot.spi.env.CommonsConfigurationUtils` * `org.apache.pinot.spi.env.Environment` * `org.apache.pinot.spi.env.SystemEnvironment` ## Release Notes No release note required since these changes are not user facing yet. ## Documentation No documentation change required since these changes are not user facing yet. Future phase of this refactoring will enable environment variables properties and include documentation changes to expose the relaxed binding mechanism. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org