Re: [DISCUSS] FIP-12: Server Dynamic Config

Hongshun Wang Mon, 11 Aug 2025 21:08:44 -0700

Hi Jark,

Thanks for the thoughtful feedback. Here are my responses to your questions:


>  Could you clarify why default was chosen over the parent server node in the 
> FIP?
This name is inspired by Kafka’s configuration hierarchy. I agree that global 
is clearer and more intuitive than default. Thanks for the suggestion — I’ve 
updated the naming accordingly, and it does look better now.


>  ConfigEntry necessary in **AlterConfigOp**?*
> Naming consistency improvement
Modified it.


> It's also not clear to me how to rotate the secrets using 
> "password.encoder.old.secret”.

 You're right — the term “rotate” might be misleading. Here’s what this 
mechanism actually does: When users want to change the encryption key for 
security reasons (e.g., key rotation), simply updating password.encoder.secret 
would prevent the server from decoding existing encrypted values in ZooKeeper 
(since they were encrypted with the old key).

>  How the rotation works internally? How it handles failures duringrotation?


We can look at how Kafka handles this: the fallback mechanism only takes effect 
during server restart. On startup, the system first attempts to decode the 
configuration using password.encoder.old.secret. If that fails, it assumes the 
value was encrypted with the current password.encoder.secret and tries decoding 
with the new key.
After startup, whenever configuration properties are read from ZooKeeper, any 
password that cannot be decoded is silently ignored (i.e., removed from the 
config set), and a warning is logged. The relevant logic is shown below:

java

```java
 private[server] def fromPersistentProps(persistentProps: Properties,
                                          perBrokerConfig: Boolean): Properties 
= {
    val props = persistentProps.clone().asInstanceOf[Properties]

    def decodePassword(configName: String, value: String): Unit = {
            if (value != null) {
                try {
                props.setProperty(configName, 
passwordEncoder.decode(value).value)
                } catch {
                case e: Exception =>
                    error(s"Dynamic password config $configName could not be 
decoded, ignoring.", e)
                    props.remove(configName)
                }
            }
            }

    props.asScala.forKeyValue { (name, value) =>
      if (isPasswordConfig(name))
        decodePassword(name, value)
    }
    props
    }

````

Given this behavior, in Fluss we have two possible approaches:
1. Follow Kafka’s lenient model: During restart, if a config cannot be decoded 
by either password.encoder.old.secret or password.encoder.secret, it is 
ignored. The server continues to start up, using whatever valid configurations 
remain.
2. Fail-fast on undecodable secrets: During restart, if any password cannot be 
decoded by either the old or new secret, the server fails to start. This 
ensures configuration integrity but requires manual intervention. 

Which choice are you preferred?


Best
Hongshun

> 2025年8月10日 12:17，Jark Wu <[email protected]> 写道：
> 
> Besides, I noticed that there are many Kafka concepts mentioned in
> "Sensitive Configuration Encryption" section, like "broker",
> "server.properties", and we should change them to Fluss concepts.
> 
> It's also not clear to me how to rotate the secrets using
> "password.encoder.old.secret". Do we need to provide a specific API for the
> rotation? How the rotation works internally? How it handles failures during
> rotation? If it is complex, we can move it to the future works with another
> FIP and keep current FIP simple.
> 
> Best,
> Jark
> 
> On Sun, 10 Aug 2025 at 11:20, Jark Wu <[email protected]> wrote:
> 
>> Hi Hong,
>> 
>> Thanks for putting together this design proposal—I appreciate the thorough
>> work. I have a few comments and suggestions for consideration:
>> 
>> *1. ZooKeeper path: `/config/server/default`*
>> Could you clarify why choose `default` over the parent `server` node in
>> the FIP? If the intention is to support per-server overrides in the future,
>> using a name like `all` (e.g., `/config/server/all`) or `global` might be
>> more intuitive than `default`. The term "default" could be misinterpreted
>> as providing fallback or static default values, rather than active dynamic
>> overrides.
>> 
>> *2. Backward compatibility of `password.encoder.secret`*
>> Marking `password.encoder.secret` as required raises concerns about
>> backward compatibility. Since this config doesn’t exist in v0.7
>> deployments, upgrading to v0.8 could fail if the field is strictly
>> required.
>> 
>> *3. Is ConfigEntry necessary in **AlterConfigOp**?*
>> The ConfigSource in ConfigEntry is meaningless and confusing for
>> AlterConfigOp, because all the config to alter are dynamic. I prefer that
>> AlterConfigOp just contains 3 members: configKey, configValue and opType to
>> make the inerface clean.
>> 
>> *4. Naming consistency improvements*
>> To improve consistency across the codebase, I recommend the following
>> renames:
>>   - `ConfigEntry.name` → `ConfigEntry.key`
>>   - `PbDescribeConfigsResponseInfo.config_name` → `config_key`
>>   - `PbAlterConfigsRequestInfo.config_name` → `config_key`
>>   - `PbAlterConfigsRequestInfo.config_operation` → `op_type` (to align
>> with `AlterConfigOp.opType`)
>> 
>> 
>> Let me know your thoughts.
>> 
>> Best,
>> Jark
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Fri, 8 Aug 2025 at 10:39, Wang Cheng <[email protected]> wrote:
>> 
>>> Hi Hongshun,
>>> 
>>> 
>>> 
>>> Since tablet servers lazily apply config changes, could we provide an RPC
>>> call like IsAlterConfigDone to allow users to track the progress?
>>> 
>>> Do we have a RESET command to restore the value of a run-time parameter
>>> to the default value?
>>> 
>>> I'm still unclear about the need for two distinct implementations for
>>> dynamic server-level and table-level configurations. In modern distributed
>>> databases like PGXL [1] (a distributed PostgreSQL variant), both DDL
>>> operations and SET commands are handled uniformly via a two-phase commit
>>> protocol to avoid any inconsistency problems across all data nodes and
>>> coordinators.
>>> 
>>> 
>>> 
>>> [1]
>>> https://postgres-x2.github.io/presentation_docs/2014-07-PGXC-Implementation/pgxc.pdf
>>> 
>>> 
>>> 
>>> Regards,
>>> Cheng
>>> 
>>> 
>>> 
>>> &nbsp;
>>> 
>>> 
>>> 
>>> 
>>> ------------------ Original ------------------
>>> From:
>>>                                                    "dev"
>>>                                                                  <
>>> [email protected]&gt;;
>>> Date:&nbsp;Thu, Aug 7, 2025 11:51 AM
>>> To:&nbsp;"dev"<[email protected]&gt;;
>>> 
>>> Subject:&nbsp;[DISCUSS] FIP-12: Server Dynamic Config
>>> 
>>> 
>>> 
>>> Hi devs,
>>> 
>>> I'd like to start a discussion about FIP-12: Server Dynamic Config[1].
>>> Currently, any changes to the server.yaml configuration in a Fluss cluster
>>> require a full restart of the cluster, which negatively impacts stability
>>> and availability. To improve operational agility and reduce downtime, we
>>> propose introducing dynamic configuration capabilities, enabling runtime
>>> modification of key parameters—such as enabling/disabling lake-streaming
>>> integration features or managing user accounts—without requiring service
>>> interruption.
>>> 
>>> The POC[2] code is provided to enable lake format. You can try and give
>>> some advice.
>>> 
>>> Best
>>> Hongshun
>>> 
>>> 
>>> [1]
>>> 
>>> https://cwiki.apache.org/confluence/display/FLUSS/FIP-12%3A+Server+Dynamic+Config
>>> [2] https://github.com/loserwang1024/fluss/tree/poc-dymanic-config.
>> 
>>

Re: [DISCUSS] FIP-12: Server Dynamic Config

Reply via email to