The quick feedback I gathered pinging TCM / Accord channels:

Paxos - no particular opinion
Accord - people will probably want to configure some of them
TCM - we might just hide them for now and create a ticket to expose later

Personally I do not know how to comment on these. Some are very
specific and require knowledge to explain what they are for which I do
not have.

So I would say, let's hide them all, create the tickets and if
somebody finds a need for exposing them they can do so (might be
really anybody).

On Sun, Apr 12, 2026 at 10:34 PM Pedro Gordo <[email protected]> wrote:
>
> I've been working on this ticket. I have a JUnit test 
> (ConfigYamlCoverageTest) with forward and reverse checks, and an 
> @HiddenInYaml annotation ready to go. I've triaged all 131 Config.java fields 
> that are missing from the yaml files and have decision suggestions for most 
> of them.
>
> Before I share the full spreadsheet, I'd like to gauge whether we can handle 
> some groups at the group level rather than property-by-property:
>
> Deprecated fields (13 fields): back_pressure_*, otc_coalescing_*, 
> otc_backlog_*, windows_timer_interval, max_streaming_retries, 
> repair_session_max_tree_depth, scripted_user_defined_functions_enabled, 
> use_deterministic_table_id, cms_default_retry_backoff, 
> cms_default_max_retry_backoff. These are all @Deprecated in Config.java. The 
> test already skips them automatically, but should they also be annotated 
> @HiddenInYaml for clarity?
> Paxos v2 (16 fields): All the paxos_* and skip_paxos_repair_* fields from 
> CEP-14. Internal LWT protocol tuning.
> TCM/CMS (12 fields): CMS timeouts/retries (cms_*), metadata snapshotting, 
> discovery timeout, unsafe_tcm_mode, progress barrier fields 
> (progress_barrier_*), and short_rpc_timeout. All internal CEP-21 
> infrastructure.
> Accord (3 fields): accord_preaccept_timeout, concurrent_accord_operations, 
> consensus_migration_cache_size. CEP-15 internals. Still maturing.
>
> Does anyone see a reason any of these should be exposed in cassandra.yaml 
> rather than marked @HiddenInYaml? If we can agree on these groups, it reduces 
> the remaining discussion to about 13 individual fields which is much more 
> manageable.
>
> On Tue, 4 Mar 2025 at 09:31, Dmitry Konstantinov <[email protected]> wrote:
>>
>> >> https://docs.google.com/spreadsheets/d/11MOxhNqwE1tWP4ex2gzKG2pmeAWFaHDKo-CRp25h9BU/edit?gid=0#gid=0
>> We still have a lot of rows empty. I have added many default values and a 
>> Cassandra version when a parameter was introduced (to differentiate some 
>> recent parameters from old ones) based on source code but it would be nice 
>> to get a description for parameters from the authors as well as 
>> classification exposed/hidden.
>> Maybe we should not wait for collecting info about all parameters and update 
>> what we have + use a threshold in the Ant validation task to fail when new 
>> missed parameters are added. The logic in the dev branch here 
>> https://issues.apache.org/jira/browse/CASSANDRA-20249 already supports a 
>> threshold.
>>
>>
>> On Tue, 28 Jan 2025 at 00:04, Josh McKenzie <[email protected]> wrote:
>>>
>>> Good point re: the implications of parsing and durability in the face of 
>>> seeing unknown or missing parameters. I don't think widening the scope on 
>>> that would be ideal, especially considering the entire impetus for this 
>>> conversation is "we've misbehaved with our config and have a bunch of 
>>> undocumented stuff we're not sure is still useful, or what it's for". =/
>>>
>>> On Mon, Jan 27, 2025, at 3:41 PM, Štefan Miklošovič wrote:
>>>
>>> "we take "unclaimed" items and move them to their own InternalConfig.java 
>>> or something"
>>>
>>> This is interesting. If we are meant to be still able to put these 
>>> properties into cassandra.yaml (even they are "internal ones") and they 
>>> would be just in InternalConfig.java for some basic separation of internal 
>>> / user-facing configuration, then we would need to have two yaml loaders:
>>>
>>> 1) the one as we have now which loads cassandra.yaml it into Config.java
>>> 2) the second one which would load cassandra.yaml into InternalConfig.java
>>>
>>> For both cases, we could not fail when there are unrecognized properties in 
>>> cassandra.yaml while parsing it (1), because every loader, for Config.java 
>>> as well as InternalConfig.java, is parsing just some "subset" of yaml.
>>>
>>> (1) 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java#L443-L444
>>>
>>> If we just had "public InternalConfig internal = new InternalConfig" as a 
>>> field in Config.java, then this would lead to properties being effectively 
>>> renamed in cassandra.yaml like
>>>
>>> internal:
>>>     some_currently_internal_property: false
>>>
>>> instead of just
>>>
>>> some_currently_internal_property: false
>>>
>>> I do not think we want to have them renamed / under different configuration 
>>> sections in yaml. I get that they are "internal" etc but we just don't know 
>>> how / where it is used and deployed and just blindly renaming them is not a 
>>> good idea imho.
>>>
>>> On Mon, Jan 27, 2025 at 8:46 PM Josh McKenzie <[email protected]> wrote:
>>>
>>>
>>> This may be an off-base comparison, but this reminds me of struggles we've 
>>> had getting to 0 failing unit tests before and the debates on fencing off a 
>>> snapshot of the current "failure set" so you can have a set point where no 
>>> further degradation is allowed in a primary data set.
>>>
>>> All of which is to say - maybe at the end of the spreadsheet, we take 
>>> "unclaimed" items and move them to their own InternalConfig.java or 
>>> something and add an ant target that a) disallows further addition to 
>>> InternalConfig.java w/out throwing an error / needing whitelist update, and 
>>> b) disallows further regression in the Config.java <-> cassandra.yaml 
>>> relationship for non-annotated fields.
>>>
>>> That way we can at least halt the progression of the disease even if we're 
>>> stymied on cleaning up some of the existing symptoms.
>>>
>>> On Mon, Jan 27, 2025, at 1:38 PM, Štefan Miklošovič wrote:
>>>
>>> Indeed, we need to balance that and thoughtfully choose what is going to be 
>>> added and what not. However, we should not hide something which is meant to 
>>> be tweaked by a user. The config is intimidating mostly because everything 
>>> is just in one file. I merely remember discussions a few years ago which 
>>> were about splitting cassandra.yaml into multiple files which would be 
>>> focused just on one subsystem / would cover some logically isolated domain.
>>>
>>> Anyway, I think the main goal of this effort for now would be to at least 
>>> map where we are at. Some of them are genuinely missing. E.g. guardrails, 
>>> how is a user meant to know about that if it is not even documented ...
>>>
>>> On Mon, Jan 27, 2025 at 6:16 PM Chris lohfink <[email protected]> wrote:
>>>
>>> Might be a bit of a balance between exposing what people actually are 
>>> likely to need to modify vs having a super intimidating config file. It's 
>>> already nearly 2000 lines. Personally I'd rather see some 
>>> auto-documentation or something that's in the docs than an effort to 
>>> manually add another 1000 lines.
>>>
>>> Chris
>>>
>>> On Fri, Jan 24, 2025 at 9:41 AM Dmitry Konstantinov <[email protected]> 
>>> wrote:
>>>
>>> Maybe I missed some patterns but it looks like a pretty good estimation, I 
>>> did like 10 random checks manually to verify :-)
>>> I will try to make an ant target with a similar logic (hopefully, during 
>>> the weekend)
>>> I will create a ticket to track this activity (to share attachments there 
>>> to not overload the thread with such outputs in future).
>>>
>>> On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič <[email protected]> 
>>> wrote:
>>>
>>> Oh my god, 112? :DD I was thinking it would be less than 10.
>>>
>>> Anyway, I think we need to integrate this to some ant target. If you 
>>> expanded on this, that would be great.
>>>
>>> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov <[email protected]> 
>>> wrote:
>>>
>>> A very primitive implementation of the 1st idea below:
>>>
>>> String configUrl = 
>>> "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";
>>> Field[] allFields = Config.class.getFields();
>>> List<String> topLevelPropertyNames = new ArrayList<>();
>>> for(Field field : allFields)
>>> {
>>>     if (!Modifier.isStatic(field.getModifiers()))
>>>     {
>>>         topLevelPropertyNames.add(field.getName());
>>>     }
>>> }
>>>
>>> URL url = new URL(configUrl);
>>> List<String> lines = Files.readAllLines(Paths.get(url.toURI()));
>>>
>>> int missedCount = 0;
>>> for (String propertyName : topLevelPropertyNames)
>>> {
>>>     boolean found = false;
>>>     for (String line : lines)
>>>     {
>>>         if (line.startsWith(propertyName + ":")
>>>             || line.startsWith("#" + propertyName + ":")
>>>             || line.startsWith("# " + propertyName + ":")) {
>>>             found = true;
>>>             break;
>>>         }
>>>     }
>>>     if (!found)
>>>     {
>>>         missedCount++;
>>>         System.out.println(propertyName);
>>>     }
>>> }
>>> System.out.println("Total missed:" + missedCount);
>>>
>>>
>>> It prints the following config property names which are defined in 
>>> Config.java but not present as "property" or "# property " in a file:
>>>
>>> permissions_cache_max_entries
>>> roles_cache_max_entries
>>> credentials_cache_max_entries
>>> auto_bootstrap
>>> force_new_prepared_statement_behaviour
>>> use_deterministic_table_id
>>> repair_request_timeout
>>> stream_transfer_task_timeout
>>> cms_await_timeout
>>> cms_default_max_retries
>>> cms_default_retry_backoff
>>> epoch_aware_debounce_inflight_tracker_max_size
>>> metadata_snapshot_frequency
>>> available_processors
>>> repair_session_max_tree_depth
>>> use_offheap_merkle_trees
>>> internode_max_message_size
>>> native_transport_max_message_size
>>> native_transport_max_request_data_in_flight_per_ip
>>> native_transport_max_request_data_in_flight
>>> native_transport_receive_queue_capacity
>>> min_free_space_per_drive
>>> max_space_usable_for_compactions_in_percentage
>>> reject_repair_compaction_threshold
>>> concurrent_index_builders
>>> max_streaming_retries
>>> commitlog_max_compression_buffers_in_pool
>>> max_mutation_size
>>> dynamic_snitch
>>> failure_detector
>>> use_creation_time_for_hint_ttl
>>> key_cache_migrate_during_compaction
>>> key_cache_invalidate_after_sstable_deletion
>>> paxos_cache_size
>>> file_cache_round_up
>>> disk_optimization_estimate_percentile
>>> disk_optimization_page_cross_chance
>>> purgeable_tobmstones_metric_granularity
>>> windows_timer_interval
>>> otc_coalescing_strategy
>>> otc_coalescing_window_us
>>> otc_coalescing_enough_coalesced_messages
>>> otc_backlog_expiration_interval_ms
>>> scripted_user_defined_functions_enabled
>>> user_defined_functions_threads_enabled
>>> allow_insecure_udfs
>>> allow_extra_insecure_udfs
>>> user_defined_functions_warn_timeout
>>> user_defined_functions_fail_timeout
>>> user_function_timeout_policy
>>> back_pressure_enabled
>>> back_pressure_strategy
>>> repair_command_pool_full_strategy
>>> repair_command_pool_size
>>> block_for_peers_timeout_in_secs
>>> block_for_peers_in_remote_dcs
>>> skip_stream_disk_space_check
>>> snapshot_on_repaired_data_mismatch
>>> validation_preview_purge_head_start
>>> initial_range_tombstone_list_allocation_size
>>> range_tombstone_list_growth_factor
>>> snapshot_on_duplicate_row_detection
>>> check_for_duplicate_rows_during_reads
>>> check_for_duplicate_rows_during_compaction
>>> autocompaction_on_startup_enabled
>>> auto_optimise_inc_repair_streams
>>> auto_optimise_full_repair_streams
>>> auto_optimise_preview_repair_streams
>>> consecutive_message_errors_threshold
>>> internode_error_reporting_exclusions
>>> compact_tables_enabled
>>> vector_type_enabled
>>> intersect_filtering_query_warned
>>> intersect_filtering_query_enabled
>>> streaming_slow_events_log_timeout
>>> repair_state_expires
>>> repair_state_size
>>> paxos_variant
>>> skip_paxos_repair_on_topology_change
>>> paxos_purge_grace_period
>>> paxos_on_linearizability_violations
>>> paxos_state_purging
>>> paxos_repair_enabled
>>> paxos_topology_repair_no_dc_checks
>>> paxos_topology_repair_strict_each_quorum
>>> skip_paxos_repair_on_topology_change_keyspaces
>>> paxos_contention_wait_randomizer
>>> paxos_contention_min_wait
>>> paxos_contention_max_wait
>>> paxos_contention_min_delta
>>> paxos_repair_parallelism
>>> sstable_read_rate_persistence_enabled
>>> client_request_size_metrics_enabled
>>> max_top_size_partition_count
>>> max_top_tombstone_partition_count
>>> min_tracked_partition_size
>>> min_tracked_partition_tombstone_count
>>> top_partitions_enabled
>>> severity_during_decommission
>>> progress_barrier_min_consistency_level
>>> progress_barrier_default_consistency_level
>>> progress_barrier_timeout
>>> progress_barrier_backoff
>>> discovery_timeout
>>> unsafe_tcm_mode
>>> cql_start_time
>>> native_transport_throw_on_overload
>>> native_transport_queue_max_item_age_threshold
>>> native_transport_min_backoff_on_queue_overload
>>> native_transport_max_backoff_on_queue_overload
>>> native_transport_timeout
>>> enforce_native_deadline_for_hints
>>> Total missed:112
>>>
>>>
>>>
>>> On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <[email protected]> 
>>> wrote:
>>>
>>> It should also work the other way around. If there is a property which is 
>>> commented out in yaml and it is not in Config.java, that should fail as 
>>> well. If it is not commented out and it is not in Config.java, that will 
>>> fail in runtime as it fails on unrecognized property.
>>>
>>> This will be used in practice very rarely as we seldom remove the 
>>> properties in Config but if we do and a property is commented out, we 
>>> should not ship a dead property name, even commented out.
>>>
>>> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <[email protected]> wrote:
>>>
>>> >  >  If "# my_cool_property: true" is NOT in cassandra.yaml, we might 
>>> > indeed add it, also commented out. I think it would be quite easy to 
>>> > check against yaml if there is a line starting on "# my_cool_property" or 
>>> > just on "my_cool_property". Both cases would satisfy the check.
>>>
>>> Makes sense, I think this would be good to have as a lint or test to easily 
>>> catch overlooks during review.
>>>
>>> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič <[email protected]> 
>>> wrote:
>>>
>>>
>>>
>>> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <[email protected]> wrote:
>>>
>>> > from time to time I see configuration properties in Config.java and they 
>>> > are clearly not in cassandra.yaml. Not every property in Config is in 
>>> > cassandra.yaml. I would like to know if there is some specific reason 
>>> > behind that.
>>>
>>> I think one of the original reasons was to "hide" advanced configs that are 
>>> not meant to be updated, unless in very niche circumstances. However I 
>>> think this has been extrapolated to non-advanced settings.
>>>
>>> > Question related to that is if we could not have a build-time check that 
>>> > all properties in Config have to be in cassandra.yaml and fail the build 
>>> > if a property in Config does not have its counterpart in yaml.
>>>
>>> Are you saying every configuration property should be commented-out, or do 
>>> you think that every Config property should be specified in cassandra.yaml 
>>> with their default uncomented ? One issue with that is that you could cause 
>>> user confusion if you "reveal" a niche/advanced config that is not meant to 
>>> be updated. I think this would be addressed by the @HiddenInYaml flag you 
>>> are proposing in a later post.
>>>
>>>
>>> Yes, then can stay hidden, but we should annotate it with @Hidden or 
>>> similar. As of now, if that property is not in yaml, we just don't know if 
>>> it was forgotten to be added or if we have not added it on purpose.
>>>
>>> They can keep being commented out if they currently are. Imagine a property 
>>> in Config.java
>>>
>>> public boolean my_cool_property = true;
>>>
>>> and then this in cassandra.yaml
>>>
>>> # my_cool_property: true
>>>
>>> It is completely ok.
>>>
>>> If "# my_cool_property: true" is NOT in cassandra.yaml, we might indeed add 
>>> it, also commented out. I think it would be quite easy to check against 
>>> yaml if there is a line starting on "# my_cool_property" or just on 
>>> "my_cool_property". Both cases would satisfy the check.
>>>
>>>
>>>
>>> > There are dozens of properties in Config and I have a strong suspicion 
>>> > that we missed to publish some to yaml so users do not even know such a 
>>> > property exists and as of now we do not even know which they are.
>>>
>>> I believe this is a problem. I think most properties should be in 
>>> cassandra.yaml, unless they are very advanced or not meant to be updated.
>>>
>>> Another tangential issue is that there are features/settings that don't 
>>> even have a Config entry, but are just controlled by JVM properties.
>>>
>>> I think that we should attempt to unify Config and jvm properties under a 
>>> predictable structure. For example, if there is a YAML config 
>>> enable_user_defined_functions, then there should be a respective JVM flag 
>>> -Dcassandra.enable_user_defined_functions, and vice versa.
>>>
>>>
>>> Yeah, good idea.
>>>
>>>
>>>
>>> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič <[email protected]> 
>>> wrote:
>>>
>>> Hello,
>>>
>>> from time to time I see configuration properties in Config.java and they 
>>> are clearly not in cassandra.yaml. Not every property in Config is in 
>>> cassandra.yaml. I would like to know if there is some specific reason 
>>> behind that.
>>>
>>> Question related to that is if we could not have a build-time check that 
>>> all properties in Config have to be in cassandra.yaml and fail the build if 
>>> a property in Config does not have its counterpart in yaml.
>>>
>>> There are dozens of properties in Config and I have a strong suspicion that 
>>> we missed to publish some to yaml so users do not even know such a property 
>>> exists and as of now we do not even know which they are.
>>>
>>>
>>>
>>> --
>>> Dmitry Konstantinov
>>>
>>>
>>>
>>> --
>>> Dmitry Konstantinov
>>>
>>>
>>>
>>
>>
>> --
>> Dmitry Konstantinov

Reply via email to