So if you forget to update cassandra.yaml, you fail the build?  Makes sense to 
me.

One additional thing I would like to see is the reverse: did you put something 
in yaml that isn’t in Config?  This is a bug I have seen a few times, mostly 
because people don’t know SnakeYAML’s rules and so are not aware that their 
property isn’t exposed. (A related case I know about but have not seen in 
practice: a getter without a setter is “exposed” to the vtable but not to yaml.)
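
To make the reverse check concrete, here is a minimal sketch, not a proposed 
patch. DemoConfig is a hypothetical stand-in for Config, the yaml is an inline 
list rather than the real file, and the regex is my assumption about what 
counts as a top-level key (commented out or not). It only reflects over public 
fields, so it would not catch the getter-without-setter case, and a prose 
comment such as "# note: ..." would show up as a false positive.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YamlReverseCheck
{
    // Hypothetical stand-in for org.apache.cassandra.config.Config.
    static class DemoConfig
    {
        public static final String IGNORED = "static fields are skipped";
        public boolean my_cool_property = true;
        public int concurrent_reads = 32;
    }

    // Assumed shape of a top-level key: starts at column 0, optionally
    // commented out as "#key:" or "# key:".
    private static final Pattern KEY = Pattern.compile("^(?:# ?)?([a-z][a-z0-9_]*):");

    // Names of all public, non-static fields of the given class.
    static Set<String> fieldNames(Class<?> configClass)
    {
        Set<String> names = new HashSet<>();
        for (Field field : configClass.getFields())
            if (!Modifier.isStatic(field.getModifiers()))
                names.add(field.getName());
        return names;
    }

    // Keys found in the yaml lines that have no matching field, in file order.
    static List<String> unknownYamlKeys(List<String> yamlLines, Class<?> configClass)
    {
        Set<String> known = fieldNames(configClass);
        List<String> unknown = new ArrayList<>();
        for (String line : yamlLines)
        {
            Matcher m = KEY.matcher(line);
            if (m.find() && !known.contains(m.group(1)) && !unknown.contains(m.group(1)))
                unknown.add(m.group(1));
        }
        return unknown;
    }

    public static void main(String[] args)
    {
        List<String> yaml = List.of("# Some prose comment about reads.",
                                    "concurrent_reads: 32",
                                    "# my_cool_property: true",
                                    "# removed_long_ago: 10",
                                    "    nested_key: value");
        System.out.println(unknownYamlKeys(yaml, DemoConfig.class));
    }
}
```

On this sample input it prints [removed_long_ago]; a build check would fail 
whenever the list is non-empty.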

> On Jan 24, 2025, at 8:25 AM, Bernardo Botella <conta...@bernardobotella.com> 
> wrote:
> 
> Love the suggestion of marking the hidden/advanced configuration properties 
> with annotations. Leaving a “configuration” property out of the main 
> configuration file should be deliberate, well thought out and argued. I highly 
> doubt we have 112 “advanced” properties that really need to be hidden to 
> protect users from themselves :-(
> 
> Agreed with Ekaterina that it is worth reviewing the current properties and 
> coming up with the subset that should stay hidden, and exposing the rest in the 
> yaml file.
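
Inline, on the annotation idea: a rough sketch of how the forward check could 
skip properties that are hidden on purpose. The annotation name HiddenInYaml is 
taken from this thread; it does not exist in the codebase as far as I know, and 
the `reason` element, DemoConfig and the inline yaml are assumptions made up 
for illustration. The line matching is the same prefix test as in Dmitry's 
snippet.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class HiddenPropertyCheck
{
    // Hypothetical marker for properties deliberately left out of the yaml.
    // RUNTIME retention is required for the reflection lookup below.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface HiddenInYaml
    {
        String reason(); // forces the author to argue why it is hidden
    }

    // Hypothetical stand-in for org.apache.cassandra.config.Config.
    static class DemoConfig
    {
        public boolean my_cool_property = true;
        public int forgotten_property = 1;
        @HiddenInYaml(reason = "test-only knob")
        public boolean unsafe_demo_mode = false;
    }

    // True if some line starts with "name:", "#name:" or "# name:".
    static boolean documented(String name, List<String> yamlLines)
    {
        for (String line : yamlLines)
            if (line.startsWith(name + ":")
                || line.startsWith("#" + name + ":")
                || line.startsWith("# " + name + ":"))
                return true;
        return false;
    }

    // Public, non-static, non-hidden fields that the yaml does not mention.
    static List<String> missingFromYaml(Class<?> configClass, List<String> yamlLines)
    {
        List<String> missing = new ArrayList<>();
        for (Field field : configClass.getFields())
        {
            if (Modifier.isStatic(field.getModifiers())
                || field.isAnnotationPresent(HiddenInYaml.class))
                continue;
            if (!documented(field.getName(), yamlLines))
                missing.add(field.getName());
        }
        return missing;
    }

    public static void main(String[] args)
    {
        List<String> yaml = List.of("# my_cool_property: true");
        // Only forgotten_property is reported; unsafe_demo_mode is hidden on purpose.
        System.out.println(missingFromYaml(DemoConfig.class, yaml));
    }
}
```

With something like this, "not in yaml" stops being ambiguous: either the 
field carries the annotation with a stated reason, or the check fails.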
> 
> 
>> On Jan 24, 2025, at 8:00 AM, Dmitry Konstantinov <netud...@gmail.com> wrote:
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-20249
>> 
>> On Fri, 24 Jan 2025 at 15:40, Dmitry Konstantinov <netud...@gmail.com> 
>> wrote:
>>> Maybe I missed some patterns but it looks like a pretty good estimation, I 
>>> did like 10 random checks manually to verify :-)
>>> I will try to make an ant target with a similar logic (hopefully, during 
>>> the weekend)
>>> I will create a ticket to track this activity (to share attachments there 
>>> to not overload the thread with such outputs in future).
>>> 
>>> On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič <smikloso...@apache.org> 
>>> wrote:
>>>> Oh my god, 112? :DD I was thinking it would be less than 10.
>>>> 
>>>> Anyway, I think we need to integrate this into some ant target. If you 
>>>> expanded on this, that would be great.
>>>> 
>>>> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov <netud...@gmail.com> 
>>>> wrote:
>>>>> A very primitive implementation of the 1st idea below:
>>>>> 
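>>>>> // Imports needed if this is pasted into a class or a jshell session.
>>>>> import java.lang.reflect.Field;
>>>>> import java.lang.reflect.Modifier;
>>>>> import java.net.URL;
>>>>> import java.nio.file.Files;
>>>>> import java.nio.file.Paths;
>>>>> import java.util.ArrayList;
>>>>> import java.util.List;
>>>>> import org.apache.cassandra.config.Config;
>>>>> 
>>>>> // Local path to the yaml being checked; adjust for your checkout.
>>>>> String configUrl = 
>>>>> "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";
>>>>> 
>>>>> // Collect the names of all public, non-static fields of Config.
>>>>> Field[] allFields = Config.class.getFields();
>>>>> List<String> topLevelPropertyNames = new ArrayList<>();
>>>>> for (Field field : allFields)
>>>>> {
>>>>>     if (!Modifier.isStatic(field.getModifiers()))
>>>>>     {
>>>>>         topLevelPropertyNames.add(field.getName());
>>>>>     }
>>>>> }
>>>>> 
>>>>> URL url = new URL(configUrl);
>>>>> List<String> lines = Files.readAllLines(Paths.get(url.toURI()));
>>>>> 
>>>>> // A property counts as present if some line starts with "name:",
>>>>> // "#name:" or "# name:". Anything else is reported as missed.
>>>>> int missedCount = 0;
>>>>> for (String propertyName : topLevelPropertyNames)
>>>>> {
>>>>>     boolean found = false;
>>>>>     for (String line : lines)
>>>>>     {
>>>>>         if (line.startsWith(propertyName + ":")
>>>>>             || line.startsWith("#" + propertyName + ":")
>>>>>             || line.startsWith("# " + propertyName + ":"))
>>>>>         {
>>>>>             found = true;
>>>>>             break;
>>>>>         }
>>>>>     }
>>>>>     if (!found)
>>>>>     {
>>>>>         missedCount++;
>>>>>         System.out.println(propertyName);
>>>>>     }
>>>>> }
>>>>> System.out.println("Total missed:" + missedCount);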
>>>>> 
>>>>> It prints the following config property names which are defined in 
>>>>> Config.java but not present as "property" or "# property " in a file:
>>>>> permissions_cache_max_entries
>>>>> roles_cache_max_entries
>>>>> credentials_cache_max_entries
>>>>> auto_bootstrap
>>>>> force_new_prepared_statement_behaviour
>>>>> use_deterministic_table_id
>>>>> repair_request_timeout
>>>>> stream_transfer_task_timeout
>>>>> cms_await_timeout
>>>>> cms_default_max_retries
>>>>> cms_default_retry_backoff
>>>>> epoch_aware_debounce_inflight_tracker_max_size
>>>>> metadata_snapshot_frequency
>>>>> available_processors
>>>>> repair_session_max_tree_depth
>>>>> use_offheap_merkle_trees
>>>>> internode_max_message_size
>>>>> native_transport_max_message_size
>>>>> native_transport_max_request_data_in_flight_per_ip
>>>>> native_transport_max_request_data_in_flight
>>>>> native_transport_receive_queue_capacity
>>>>> min_free_space_per_drive
>>>>> max_space_usable_for_compactions_in_percentage
>>>>> reject_repair_compaction_threshold
>>>>> concurrent_index_builders
>>>>> max_streaming_retries
>>>>> commitlog_max_compression_buffers_in_pool
>>>>> max_mutation_size
>>>>> dynamic_snitch
>>>>> failure_detector
>>>>> use_creation_time_for_hint_ttl
>>>>> key_cache_migrate_during_compaction
>>>>> key_cache_invalidate_after_sstable_deletion
>>>>> paxos_cache_size
>>>>> file_cache_round_up
>>>>> disk_optimization_estimate_percentile
>>>>> disk_optimization_page_cross_chance
>>>>> purgeable_tobmstones_metric_granularity
>>>>> windows_timer_interval
>>>>> otc_coalescing_strategy
>>>>> otc_coalescing_window_us
>>>>> otc_coalescing_enough_coalesced_messages
>>>>> otc_backlog_expiration_interval_ms
>>>>> scripted_user_defined_functions_enabled
>>>>> user_defined_functions_threads_enabled
>>>>> allow_insecure_udfs
>>>>> allow_extra_insecure_udfs
>>>>> user_defined_functions_warn_timeout
>>>>> user_defined_functions_fail_timeout
>>>>> user_function_timeout_policy
>>>>> back_pressure_enabled
>>>>> back_pressure_strategy
>>>>> repair_command_pool_full_strategy
>>>>> repair_command_pool_size
>>>>> block_for_peers_timeout_in_secs
>>>>> block_for_peers_in_remote_dcs
>>>>> skip_stream_disk_space_check
>>>>> snapshot_on_repaired_data_mismatch
>>>>> validation_preview_purge_head_start
>>>>> initial_range_tombstone_list_allocation_size
>>>>> range_tombstone_list_growth_factor
>>>>> snapshot_on_duplicate_row_detection
>>>>> check_for_duplicate_rows_during_reads
>>>>> check_for_duplicate_rows_during_compaction
>>>>> autocompaction_on_startup_enabled
>>>>> auto_optimise_inc_repair_streams
>>>>> auto_optimise_full_repair_streams
>>>>> auto_optimise_preview_repair_streams
>>>>> consecutive_message_errors_threshold
>>>>> internode_error_reporting_exclusions
>>>>> compact_tables_enabled
>>>>> vector_type_enabled
>>>>> intersect_filtering_query_warned
>>>>> intersect_filtering_query_enabled
>>>>> streaming_slow_events_log_timeout
>>>>> repair_state_expires
>>>>> repair_state_size
>>>>> paxos_variant
>>>>> skip_paxos_repair_on_topology_change
>>>>> paxos_purge_grace_period
>>>>> paxos_on_linearizability_violations
>>>>> paxos_state_purging
>>>>> paxos_repair_enabled
>>>>> paxos_topology_repair_no_dc_checks
>>>>> paxos_topology_repair_strict_each_quorum
>>>>> skip_paxos_repair_on_topology_change_keyspaces
>>>>> paxos_contention_wait_randomizer
>>>>> paxos_contention_min_wait
>>>>> paxos_contention_max_wait
>>>>> paxos_contention_min_delta
>>>>> paxos_repair_parallelism
>>>>> sstable_read_rate_persistence_enabled
>>>>> client_request_size_metrics_enabled
>>>>> max_top_size_partition_count
>>>>> max_top_tombstone_partition_count
>>>>> min_tracked_partition_size
>>>>> min_tracked_partition_tombstone_count
>>>>> top_partitions_enabled
>>>>> severity_during_decommission
>>>>> progress_barrier_min_consistency_level
>>>>> progress_barrier_default_consistency_level
>>>>> progress_barrier_timeout
>>>>> progress_barrier_backoff
>>>>> discovery_timeout
>>>>> unsafe_tcm_mode
>>>>> cql_start_time
>>>>> native_transport_throw_on_overload
>>>>> native_transport_queue_max_item_age_threshold
>>>>> native_transport_min_backoff_on_queue_overload
>>>>> native_transport_max_backoff_on_queue_overload
>>>>> native_transport_timeout
>>>>> enforce_native_deadline_for_hints
>>>>> Total missed:112
>>>>> 
>>>>> 
>>>>> On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <smikloso...@apache.org> 
>>>>> wrote:
>>>>>> It should also work the other way around. If there is a property which 
>>>>>> is commented out in yaml and it is not in Config.java, that should fail 
>>>>>> as well. If it is not commented out and it is not in Config.java, that 
>>>>>> will fail at runtime, since loading fails on an unrecognized property.
>>>>>> 
>>>>>> This will be used in practice very rarely as we seldom remove the 
>>>>>> properties in Config but if we do and a property is commented out, we 
>>>>>> should not ship a dead property name, even commented out. 
>>>>>> 
>>>>>> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <pa...@apache.org> wrote:
>>>>>>> > If "# my_cool_property: true" is NOT in cassandra.yaml, we might 
>>>>>>> > indeed add it, also commented out. I think it would be quite easy to 
>>>>>>> > check against yaml if there is a line starting on "# 
>>>>>>> > my_cool_property" or just on "my_cool_property". Both cases would 
>>>>>>> > satisfy the check.
>>>>>>> 
>>>>>>> Makes sense, I think this would be good to have as a lint or test to 
>>>>>>> easily catch overlooks during review.
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič 
>>>>>>> <smikloso...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <pa...@apache.org> wrote:
>>>>>>>>> > from time to time I see configuration properties in Config.java and 
>>>>>>>>> > they are clearly not in cassandra.yaml. Not every property in 
>>>>>>>>> > Config is in cassandra.yaml. I would like to know if there is some 
>>>>>>>>> > specific reason behind that.
>>>>>>>>> 
>>>>>>>>> I think one of the original reasons was to "hide" advanced configs 
>>>>>>>>> that are not meant to be updated, unless in very niche circumstances. 
>>>>>>>>> However I think this has been extrapolated to non-advanced settings.
>>>>>>>>> 
>>>>>>>>> > Question related to that is if we could not have a build-time check 
>>>>>>>>> > that all properties in Config have to be in cassandra.yaml and fail 
>>>>>>>>> > the build if a property in Config does not have its counterpart in 
>>>>>>>>> > yaml.
>>>>>>>>> 
>>>>>>>>> Are you saying every configuration property should be commented out, 
>>>>>>>>> or do you think that every Config property should be specified in 
>>>>>>>>> cassandra.yaml with its default, uncommented? One issue with that is 
>>>>>>>>> that you could cause user confusion if you "reveal" a niche/advanced 
>>>>>>>>> config that is not meant to be updated. I think this would be 
>>>>>>>>> addressed by the @HiddenInYaml flag you are proposing in a later post.
>>>>>>>> 
>>>>>>>> Yes, they can stay hidden, but we should annotate them with @Hidden or 
>>>>>>>> similar. As of now, if a property is not in yaml, we just don't 
>>>>>>>> know whether someone forgot to add it or we left it out on 
>>>>>>>> purpose.
>>>>>>>> 
>>>>>>>> They can keep being commented out if they currently are. Imagine a 
>>>>>>>> property in Config.java
>>>>>>>> 
>>>>>>>> public boolean my_cool_property = true;
>>>>>>>> 
>>>>>>>> and then this in cassandra.yaml
>>>>>>>> 
>>>>>>>> # my_cool_property: true
>>>>>>>> 
>>>>>>>> It is completely ok.
>>>>>>>> 
>>>>>>>> If "# my_cool_property: true" is NOT in cassandra.yaml, we might 
>>>>>>>> indeed add it, also commented out. I think it would be quite easy to 
>>>>>>>> check against yaml if there is a line starting on "# my_cool_property" 
>>>>>>>> or just on "my_cool_property". Both cases would satisfy the check.
>>>>>>>> 
>>>>>>>>  
>>>>>>>>> > There are dozens of properties in Config and I have a strong 
>>>>>>>>> > suspicion that we forgot to publish some to yaml, so users do not 
>>>>>>>>> > even know such a property exists and as of now we do not even know 
>>>>>>>>> > which they are.
>>>>>>>>> 
>>>>>>>>> I believe this is a problem. I think most properties should be in 
>>>>>>>>> cassandra.yaml, unless they are very advanced or not meant to be 
>>>>>>>>> updated.
>>>>>>>>> 
>>>>>>>>> Another tangential issue is that there are features/settings that 
>>>>>>>>> don't even have a Config entry, but are just controlled by JVM 
>>>>>>>>> properties.
>>>>>>>>> 
>>>>>>>>> I think that we should attempt to unify Config and jvm properties 
>>>>>>>>> under a predictable structure. For example, if there is a YAML config 
>>>>>>>>> enable_user_defined_functions, then there should be a respective JVM 
>>>>>>>>> flag -Dcassandra.enable_user_defined_functions, and vice versa.
>>>>>>>> 
>>>>>>>> Yeah, good idea.
>>>>>>>>  
>>>>>>>>> 
>>>>>>>>> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič 
>>>>>>>>> <smikloso...@apache.org> wrote:
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> from time to time I see configuration properties in Config.java and 
>>>>>>>>>> they are clearly not in cassandra.yaml. Not every property in Config 
>>>>>>>>>> is in cassandra.yaml. I would like to know if there is some specific 
>>>>>>>>>> reason behind that.
>>>>>>>>>> 
>>>>>>>>>> Question related to that is if we could not have a build-time check 
>>>>>>>>>> that all properties in Config have to be in cassandra.yaml and fail 
>>>>>>>>>> the build if a property in Config does not have its counterpart in 
>>>>>>>>>> yaml.
>>>>>>>>>> 
>>>>>>>>>> There are dozens of properties in Config and I have a strong 
>>>>>>>>>> suspicion that we forgot to publish some to yaml, so users do not 
>>>>>>>>>> even know such a property exists and as of now we do not even know 
>>>>>>>>>> which they are.
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Dmitry Konstantinov
> 
