Re: [DISCUSS] Nested YAML configs for new features
The problem with scoping this to “features” is that we end up with at best local coherence. The config file as a whole will end up just as incoherent through its design evolution as it has historically.

If you take a look at my proposed layout for the overall config, there is a “limits” section that specifies thresholds for reporting warnings and errors for various scenarios. In this case, we probably don’t also want per-feature limits? We’re also mixing terminology already, with limits/thresholds and fail/abort.

It’s a lot of work to come up with a coherent and intuitive config layout. We probably want to at least create some documentation in-tree stipulating terminology with respect to plurals, verbs/nouns, and specific terms (period, abort, limit, datacenter vs dc, etc), but ideally we would have a common end goal for the config file.

> leave non-features to CASSANDRA-15234

IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.

From: David Capwell
Date: Monday, 29 November 2021 at 23:44
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Nested YAML configs for new features

> but I would hate to repeat the mistakes of our past by evolving the config
> in a new direction without any coherent overarching design.

At the start I asked to keep the thread local to new features, but to flesh out an “overarching design” more, maybe we should increase the “desired” scope to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise config and JVM parameters)? Aka, do we think the following is more ideal (configs scoped to a feature)

hinted_handoff:
  enabled: true
  disabled_datacenters:
    - DC1
    - DC2
  max_window: 3h
  flush_period: 10s
  max_file_size: 128mb
  compression:
    class_name: LZ4Compressor
    parameters:
      a: b

track_warnings:
  enabled: true
  local_read_size:
    warn_threshold: 1mb
    abort_threshold: 10mb
  coordinator_read_size:
    warn_threshold: 5mb
    abort_threshold: 20mb

OR

# I had to rename hint configs as there was 0 consistent naming
hinted_handoff_enabled: true
hinted_handoff_disabled_datacenters:
  - 'DC1'
  - 'DC2'
hinted_handoff_max_window: 3h
hinted_handoff_max_file_size: 128mb
hinted_handoff_flush_period: 10s
hinted_handoff_compression:
  class_name: LZ4Compressor
  parameters:
    a: b

track_warnings_enabled: true
track_warnings_local_read_size_warn_threshold: 1mb
track_warnings_local_read_size_abort_threshold: 10mb
track_warnings_coordinator_read_size_warn_threshold: 5mb
track_warnings_coordinator_read_size_abort_threshold: 20mb

The main issue I have with flat structure is that we have no way to enforce standard naming; if you look at the hint example there were at least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain that?). And one of the core reasons track_warnings went nested was that warn/abort sometimes became warn/fail and threshold sometimes was thresholds…. By embracing nested structure we can actually enforce consistency; with flat we have no way to maintain consistency.

Additionally, by embracing the nested structure we can accept a flat one as well (PR in CASSANDRA-17166 shows this working) if users desire it; so we get the consistency of nested, and the “grep” benefits of flat.

> On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
>
> If we’re thinking of moving towards nested configuration, then before
> employing the approach further we would ideally consider what a fully nested
> config looks like for the project. Ekaterina has done a lot to clean up
> inconsistent naming, but I would hate to repeat the mistakes of our past by
> evolving the config in a new direction without any coherent overarching
> design.
>
> In case anyone missed it in the earlier discussion, this was my attempt to
> prototype a nested config:
> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
>
> I don’t have any specific attachment to it, but settling on some approximate
> scheme would be helpful IMO.
>
> From: David Capwell
> Date: Monday, 29 November 2021 at 20:38
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> What should our default example cassandra.yaml file use (flat or nested)?
>> Currently default shows nested
>
> Was told this statement was confusing, so trying to clarify. At the moment
> we do not allow a nested config to be expressed in any way outside of nesting
> it (excluding YAML’s ability to inline objects), so if we did allow flat
> config representation of nested configs, then this would be a brand new
> feature; we currently show the nested structure in cassandra.yaml
>
>> On Nov 29, 2021, at 11:58 AM, David Capwell wrote:
>>
>> Thanks everyone for the comments, I hope below is a good summary of
Re: [DISCUSS] Nested YAML configs for new features
“IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.”

Sailed? I submitted the patch a week ago for review. Not sure how to understand this statement. Can you elaborate, please?

On Tue, 30 Nov 2021 at 8:09, bened...@apache.org wrote:

> > leave non-features to CASSANDRA-15234
>
> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work.
Re: [DISCUSS] Nested YAML configs for new features
I mean that it has been waiting for months, is ready to go, and I don’t want to hold you up any longer.

From: Ekaterina Dimitrova
Date: Tuesday, 30 November 2021 at 13:44
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Nested YAML configs for new features

“IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.”

Sailed? I submitted the patch a week ago for review. Not sure how to understand this statement. Can you elaborate, please?
Re: [DISCUSS] Nested YAML configs for new features
Thank you for confirming, as I misread your email at first :-)

I had a chat with David last week, and I don’t think his plan is to rework 15234 but to make incremental improvements on top of it.

Regarding config, after spending time cleaning up and looking into it in more detail, my only appeals are:
- Centralized management, rather than 5 places to change things when you add new config, so we are less error-prone
- Documenting things for people who add new config or for our users (I promised and I will do it for 15234, but it will be good to continue doing it with any further changes down the road)
- Be careful with breaking changes

Thank you
Ekaterina

On Tue, 30 Nov 2021 at 8:59, bened...@apache.org wrote:

> I mean that it has been waiting for months, is ready to go, and I don’t
> want to hold you up any longer.
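
One way the terminology and consistency concerns raised in this thread could be checked mechanically is a small lint pass over config key names. The sketch below is purely illustrative and is not part of any Cassandra patch: the glossary `PREFERRED_TERMS` and the function `check_terms` are hypothetical, and the preferred spellings chosen here (abort, threshold, datacenter) are assumptions, since the thread has not settled on a terminology.

```python
from typing import Any, Dict, List

# Hypothetical glossary: discouraged key segment -> preferred term.
# The thread has not agreed on these; they are placeholders.
PREFERRED_TERMS = {
    "fail": "abort",
    "thresholds": "threshold",
    "dc": "datacenter",
    "dcs": "datacenters",
}


def check_terms(config: Dict[str, Any], path: str = "") -> List[str]:
    """Return one message per discouraged term found in a config key."""
    problems: List[str] = []
    for key, value in config.items():
        full = f"{path}.{key}" if path else key
        # Keys are snake_case, so each underscore-separated segment is a term.
        for segment in key.split("_"):
            preferred = PREFERRED_TERMS.get(segment)
            if preferred is not None:
                problems.append(
                    f"{full}: use '{preferred}' instead of '{segment}'")
        if isinstance(value, dict):
            problems.extend(check_terms(value, full))
    return problems


example = {
    "track_warnings": {
        "local_read_size": {
            "warn_threshold": "1mb",
            "fail_thresholds": "10mb",  # deliberately inconsistent naming
        },
    },
}

for message in check_terms(example):
    print(message)
```

A check like this works for both nested and flat layouts, since it only looks at the underscore-separated segments of each key; the hard part remains agreeing on the glossary itself.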