Re: [DISCUSS] Nested YAML configs for new features

2021-11-30 Thread bened...@apache.org
The problem with scoping this to “features” is that we end up with at best 
local coherence. The config file as a whole will end up just as incoherent 
through its design evolution as it has historically.

If you take a look at my proposed layout for the overall config, there is a 
“limits” section that specifies thresholds for reporting warnings and errors 
for various scenarios. In this case, we probably don’t also want per-feature 
limits? We’re also mixing terminology already, with limits/thresholds and 
fail/abort.

It’s a lot of work to come up with a coherent and intuitive config layout. We 
probably want to at least create some documentation in-tree stipulating 
terminology with respect to plurals, verbs/nouns, and specific terms (period, 
abort, limit, datacenter vs dc, etc), but ideally we would have a common end 
goal for the config file.
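
A terminology document like this could also be backed by an automated check. The following is only a minimal sketch in Python, not anything that exists in-tree: the PREFERRED_TERMS glossary, the function names, and the choice of which term "wins" in each pair are all assumptions made for illustration, since the thread has not settled on any of them.

```python
# Hypothetical glossary: discouraged token -> preferred token.
# Which side of each pair is preferred is an assumption, not a project decision.
PREFERRED_TERMS = {
    "dc": "datacenter",
    "fail": "abort",
    "thresholds": "threshold",
}


def iter_key_paths(config, prefix=()):
    """Yield the path (tuple of keys) of every key in a nested dict."""
    for key, value in config.items():
        path = prefix + (key,)
        yield path
        if isinstance(value, dict):
            yield from iter_key_paths(value, path)


def lint_terms(config):
    """Return (dotted path, discouraged token, preferred token) per violation."""
    problems = []
    for path in iter_key_paths(config):
        # Only the last key of the path is checked; parents were already visited.
        for token in path[-1].split("_"):
            if token in PREFERRED_TERMS:
                problems.append((".".join(path), token, PREFERRED_TERMS[token]))
    return problems


# Made-up config that deliberately mixes terminology.
example = {
    "track_warnings": {
        "local_read_size": {"warn_threshold": "1mb", "fail_thresholds": "10mb"},
    },
    "hinted_handoff": {"disabled_dc": ["DC1"]},
}

for path, found, preferred in lint_terms(example):
    print(f"{path}: use '{preferred}' instead of '{found}'")
```

A check of this kind only covers vocabulary; it says nothing about where a setting belongs in the overall layout, which is the harder part.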

> leave non-features to CASSANDRA-15234

IMO 15234 has sailed – it’s been held up for a long time, and was brought to 
this list for discussion with no engagement. Ekaterina is long overdue being 
able to commit her work.


From: David Capwell 
Date: Monday, 29 November 2021 at 23:44
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Nested YAML configs for new features
>  but I would hate to repeat the mistakes of our past by evolving the config 
> in a new direction without any coherent overarching design.

At the start I asked to keep the thread local to new features, but to flesh 
out an “overarching design” more fully, maybe we should increase the “desired” 
scope to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise 
config and JVM parameters)?  In other words, do we think the following is 
preferable (configs scoped to a feature):

hinted_handoff:
  enabled: true
  disabled_datacenters:
    - DC1
    - DC2
  max_window: 3h
  flush_period: 10s
  max_file_size: 128mb
  compression:
    class_name: LZ4Compressor
    parameters:
      a: b

track_warnings:
  enabled: true
  local_read_size:
    warn_threshold: 1mb
    abort_threshold: 10mb
  coordinator_read_size:
    warn_threshold: 5mb
    abort_threshold: 20mb


OR

# I had to rename hint configs as there was 0 consistent naming
hinted_handoff_enabled: true
hinted_handoff_disabled_datacenters:
  - 'DC1'
  - 'DC2'
hinted_handoff_max_window: 3h
hinted_handoff_max_file_size: 128mb
hinted_handoff_flush_period: 10s
hinted_handoff_compression:
  class_name: LZ4Compressor
  parameters:
    a: b

track_warnings_enabled: true
track_warnings_local_read_size_warn_threshold: 1mb
track_warnings_local_read_size_abort_threshold: 10mb
track_warnings_coordinator_read_size_warn_threshold: 5mb
track_warnings_coordinator_read_size_abort_threshold: 20mb


The main issue I have with flat structure is that we have no way to enforce 
standard naming; if you look at the hint example there were at least 3 naming 
conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain 
that?).  And one of the core reasons track_warnings went nested was that 
warn/abort sometimes became warn/fail and threshold sometimes became 
thresholds. By embracing nested structure we can actually enforce consistency; 
with flat we have no way to maintain it.
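
To illustrate how nesting can enforce naming: if every warn/abort pair is parsed into one shared type, a misspelled key such as fail_threshold is rejected at load time instead of silently becoming a new convention. This is a rough sketch in Python rather than Cassandra's actual Java config classes; ReadSizeThreshold and parse_threshold are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadSizeThreshold:
    """Hypothetical shared type: every section built from it has the same keys."""
    warn_threshold: str
    abort_threshold: str


def parse_threshold(section):
    """Build a ReadSizeThreshold, rejecting misspelled, unknown, or missing keys."""
    try:
        return ReadSizeThreshold(**section)
    except TypeError as exc:
        raise ValueError(f"invalid threshold section {section!r}: {exc}") from exc


# Values copied from the nested example above; sizes are kept as raw strings.
track_warnings = {
    "local_read_size": {"warn_threshold": "1mb", "abort_threshold": "10mb"},
    "coordinator_read_size": {"warn_threshold": "5mb", "abort_threshold": "20mb"},
}

parsed = {name: parse_threshold(section) for name, section in track_warnings.items()}
print(parsed["coordinator_read_size"].abort_threshold)
```

With flat keys there is no shared type to reuse, so each of the four names has to be spelled consistently by hand.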

Additionally by embracing the nested structure we can accept a flat one as well 
(PR in CASSANDRA-17166 shows this working) if users desire it; so we get the 
consistency of nested, and the “grep” benefits of flat.
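
One way to picture accepting both forms is to derive the flat names from the nested schema and expand them on load. The sketch below is in Python and is not taken from the CASSANDRA-17166 PR; SCHEMA, flat_names, merge and normalize are names made up for this illustration, and only top-level keys are validated.

```python
import copy

# Hypothetical nested schema; leaf values stand in for defaults.
SCHEMA = {
    "track_warnings": {
        "enabled": False,
        "local_read_size": {"warn_threshold": None, "abort_threshold": None},
        "coordinator_read_size": {"warn_threshold": None, "abort_threshold": None},
    },
}


def flat_names(schema, prefix=()):
    """Map each flat name (path joined by '_') to its path in the nested schema."""
    names = {}
    for key, value in schema.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            names.update(flat_names(value, path))
        else:
            names["_".join(path)] = path
    return names


def merge(target, source):
    """Recursively merge source into target; source wins on conflicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def normalize(user_config, schema=SCHEMA):
    """Return a nested dict, expanding any flat keys found at the top level."""
    known = flat_names(schema)
    nested = {}
    for key, value in user_config.items():
        if key in known:
            # Flat spelling: rebuild the nested path from the inside out.
            expanded = value
            for part in reversed(known[key]):
                expanded = {part: expanded}
        elif key in schema:
            expanded = {key: value}  # already nested; inner keys are not checked here
        else:
            raise KeyError(f"unknown config key: {key}")
        merge(nested, expanded)
    return nested


flat = {
    "track_warnings_enabled": True,
    "track_warnings_local_read_size_warn_threshold": "1mb",
    "track_warnings_local_read_size_abort_threshold": "10mb",
}
print(normalize(flat))
```

Because both section names and setting names contain underscores, two different nested paths could in principle join to the same flat name, so a real implementation would need to detect such collisions.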


> On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
>
> If we’re thinking of moving towards nested configuration, then before 
> employing the approach further we would ideally consider what a fully nested 
> config looks like for the project. Ekaterina has done a lot to clean up 
> inconsistent naming, but I would hate to repeat the mistakes of our past by 
> evolving the config in a new direction without any coherent overarching 
> design.
>
> In case anyone missed it in the earlier discussion, this was my attempt to 
> prototype a nested config: 
> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
>
> I don’t have any specific attachment to it, but settling on some approximate 
> scheme would be helpful IMO.
>
> From: David Capwell 
> Date: Monday, 29 November 2021 at 20:38
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> What should our default example cassandra.yaml file use (flat or nested)?  
>> Currently default shows nested
>
> Was told this statement was confusing, so trying to clarify.  At the moment 
> we do not allow a nested config to be expressed in any way outside of nesting 
> it (excluding YAML’s ability to inline objects), so if we did allow flat 
> config representation of nested configs, then this would be a brand new 
> feature; we currently show the nested structure in cassandra.yaml
>
>> On Nov 29, 2021, at 11:58 AM, David Capwell  
>> wrote:
>>
>> Thanks everyone for the comments, I hope below is a good summary of

Re: [DISCUSS] Nested YAML configs for new features

2021-11-30 Thread Ekaterina Dimitrova
“
IMO 15234 has sailed – it’s been held up for a long time, and was brought
to this list for discussion with no engagement. Ekaterina is long overdue
being able to commit her work. “


Sailed? I submitted the patch a week ago for review. Not sure how to
understand this statement. Can you elaborate, please?


Re: [DISCUSS] Nested YAML configs for new features

2021-11-30 Thread bened...@apache.org
I mean that it has been waiting for months, is ready to go, and I don’t want to 
hold you up any longer.

From: Ekaterina Dimitrova 
Date: Tuesday, 30 November 2021 at 13:44
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Nested YAML configs for new features
“
IMO 15234 has sailed – it’s been held up for a long time, and was brought
to this list for discussion with no engagement. Ekaterina is long overdue
being able to commit her work. “


Sailed? I submitted the patch a week ago for review. Not sure how to
understand this statement. Can you elaborate, please?


Re: [DISCUSS] Nested YAML configs for new features

2021-11-30 Thread Ekaterina Dimitrova
Thank you for confirming, as I misread your email at first :-)
I had a chat with David last week, and I don’t think his plan is to rework
15234 but to make incremental improvements on top of it.
Regarding config, after spending time cleaning it up and looking into it in
more detail, my only requests are:
- Centralized management, rather than 5 places to change when you add new
config, so that we are less error-prone
- Documenting things for people who add new config and for our users (I
promised and I will do it for 15234, but it will be good to continue doing
it with any further changes down the road)
- Being careful with breaking changes

Thank you
Ekaterina

On Tue, 30 Nov 2021 at 8:59, bened...@apache.org 
wrote:

> I mean that it has been waiting for months, is ready to go, and I don’t
> want to hold you up any longer.