+100

On Mon, 29 Nov 2021 at 17:51, [email protected] <[email protected]> wrote:

> Maybe we can make our query language more expressive 😊
>
> We might anyway want to introduce e.g. a LIKE filtering option to
> find/discover flattened config parameters?
>
> From: Benjamin Lerer <[email protected]>
> Date: Monday, 29 November 2021 at 16:41
> To: [email protected] <[email protected]>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
>
> > I don’t think it’s necessarily a requirement that we use the flattened
> > version in vtables. At the very least we can make use of sets, lists, etc.
> > But we can probably also use UDTs if this improves clarity.
>
> In my opinion part of the issue is on the query side. How do we select a
> nested set or a specific set easily? UDTs are not great for this type of
> query. For collections we can use CONTAINS and element or range selection,
> but insertion might be the problem.
>
> On Mon, 29 Nov 2021 at 17:23, Bowen Song <[email protected]> wrote:
>
> > In ElasticSearch, the default is a flattened format with almost all
> > lines commented out. See
> > https://github.com/elastic/elasticsearch/blob/master/distribution/src/config/elasticsearch.yml
> >
> > I guess they chose to do that because the user can uncomment individual
> > lines to make changes. In a structured config file, the user will have
> > to uncomment all lines containing the parent keys to get it to work. For
> > example, if someone wants to set the config keyABB to a non-default
> > value, they will have to correctly uncomment 3 lines: keyA, keyAB and
> > keyABB, which can be annoying and makes it easy to make a mistake. If
> > either of the first two keys is not uncommented, the YAML file will still
> > be valid but a config like keyX.keyAB.keyABB might just get silently
> > ignored by the database.
> >
> > keyX:
> >   keyY:
> >     keyZ: value
> > #   keyA:
> > #     keyAA:
> > #       keyAAA: value
> > #     keyAB:
> > #       keyABA: value
> > #       keyABB: value
> >
> > On 29/11/2021 15:54, Benjamin Lerer wrote:
> > > I do not think that supporting both options is an issue. The settings
> > > virtual table would have to use the flattened version.
> > > If we support both formats, the question would be: which one should be
> > > used by default in the configuration file?
> > >
> > > On Fri, 26 Nov 2021 at 15:40, [email protected] <[email protected]> wrote:
> > >
> > >> This is the approach I favour for config files also. We had a much less
> > >> engaged discussion on this topic only a few months ago, so glad to see
> > >> more people getting involved now.
> > >>
> > >> I would however personally prefer to see the configuration file slowly
> > >> deprecated (if perhaps never retired), in favour of virtual tables, so
> > >> that operators may easily set configurations for the entire cluster.
> > >> Ideally it would be possible to specify configuration per cluster, per DC
> > >> and per node, with the most specific configuration applying. I would like
> > >> to see a similar hierarchy for Keyspace, Table and Per-Query options.
> > >> Ideally only the barest minimum number of options would be necessary to
> > >> supply in a config file, and only on first launch – seed nodes, for
> > >> instance.
> > >>
> > >> So whatever design we employ here, we should IMO be aiming for it to be
> > >> compatible with a CQL representation also.
> > >>
> > >> From: Bowen Song <[email protected]>
> > >> Date: Wednesday, 24 November 2021 at 18:15
> > >> To: [email protected] <[email protected]>
> > >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >>
> > >> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> > >> config file syntax. It allows the user to completely flatten out the
> > >> entire config file.
> > >> To give people who aren't familiar with ElasticSearch
> > >> an idea, here is a config file we use:
> > >>
> > >> cluster.name: foobar
> > >>
> > >> node.remote_cluster_client: false
> > >> node.name: "foo.example.com"
> > >> node.master: true
> > >> node.data: true
> > >> node.ingest: true
> > >> node.ml: false
> > >>
> > >> xpack.ml.enabled: false
> > >> xpack.security.enabled: false
> > >> xpack.security.audit.enabled: false
> > >> xpack.watcher.enabled: false
> > >>
> > >> action.auto_create_index: "+.,-*"
> > >>
> > >> network.host: _global_
> > >>
> > >> discovery.zen.hosts_provider: file
> > >> discovery.zen.minimum_master_nodes: 2
> > >>
> > >> http.publish_host: "foo.example.com"
> > >> http.publish_port: 443
> > >> http.bind_host: 127.0.0.1
> > >>
> > >> transport.publish_host: "bar.example.com"
> > >> transport.bind_host: 0.0.0.0
> > >>
> > >> indices.fielddata.cache.size: 1GB
> > >> indices.breaker.total.use_real_memory: false
> > >>
> > >> path.logs: /var/log/elasticsearch
> > >> path.data: /var/lib/elasticsearch/data
> > >>
> > >> As you can see we can use the flat (grep-able) syntax for everything.
> > >> This is also human readable because we can group options together by
> > >> inserting empty lines between them.
> > >>
> > >> The equivalent of the above in a structured syntax will be:
> > >>
> > >> cluster:
> > >>   name: foobar
> > >>
> > >> node:
> > >>   remote_cluster_client: false
> > >>   name: "foo.example.com"
> > >>   master: true
> > >>   data: true
> > >>   ingest: true
> > >>   ml: false
> > >>
> > >> xpack:
> > >>   ml:
> > >>     enabled: false
> > >>   security:
> > >>     enabled: false
> > >>     audit:
> > >>       enabled: false
> > >>   watcher:
> > >>     enabled: false
> > >>
> > >> action:
> > >>   auto_create_index: "+.,-*"
> > >>
> > >> network:
> > >>   host: _global_
> > >>
> > >> discovery:
> > >>   zen:
> > >>     hosts_provider: file
> > >>     minimum_master_nodes: 2
> > >>
> > >> http:
> > >>   publish_host: "foo.example.com"
> > >>   publish_port: 443
> > >>   bind_host: 127.0.0.1
> > >>
> > >> transport:
> > >>   publish_host: "bar.example.com"
> > >>   bind_host: 0.0.0.0
> > >>
> > >> indices:
> > >>   fielddata:
> > >>     cache:
> > >>       size: 1GB
> > >>   breaker:
> > >>     total:
> > >>       use_real_memory: false
> > >>
> > >> path:
> > >>   logs: /var/log/elasticsearch
> > >>   data: /var/lib/elasticsearch/data
> > >>
> > >> This may be easier to read for some people, but it is a total nightmare
> > >> for "grep" - so many keys have identical names, such as "enabled".
> > >>
> > >> Also, for the virtual tables, it would be a lot easier to represent
> > >> individual values in a virtual table when the config is flat and keys
> > >> are unique. The virtual tables would need to either support the encoding
> > >> and decoding of the structured config into a flat structure, or use a
> > >> JSON-encoded string value. The use of JSON would make querying an
> > >> individual value much harder.
> > >>
> > >> On 22/11/2021 16:16, Joseph Lynch wrote:
> > >>> Isn't one of the primary reasons to have a YAML configuration instead
> > >>> of a properties file to allow typed and structured (implies nested)
> > >>> configuration? I think it makes a lot of sense to group related
> > >>> configuration options (e.g. a feature) into a typed class when we're
> > >>> talking about more than one or two related options.
> > >>>
> > >>> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> > >>> period encoded key->value pairs when required (usually when providing
> > >>> a property or override layer), Spring and Elasticsearch yamls both
> > >>> come to mind. It seems pretty reasonable to support dot encoding and
> > >>> decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> > >>>
> > >>> Regarding quickly telling what configuration a node is running, I think
> > >>> we should lean on virtual tables for "what is the current
> > >>> configuration" now that we have them. As others have said, the written
> > >>> cassandra.yaml is not necessarily the current configuration ... and
> > >>> also grep -C or -A exist for this reason.
> > >>>
> > >>> -Joey
> > >>>
> > >>> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer <[email protected]> wrote:
> > >>>> I do not have a strong opinion for one or the other but wanted to raise
> > >>>> the issue I see with the "Settings" virtual table.
> > >>>>
> > >>>> Currently the "Settings" virtual table converts nested options into flat
> > >>>> options using a "_" separator. For those options it allows a user to
> > >>>> query the whole set of options through some hack.
> > >>>> If we decide to move to more nesting (more than one level), it seems to
> > >>>> me that we need to change the way this table behaves and how we can
> > >>>> query its data.
> > >>>>
> > >>>> We would need to start using "." as a nesting separator to ensure that
> > >>>> things are consistent between the configuration and the table, and add
> > >>>> support for LIKE restrictions for filtering queries so that operators
> > >>>> can select the precise set of settings they are looking for.
> > >>>>
> > >>>> Doing so is not really complicated in itself but might impact some users.
> > >>>>
> > >>>> On Fri, 19 Nov 2021 at 22:39, David Capwell <[email protected]> wrote:
> > >>>>
> > >>>>>> it is really handy to grep
> > >>>>>> cassandra.yaml on some config key and you know the value instantly.
> > >>>>>
> > >>>>> You can still do that
> > >>>>>
> > >>>>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> > >>>>> # coordinator_read_size:
> > >>>>> #     warn_threshold_kb: 0
> > >>>>> #     abort_threshold_kb: 0
> > >>>>>
> > >>>>> I was also arguing we should support nested and flat, so if your infra
> > >>>>> works better with flat then you could use
> > >>>>>
> > >>>>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> > >>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> > >>>>>
> > >>>>>> On Nov 19, 2021, at 1:34 PM, David Capwell <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> With the flat structure it turns into properties file - would it be
> > >>>>>>> possible to support both formats - nested yaml and flat properties?
> > >>>>>>
> > >>>>>> For the majority of our configs yes, but there is a subset where flat
> > >>>>>> properties are annoying
> > >>>>>>
> > >>>>>> hinted_handoff_disabled_datacenters - set type, so you could do
> > >>>>>> hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal
> > >>>>>> with separators as the format doesn’t support
> > >>>>>>
> > >>>>>> seed_provider.parameters - this is a map type… so would need to do
> > >>>>>> something like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we
> > >>>>>> special case maps as dynamic fields? Then seed_provider.parameters.a=a?
> > >>>>>> We have ParameterizedClass all over the code
> > >>>>>>
> > >>>>>> So, as long as we define how to deal with Java collections, we could in
> > >>>>>> theory support properties files (not arguing for that in this thread) as
> > >>>>>> well as system properties.
> > >>>>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> With the flat structure it turns into properties file - would it be
> > >>>>>>> possible to support both formats - nested yaml and flat properties?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - - -- --- ----- -------- -------------
> > >>>>>>> Jacek Lewandowski
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> If it's nested, "track_warnings" would still work if you're grepping
> > >>>>>>>> around vim or less.
> > >>>>>>>>
> > >>>>>>>> I'd have to concede the point about grep output, although there are
> > >>>>>>>> tools like https://github.com/kislyuk/yq that could probably be bent
> > >>>>>>>> to do what you want.
> > >>>>>>>>
> > >>>>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi David,
> > >>>>>>>>>
> > >>>>>>>>> while I do not oppose nested structure, it is really handy to grep
> > >>>>>>>>> cassandra.yaml on some config key and you know the value instantly.
> > >>>>>>>>> This is not possible when it is nested (easily and quickly) as it is
> > >>>>>>>>> on two lines. Or maybe my grepping is just not advanced enough to
> > >>>>>>>>> cover this case? If it is flat, I can just grep "track_warnings" and
> > >>>>>>>>> I have them all.
> > >>>>>>>>>
> > >>>>>>>>> Can you elaborate on your last bullet point? Parsing layer ... What
> > >>>>>>>>> do you mean specifically?
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell <[email protected]> wrote:
> > >>>>>>>>>> This has been brought up in a few tickets, so pushing to the dev
> > >>>>>>>>>> list.
> > >>>>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> > >>>>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
> > >>>>>>>>>> CASSANDRA-17147 - Guardrails prototype
> > >>>>>>>>>>
> > >>>>>>>>>> In short, do we as a project wish to move "new features" into nested
> > >>>>>>>>>> YAML when the feature has "enough" to justify the nesting? I would
> > >>>>>>>>>> really like to focus this discussion on new features rather than
> > >>>>>>>>>> retroactively grouping (leaving that to CASSANDRA-15234), as there is
> > >>>>>>>>>> already a place to talk about that.
> > >>>>>>>>>>
> > >>>>>>>>>> To get things started, let's start with the track-warning feature
> > >>>>>>>>>> (hard/soft limits for queries). Currently the configs look as follows
> > >>>>>>>>>> (assuming 15234)
> > >>>>>>>>>>
> > >>>>>>>>>> track_warnings:
> > >>>>>>>>>>   enabled: true
> > >>>>>>>>>>   coordinator_read_size:
> > >>>>>>>>>>     warn_threshold: 10kb
> > >>>>>>>>>>     abort_threshold: 1mb
> > >>>>>>>>>>   local_read_size:
> > >>>>>>>>>>     warn_threshold: 10kb
> > >>>>>>>>>>     abort_threshold: 1mb
> > >>>>>>>>>>   row_index_size:
> > >>>>>>>>>>     warn_threshold: 100mb
> > >>>>>>>>>>     abort_threshold: 1gb
> > >>>>>>>>>>
> > >>>>>>>>>> or should this be "flat"
> > >>>>>>>>>>
> > >>>>>>>>>> track_warnings_enabled: true
> > >>>>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> > >>>>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> > >>>>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
> > >>>>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
> > >>>>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
> > >>>>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
> > >>>>>>>>>>
> > >>>>>>>>>> For me I prefer nested for a few reasons
> > >>>>>>>>>> * easier to enforce consistency as the configs can use shared types;
> > >>>>>>>>>>   in the track warnings patch I had mismatches across configs (warn vs
> > >>>>>>>>>>   warns, fail vs abort, etc.) before going nested, now everything
> > >>>>>>>>>>   reuses the same types
> > >>>>>>>>>> * even though it is longer, it can be clearer how things are related
> > >>>>>>>>>> * parsing layer can add support for mixed or purely flat depending on
> > >>>>>>>>>>   user preference (example:
> > >>>>>>>>>>   track_warnings.row_index_size.abort_threshold, using the '.'
> > >>>>>>>>>>   notation to represent nested structures)
> > >>>>>>>>>>
> > >>>>>>>>>> Thoughts?
> > >>>>>>>>>>
> > >>>>>>>>>> ---------------------------------------------------------------------
> > >>>>>>>>>> To unsubscribe, e-mail: [email protected]
> > >>>>>>>>>> For additional commands, e-mail: [email protected]
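
For readers following the thread, here is a minimal sketch of the '.' encoding and decoding that several messages propose, using the track_warnings example from the original post. It is an illustration only, not Cassandra's parsing layer: the function names flatten, unflatten and select are hypothetical, and select uses shell-style wildcards from Python's fnmatch module as a rough stand-in for a CQL LIKE restriction, which has different pattern syntax.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested mappings into one level of dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections; empty sections are dropped.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild nested mappings from dot-separated keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def select(flat: Dict[str, Any], pattern: str) -> Dict[str, Any]:
    """Keep only the keys matching a shell-style wildcard pattern."""
    return {k: v for k, v in flat.items() if fnmatchcase(k, pattern)}


# Hypothetical in-memory copy of part of the nested example from the thread.
nested_config = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size": {"warn_threshold": "10kb", "abort_threshold": "1mb"},
        "row_index_size": {"warn_threshold": "100mb", "abort_threshold": "1gb"},
    }
}

flat_config = flatten(nested_config)
for name in sorted(select(flat_config, "track_warnings.row_index_size.*")):
    print(f"{name}: {flat_config[name]}")
```

The sketch treats every mapping as a nested section, so it cannot tell a map-typed option such as seed_provider.parameters from a section, which is the ambiguity raised earlier in the thread. It also assumes a key is never both a leaf and a parent.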
