Let's take this discussion in a different direction: if we add a --legacy <version> argument that keeps the old output format available for those who need or want it, while the (breaking) changes become the default, that feels like a compromise. We can then deprecate the legacy format over time without holding back innovation. We could also flip this and require a flag for the changed format, if we feel that is better.
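A minimal sketch of how such a flag might select between output formats. Everything here is an assumption made for illustration: the tool name, the version labels "4.1" and "5.0", the `FORMATTERS` table and the renamed key are hypothetical, and no existing Cassandra tool is claimed to accept `--legacy`.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Hypothetical formatters keyed by the release whose format they reproduce.
# The version labels and the field names are illustrative only.
FORMATTERS: Dict[str, Callable[[dict], List[str]]] = {
    "4.1": lambda stats: [f"totalRows: {stats['total_rows']}"],
    "5.0": lambda stats: [f"Total rows: {stats['total_rows']}"],
}
DEFAULT_VERSION = "5.0"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example-tool")
    # --legacy VERSION opts in to an older format; omitting it selects
    # the current default, which may contain breaking changes.
    parser.add_argument(
        "--legacy",
        metavar="VERSION",
        choices=sorted(FORMATTERS),
        default=None,
        help="print output in the format of an older release",
    )
    return parser


def render(stats: dict, legacy: Optional[str] = None) -> List[str]:
    """Return the output lines in the requested format."""
    return FORMATTERS[legacy or DEFAULT_VERSION](stats)


def main(argv: List[str]) -> List[str]:
    args = build_parser().parse_args(argv)
    return render({"total_rows": 1}, legacy=args.legacy)
```

Calling `main([])` uses the new default, while `main(["--legacy", "4.1"])` reproduces the old lines. Flipping the default, as suggested above, would only mean changing `DEFAULT_VERSION` and renaming the flag.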
This lets us innovate without breaking anyone.

Thoughts?

Thanks,
German

________________________________
From: Miklosovic, Stefan <stefan.mikloso...@netapp.com>
Sent: Thursday, July 13, 2023 8:20 AM
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: [EXTERNAL] Re: Changing the output of tooling between majors

"Dinesh's message cautions against making "breaking" changes that are likely to break parsing of output by current users (e.g., changes to naming/meaning/"

That is 100% correct. By that logic, changing the output you grep on to something else will break your scripts if they expect it to be there.

For example, take the sstablemetadata command. I know it is not nodetool, but that does not matter; this is just an example. The same "problem" can probably be found in nodetool; sstablemetadata just came to mind first because that is what I hit recently.

sstablemetadata writes this:

Repaired at: 0
Originating host id: d2d12c56-7d9c-49a7-aaef-05bd2633b09e
Pending repair: --
Replay positions covered: {CommitLogPosition(segmentId=1689261027905, position=59450)=CommitLogPosition(segmentId=1689261027905, position=60508)}
totalColumnsSet: 0
totalRows: 1
Estimated tombstone drop times:

Do you see "totalColumnsSet" and "totalRows", when all other keys in that output (in the whole command) follow a different format? In this case, they should be "Total columns set" and "Total rows".

So when we change it to that, anybody who is grepping for "totalRows" will get no output. That is a breaking change to me. His script stops working.

You are correct, and I agree with you completely, that STRICT ADDITIONS (what I was suggesting) are fine because we are not breaking anything for anybody.

So here, if I want to change this, then by what Dinesh says (we change the naming, so we break it), I need to offer a JSON/YAML alternative to what sstablemetadata currently prints (it might as well be nodetool; this is just an example).

________________________________________
From: C. Scott Andreas <sc...@paradoxica.net>
Sent: Thursday, July 13, 2023 17:01
To: dev@cassandra.apache.org
Cc: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Dinesh's message cautions against making "breaking" changes that are likely to break parsing of output by current users (e.g., changes to naming/meaning/position of existing fields vs. adding new ones). I don't read his message as saying that any change to nodetool output is conditional on offering a JSON/YAML representation, though.

What are some changes that you'd like to make?

– Scott

On Jul 13, 2023, at 7:44 AM, "Miklosovic, Stefan" <stefan.mikloso...@netapp.com> wrote:

For example Dinesh said this:

"Until nodetool can support JSON as output format for all interaction and there is a significant adoption in the user community, I would strongly advise against making breaking changes to the CLI output."

That is where I get the idea that we need JSON output in order to fix a typo. That is, if we regard fixing a typo as a breaking change, which I would say it is: if somebody is grepping for it and it is not there, their script will break.

Do you understand it the same way, or am I interpreting that wrong?

________________________________________
From: C. Scott Andreas <sc...@paradoxica.net>
Sent: Thursday, July 13, 2023 16:35
To: dev@cassandra.apache.org
Cc: dev
Subject: Re: Changing the output of tooling between majors

"From what I see you guys want to condition any change by offering json/yaml as well."

I don't think I've seen a proposal to block changes to nodetool output on machine-parseable formats in this thread.
Additions of new delimited fields to nodetool output are mostly straightforward. Changes to fields that exist today are likely to cause problems, as Josh mentions. These seem best to take on a case-by-case basis rather than trying to hammer out an abstract policy. What changes would you like to make?

I do think we will have difficulty evolving output formats of text-based Cassandra tooling until we offer machine-parseable output formats.

– Scott

On Jul 13, 2023, at 6:39 AM, Josh McKenzie <jmcken...@apache.org> wrote:

> I just find it ridiculous we can not change "someProperty: 10" to "Some Property: 10" and there is so much red tape about that.

Well, we're talking about programmatic parsing here. This feels like complaining about a compiler that won't let you build if you're missing a ;

We can change it, but that doesn't mean the aggregate cost/benefit across our entire ecosystem is worth it. The value of correcting a typo is pretty small, and the cost for everyone downstream is not.

This is why we should spellcheck things in APIs before we release them. :)

On Wed, Jul 12, 2023, at 2:45 PM, Miklosovic, Stefan wrote:

Eric,

I appreciate your feedback on this, especially the additional background in the second paragraph about where you are coming from. I think we are on the same page after all.

I definitely understand that people depend on this output and that we need to be careful. That is why I propose to change it only with each major. What I feel is that everybody's usage and expectations are a little bit different, the outputs of the commands are very diverse, and it is hard to balance this so that everybody is happy.

I am trying to come up with a solution which would not change the most important commands unnecessarily while also leaving some room to tweak the existing commands where we see fit.

I just find it ridiculous we can not change "someProperty: 10" to "Some Property: 10" and there is so much red tape about that.
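The difference between adding a field and renaming one can be seen from the consumer's side. The sketch below assumes a typical ad-hoc script that splits "key: value" lines; the `parse_key_values` helper and the sample outputs are hypothetical and not captured from any real tool.

```python
from typing import Dict


def parse_key_values(output: str) -> Dict[str, str]:
    """Parse 'key: value' lines the way a simple operator script might."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Illustrative outputs only.
current = "someProperty: 10\nRepaired at: 0"
with_addition = current + "\nNew field: 42"        # a new line is appended
with_rename = "Some Property: 10\nRepaired at: 0"  # an existing key is reworded

# A lookup by the old key survives the addition but not the rename.
survives_addition = "someProperty" in parse_key_values(with_addition)
survives_rename = "someProperty" in parse_key_values(with_rename)
```

Here `survives_addition` is true and `survives_rename` is false, which is the cost for downstream scripts that Josh describes.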
If I had to summarize this whole discussion, the best conclusion I can think of is to not change what is used the most (this would probably need to be defined more explicitly), and if we have to change something else, we had better document that extensively and provide JSON/YAML so that people can move away from parsing the human-readable format (which, as probably all agree, should not happen in the first place).

What I am afraid of is that, in order to satisfy these conditions, if we just want to fix a typo or the format of a key of some value, for example, then we would need to deliver JSON/YAML output as well if there is none yet. A change of such triviality would then require far more work, namely implementing the JSON/YAML output. Some commands are quite sophisticated, and I do not want to be blocked from changing a field in the human-readable output because providing the corresponding JSON/YAML format would be a gigantic portion of the work in itself.

From what I see you guys want to condition any change by offering json/yaml as well and I don't know if that is not just too much.

________________________________________
From: Eric Evans <eev...@wikimedia.org>
Sent: Wednesday, July 12, 2023 19:48
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors
On Wed, Jul 12, 2023 at 1:54 AM Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:

> I agree with Jackson that having a different output format (JSON/YAML) in order to be able to change the default output resolves nothing in practice. As Jackson said, "operators who maintain these scripts aren’t going to re-write them just because a better way of doing them is newly available, usually they’re too busy with other work and will keep using those old scripts until they stop working". This is true. If this approach is adopted, what will happen in practice is that we change the output and provide a different format, and then a user detects this change because his scripts break. As he has an existing solution in place which parses the text of the human-readable output, he will try to fix that; he will not suddenly convert all the scripting he has to parse JSON just because we added it. Starting with JSON parsing might happen if he has no scripting in place yet, but then we would not cover already existing deployments.

I think this is quite an extreme conclusion to draw. If tooling had stable, structured output formats, and if we documented an expectation that human-readable console output was unstable, then presumably it would be safe to assume that any new scripters would avail themselves of the stable formats, or expect breakage later. I think it's also fair to assume that at least some people would spend the time to convert their scripts, particularly if forced to revisit them (for example, after a breaking change to console output). As someone who manages several large-scale mission-critical Cassandra clusters under constrained resources, this is how I would approach it.
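One way to picture the split between a stable structured format and an unstable console format is to keep machine keys separate from display labels. This is only a sketch under assumed names: the `LABELS_OLD` and `LABELS_NEW` tables, the snake_case keys and both render functions are hypothetical and do not describe how any Cassandra tool is implemented.

```python
import json
from typing import Dict

# Hypothetical mapping from stable machine keys to display labels.
# Labels may be reworded between releases; the keys are the documented contract.
LABELS_OLD = {"total_rows": "totalRows", "repaired_at": "Repaired at"}
LABELS_NEW = {"total_rows": "Total rows", "repaired_at": "Repaired at"}


def render_text(stats: Dict[str, int], labels: Dict[str, str]) -> str:
    """Human-readable output, documented as unstable."""
    return "\n".join(f"{labels[key]}: {value}" for key, value in stats.items())


def render_json(stats: Dict[str, int]) -> str:
    """Machine-readable output; its keys do not depend on display labels."""
    return json.dumps(stats, sort_keys=True)


stats = {"total_rows": 1, "repaired_at": 0}
old_text = render_text(stats, LABELS_OLD)
new_text = render_text(stats, LABELS_NEW)
structured = json.loads(render_json(stats))
```

Rewording a label changes `new_text` relative to `old_text`, while a script reading `structured["total_rows"]` is unaffected.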
TL;DR Don't let perfect be the enemy of good<https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>

[ ... ]

> For that reason, what we could agree on is that we would never change the output for "tier 1" commands, and if we ever changed something, it would be STRICT ADDITIONS only. In other words, everything a command printed, it would continue to print forever. Only new lines could be introduced. We need to do this because Cassandra is evolving over time and we need to keep the output aligned as new functionality appears. But the output would be backward compatible. Plus, we are talking about majors only.
>
> The only reason we would ever change the output of "tier 1" commands, if it is not an addition, is to fix a typo in the existing output. This would again happen only in majors.
>
> All other output for all other commands might be changed, and those changes would not need to be strictly additive. This would again happen only between majors.
>
> What is your opinion about this?

To be clear about where I'm coming from: I'm not arguing against you or anyone else making changes like these (in major versions, or otherwise). If, for example, we had console output that was incorrect, incomplete, or obviously misleading, I'd absolutely want to see that fixed, script breakage be damned. All I want is for folks to recognize the problems this sort of thing can create, and show a bit of empathy before submitting a change. For operators on the receiving end, it can be really frustrating, especially when there is no normative change (i.e. it's in service of aesthetics).

--
Eric Evans <eev...@wikimedia.org>
Staff SRE, Data Persistence
Wikimedia Foundation
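The "STRICT ADDITIONS only" rule quoted above could in principle be checked mechanically before a release, by comparing the field labels of the old and new output. The following is a sketch under the assumption that output consists of "label: value" lines; `field_labels` and `is_strict_addition` are hypothetical helpers, and the sample outputs are illustrative.

```python
from typing import List


def field_labels(output: str) -> List[str]:
    """Extract the label in front of the first colon on each line."""
    return [
        line.partition(":")[0].strip()
        for line in output.splitlines()
        if ":" in line
    ]


def is_strict_addition(old_labels: List[str], new_labels: List[str]) -> bool:
    """True if every old label still appears in new_labels, in the same order."""
    remaining = iter(new_labels)
    # `label in remaining` consumes the iterator up to the match,
    # so labels must reappear in their original relative order.
    return all(label in remaining for label in old_labels)


old = "Repaired at: 0\ntotalRows: 1"
added = "Repaired at: 0\nPending repair: --\ntotalRows: 1"
renamed = "Repaired at: 0\nTotal rows: 1"

addition_ok = is_strict_addition(field_labels(old), field_labels(added))
rename_ok = is_strict_addition(field_labels(old), field_labels(renamed))
```

With these samples `addition_ok` is true and `rename_ok` is false. A typo fix would fail the same check, which matches the proposal treating it as an explicit exception reserved for majors.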