I agree with Jackson that offering a different output format (JSON/YAML) just so that we can change the default output resolves nothing in practice.
As Jackson said, "operators who maintain these scripts aren’t going to re-write them just because a better way of doing them is newly available, usually they’re too busy with other work and will keep using those old scripts until they stop working". This is true. If this approach is adopted, what will happen in practice is that we change the output and provide a different format, and then a user detects the change because their scripts stopped working. As they have an existing solution in place which parses the human-readable output, they will try to fix that; they will not suddenly convert all their scripting to parsing JSON just because we added it. Starting with JSON parsing might happen if they have no scripting in place yet, but then we would not cover already existing deployments.

I asked users how they see it in this thread (1). I think that we were making a lot of assumptions about how people view this problem, and I am glad that we have actual data we can back our solution with, rather than guesstimating. I will try to summarize it here and I will try to be as objective as possible:

1) JSON / YAML is harder to parse in vanilla POSIX shell. Additional tooling is needed to parse it (jq, for example), which requires installing additional packages. One person has zero interest in JSON/YAML parsing.

2) CQL is even harder to parse as it is not visually so "brief" and it is not a good output for machines to parse. CQL shell is used for another purpose (mostly human interaction).

3) Nobody actually said they are parsing it directly from JMX (which is understandable).

People are in general not opposed to the idea of changing the output as such; what they are concerned about is the deployment and testing phase, multiplied by the number of nodes. For this reason, they seem to think that clearly documented changes which happen only on majors are the best compromise.
On one hand they do not want frequent changes; on the other hand they also do not want to parse a different format unless they are absolutely forced to do so. I think that providing a JSON/YAML format is a nice addition, but it is not a strict requirement.

The people who participated in that survey also mentioned the same group of commands whose output they parse. These are:

info
netstats
status
version
ring
tpstats

If you compare these commands with the last group of commands in the email here (2), at the bottom, you see that I was more or less right when I identified that only these commands are worth parsing mechanically. There is clearly a group of commands whose output is so important and so frequently queried that changing their output would be the most invasive (and so least desirable).

For that reason, what we could agree on is that we would never change the output of "tier 1" commands, and if we ever changed something, it would be STRICT ADDITIONS only. In other words, everything a command prints today, it would continue to print forever. Only new lines could be introduced. We need to allow this because Cassandra is evolving over time and we need to keep the output aligned as new functionality appears. But the output would be backward compatible. Plus, we are talking about majors only.

The only reason we would ever change the output of "tier 1" commands, if it is not an addition, is to fix a typo in the existing output. This would again happen only in majors.

The output of all other commands might be changed, and those changes would not need to be strictly additive. This would again happen only between majors.

What is your opinion about this?
Regards

(1) https://lists.apache.org/thread/drrpskmoyd2t4tcyk6jgx52y8fhhtjt6
(2) https://lists.apache.org/thread/2w5pdd4ncsc8s3qz0fbw2rkgy30ky4r6

________________________________________
From: Fleming, Jackson <jackson.flem...@netapp.com>
Sent: Tuesday, July 11, 2023 1:06
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors

We use Nodetool in scripts sparingly. In my opinion, trying to programmatically parse the human-readable output should be avoided as much as possible; it usually leads to implementations that are brittle.

I certainly agree you don’t want to make these kinds of changes in 3.11 or 4.x (and I don’t think that’s what Stefan was suggesting), but I don’t necessarily agree that you can’t make these kinds of changes in major versions. Chasing compatibility like this seems like a deep rabbit hole one could possibly go down. I personally don’t see it as unreasonable for commands that are designed to be read by humans to be updated over time to improve readability, as that is the purpose of those commands. While people script against that output, I don’t think anyone is going to say it’s an official API, and the project makes no public commitment to that either.

If the proposal is to treat Nodetool input and output like a contract/API, it’d be great to have a formal specification, or at least for the documentation to be updated to cover what users should expect as output from Nodetool. If the project is going to such effort to maintain a specification, why not make it official? That way the maintainers of scripts have a fighting chance of finding incompatibilities before upgrading their infrastructure, and the project could make these kinds of changes and provide a mechanism for users to validate.
Currently the argument could be made that there’s no guarantee about Nodetool output since it’s not actually written down anywhere official outside the codebase.

Isn’t this one of the reasons Cassandra maintains the NEWS and CHANGES files in the repo, and follows semantic versioning, to communicate potentially breaking changes as clearly as possible? Surely a message like (but with some more detail) “Nodetool command x has had its human readable output restructured, item y was removed/renamed to z” would suffice.

I'm not sure you can deprecate the human-readable output without generating a lot of noise for the user, and if it’s being parsed by a bash script, the user would never see the notice anyway, but it sounds like that’s what the project needs.

To the note about having users migrate over to more machine friendly output types (JSON etc), in my experience the operators who maintain these scripts aren’t going to re-write them just because a better way of doing them is newly available, usually they’re too busy with other work and will keep using those old scripts until they stop working, so in my view it’s not really a solution to this problem.

Regards,
Jackson

From: Eric Evans <john.eric.ev...@gmail.com>
Date: Tuesday, 11 July 2023 at 4:14 am
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: Changing the output of tooling between majors
On Sun, Jul 9, 2023 at 9:10 PM Dinesh Joshi <djo...@apache.org> wrote:

On Jul 8, 2023, at 8:43 AM, Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:

If we have been providing CQL / JSON / YAML for a couple of years, I do not believe that the argument "let's not break it for folks in nodetool" is still relevant. CQL output has been there since 4.0 at least (at least!) and YAML / JSON is also not something completely new. It is not like we are suddenly forcing people to change their habits; there was enough time to update the stuff to CQL / JSON / YAML etc ...

What % of Cassandra users are using 4.0+? Operators who upgrade to 4.0 and beyond may still use their existing scripts. Therefore keeping things stable is important. Until nodetool can support JSON as an output format for all interactions and there is significant adoption in the user community, I would strongly advise against making breaking changes to the CLI output.

+1

--
Eric Evans
john.eric.ev...@gmail.com