Re: Changing the output of tooling between majors

Miklosovic, Stefan Tue, 11 Jul 2023 23:54:25 -0700

I agree with Jackson that having a different output format (JSON/YAML) in order 
to be able to change the default output resolves nothing in practice.


As Jackson said, "operators who maintain these scripts aren’t going to re-write 
them just because a better way of doing them is newly available, usually 
they’re too busy with other work and will keep using those old scripts until 
they stop working".

This is true. If this approach is adopted, what will happen in practice is that 
we change the output and provide a different format, and then a user detects 
the change because his scripts broke. As he has an existing solution in place 
which parses the text from the human-readable output, he will try to fix that; 
he will not suddenly convert all his scripting to parsing JSON just because we 
added it. Starting with JSON parsing might happen if he has no scripting in 
place yet, but then we would not cover already existing deployments.

I asked users how they see it in this thread (1). I think that we were making 
a lot of assumptions about how people view this problem, and I am glad that we 
have actual data to back our solution with, rather than guesstimating. I will 
try to summarize it here and I will try to be as objective as possible:

1) JSON / YAML is harder to parse in vanilla POSIX shell. Additional tooling 
(jq, for example) is needed to parse it, which means installing additional 
packages. One person has zero interest in JSON/YAML parsing.
2) CQL is even harder to parse as it is not visually so "brief" and it is not 
a good output for machines to parse. CQL shell is used for another purpose 
(mostly human interaction).
3) Nobody actually said they are parsing it directly from JMX (which is 
understandable).

People are in general not opposed to the idea of changing the output as such; 
what they are concerned about is the deployment and testing phase, multiplied 
by the number of nodes.

For this reason, they seem to think that clearly documented changes which 
happen only on majors are the best compromise. On one hand they do not want 
frequent changes, on the other hand they also do not want to parse the output 
from a different format unless they are absolutely forced to do so.

I think that providing the JSON/YAML format is a nice addition, but it is not 
a strict requirement.

The people who participated in that survey also mentioned the same group of 
commands whose output they parse. These are:

info
netstats
status
version
ring
tpstats

If you compare these commands with the last group of commands in the email 
here (2), at the bottom, you see that I was more or less right when I 
identified these as the only commands worth parsing mechanically. There is 
clearly a group of commands whose output is so important and so frequently 
queried that changing their output would be the most invasive (and so least 
desirable).

For that reason, what we could agree on is that we would never change the 
existing output of "tier 1" commands, and if we ever changed something, it 
would be STRICT ADDITIONS only. In other words, everything a command prints 
today, it would continue to print forever. Only new lines could be introduced. 
We need to allow this because Cassandra is evolving over time and we need to 
keep the output aligned as new functionality appears. But the output would be 
backward compatible. Plus, we are talking about majors only.
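
To make the "strict additions" rule concrete, here is a minimal Python sketch 
of how such a check could look. It is only an illustration under stated 
assumptions: the helper names (labels, is_strict_addition) are hypothetical, 
the sample text is made up rather than captured from a real node, the output 
is assumed to consist of "label : value" lines, and I assume that existing 
labels must also keep their relative order. Values are ignored because they 
legitimately differ from run to run.

```python
from __future__ import annotations

# Illustrative samples only; not captured from a real nodetool run.
OLD = "Gossip active : true\nLoad : 1.2 GiB\nUptime (seconds) : 100\n"
NEW_OK = (
    "Gossip active : true\nLoad : 3.4 GiB\n"
    "New metric : 7\nUptime (seconds) : 200\n"
)
NEW_BAD = "Gossip active : true\nData Load : 3.4 GiB\nUptime (seconds) : 200\n"


def labels(output: str) -> list[str]:
    """Return the label of every 'label : value' line, in order."""
    result = []
    for line in output.splitlines():
        if ":" in line:
            # Split on the first colon only, so values may contain colons.
            label, _, _ = line.partition(":")
            result.append(label.strip())
    return result


def is_strict_addition(old_output: str, new_output: str) -> bool:
    """True if every old label still appears in the new output, in order."""
    remaining = iter(labels(new_output))
    # 'label in remaining' consumes the iterator up to the match, so this
    # is a subsequence check: old labels must appear in the same order.
    return all(label in remaining for label in labels(old_output))


if __name__ == "__main__":
    print(is_strict_addition(OLD, NEW_OK))   # True: one line was added
    print(is_strict_addition(OLD, NEW_BAD))  # False: "Load" was renamed
```

A check of this kind could run against recorded output of the previous major, 
so that a renamed or removed line in a "tier 1" command is caught before 
release rather than by a user's script.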

The only reason we would ever change the output of "tier 1" commands, if it is 
not an addition, is to fix a typo in the existing output. This would again 
happen only in majors.

The output of all other commands might be changed, and those changes would not 
need to be strictly additive. This would again happen only between majors.

What is your opinion about this?

Regards

(1) https://lists.apache.org/thread/drrpskmoyd2t4tcyk6jgx52y8fhhtjt6
(2) https://lists.apache.org/thread/2w5pdd4ncsc8s3qz0fbw2rkgy30ky4r6

________________________________________
From: Fleming, Jackson <[email protected]>
Sent: Tuesday, July 11, 2023 1:06
To: [email protected]
Subject: Re: Changing the output of tooling between majors

We use Nodetool in scripts sparsely. In my opinion, trying to programmatically 
parse the human readable output should be avoided as much as possible; it 
usually leads to implementations that are brittle.

I certainly agree you don’t want to make these kinds of changes in 3.11 or 4.x 
(and I don’t think that’s what Stefan was suggesting), but I don’t necessarily 
agree that you can’t make these kinds of changes in major versions. Chasing 
compatibility like this seems like a deep rabbit hole one could possibly go 
down, I personally don’t see it as unreasonable for commands that are designed 
to be read by humans to be updated over time to improve readability, as that is 
the purpose of those commands. While people script against that output I don’t 
think anyone is going to say it’s an official API, the project also makes no 
public commitment to that either.

If the proposal is to treat Nodetool input and output like a contract/API, it’d 
be great for a formal specification, or at least the documentation to be 
updated to cover what users should expect as output from Nodetool, if the 
project is going to such effort to maintain a specification, why not make it 
official? That way the maintainers of scripts have a fighting chance of finding 
incompatibilities before upgrading their infrastructure and the project could 
make these kinds of changes and provide a mechanism for users to validate.

Currently the argument could be made that there’s no guarantee about Nodetool 
output since it’s not actually written down anywhere official outside the 
codebase.

Isn’t this one of the reasons Cassandra maintains the NEWS and CHANGES files in 
the repo, and follows semantic versioning, to communicate potentially breaking 
changes as clearly as possible? Surely a message like (but with some more 
detail) “Nodetool command x has had its human readable output restructured, 
item y was removed/renamed to z” would suffice.

Not sure if you can deprecate the human readable output without generating a 
lot of noise for the user, and if it’s being parsed by a bash script, the user 
would never see it anyway, but sounds like that’s what the project needs.

To the note about having users migrate over to more machine friendly output 
types (JSON etc), in my experience the operators who maintain these scripts 
aren’t going to re-write them just because a better way of doing them is newly 
available, usually they’re too busy with other work and will keep using those 
old scripts until they stop working, so in my view it’s not really a solution 
to this problem.

Regards,

Jackson

From: Eric Evans <[email protected]>
Date: Tuesday, 11 July 2023 at 4:14 am
To: [email protected] <[email protected]>
Subject: Re: Changing the output of tooling between majors

On Sun, Jul 9, 2023 at 9:10 PM Dinesh Joshi 
<[email protected]<mailto:[email protected]>> wrote:
On Jul 8, 2023, at 8:43 AM, Miklosovic, Stefan 
<[email protected]<mailto:[email protected]>> wrote:

If we are providing CQL / JSON / YAML for couple years, I do not believe that 
the argument "lets not break it for folks in nodetool" is still relevant. CQL 
output is there from times of 4.0 at least (at least!) and YAML / JSON is also 
not something completely new. It is not like we are suddenly forcing people to 
change their habits, there was enough time to update the stuff to CQL / json / 
yaml etc ...

What % of Cassandra users are using 4.0+? Operators who upgrade to 4.0 and 
beyond may still use their existing scripts. Therefore keeping things stable is 
important. Until nodetool can support JSON as output format for all interaction 
and there is a significant adoption in the user community, I would strongly 
advise against making breaking changes to the CLI output.

+1

--
Eric Evans
[email protected]<mailto:[email protected]>
