Re: Changing the output of tooling between majors

German Eichberger via dev Thu, 13 Jul 2023 09:08:57 -0700

Let's take this discussion in a different direction: what if we add a --legacy 
<version> argument that keeps the old output format available for those who 
need or want it, while the (breaking) changes become the default? That feels 
like a compromise, and we can later deprecate the legacy format without 
impacting innovation. We could also flip this and require a flag for the 
changed format if we feel that is better.

This lets us innovate without breaking anyone. Thoughts?
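
A minimal sketch of that idea, not a patch against any real tool: only the 
--legacy <version> flag comes from the proposal above. The tool name, the 
version string "4.1", and the formatter functions are hypothetical; the key 
names are borrowed from the sstablemetadata example quoted further down.

```python
import argparse

# Hypothetical formatters. The legacy one keeps the old key names,
# the current one uses the reworded keys.
def format_legacy(stats):
    return "\n".join([
        f"totalColumnsSet: {stats['columns_set']}",
        f"totalRows: {stats['rows']}",
    ])

def format_current(stats):
    return "\n".join([
        f"Total columns set: {stats['columns_set']}",
        f"Total rows: {stats['rows']}",
    ])

# Assumed mapping from a legacy version to the formatter that reproduces it.
LEGACY_FORMATTERS = {"4.1": format_legacy}

def render(argv, stats):
    parser = argparse.ArgumentParser(prog="sometool")
    parser.add_argument("--legacy", metavar="VERSION",
                        choices=sorted(LEGACY_FORMATTERS), default=None)
    args = parser.parse_args(argv)
    # Without --legacy, the new (breaking) format is the default.
    formatter = LEGACY_FORMATTERS.get(args.legacy, format_current)
    return formatter(stats)
```

Flipping the default, as mentioned above, would only mean swapping which 
formatter is used when the flag is absent.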

Thanks,
German

________________________________
From: Miklosovic, Stefan <[email protected]>
Sent: Thursday, July 13, 2023 8:20 AM
To: [email protected] <[email protected]>
Subject: [EXTERNAL] Re: Changing the output of tooling between majors

"Dinesh's message cautions against making "breaking" changes that are likely to 
break parsing of output by current users (e.g., changes to naming/meaning/"

That is 100% correct. So by that logic, changing the output which you grep on 
to something else will break your scripts if you expect it there.

For example, take the sstablemetadata command. I know it is not nodetool, but 
that does not matter; this is just an example. The same "problem" can probably 
be found in nodetool; sstablemetadata just came to mind first because that is 
what I hit recently.

sstablemetadata writes this:

Repaired at: 0
Originating host id: d2d12c56-7d9c-49a7-aaef-05bd2633b09e
Pending repair: --
Replay positions covered: {CommitLogPosition(segmentId=1689261027905, 
position=59450)=CommitLogPosition(segmentId=1689261027905, position=60508)}
totalColumnsSet: 0
totalRows: 1
Estimated tombstone drop times:

Do you see "totalColumnsSet" and "totalRows", when all other keys in that 
output (across the whole command) follow a different format? In this case, they 
should be "Total columns set" and "Total rows".

So when we change them to that, anybody who is grepping for "totalRows" will 
get no output. That is a breaking change to me: their script stops working.
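
To make the breakage concrete, here is a small sketch of the kind of script 
that would be affected. The helper extract_value is hypothetical, and the two 
sample outputs are trimmed versions of the listing above, one with the current 
keys and one with the proposed renames.

```python
def extract_value(output, key):
    """Return the value of the first 'key: value' line, or None if absent."""
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip()
    return None

# Trimmed sample of the current output.
old_output = "Repaired at: 0\ntotalColumnsSet: 0\ntotalRows: 1"
# The same data after the proposed rename of the two keys.
new_output = "Repaired at: 0\nTotal columns set: 0\nTotal rows: 1"

rows_before = extract_value(old_output, "totalRows")  # value is found
rows_after = extract_value(new_output, "totalRows")   # key is gone
```

A script that expects rows_after to be a number now gets None, which is the 
same effect as grep returning no output.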

You are correct and I agree with you completely that STRICT ADDITIONS (what I 
was suggesting) are fine because we are not breaking anything for anybody.

So here, if I want to change this, then by what Dinesh says (we change the 
naming and we break it), I need to offer a JSON / YAML alternative to what 
sstablemetadata currently prints (it might as well be nodetool; this is just an 
example).

________________________________________
From: C. Scott Andreas <[email protected]>
Sent: Thursday, July 13, 2023 17:01
To: [email protected]
Cc: [email protected]
Subject: Re: Changing the output of tooling between majors

Dinesh's message cautions against making "breaking" changes that are likely to 
break parsing of output by current users (e.g., changes to 
naming/meaning/position of existing fields vs. adding new ones). I don't read 
his message as saying that any change to nodetool output is conditional on 
offering a JSON/YAML representation, though.

What are some changes that you'd like to make?

– Scott

On Jul 13, 2023, at 7:44 AM, "Miklosovic, Stefan" 
<[email protected]> wrote:

For example Dinesh said this:

"Until nodetool can support JSON as output format for all interaction and there 
is a significant adoption in the user community, I would strongly advise 
against making breaking changes to the CLI output."

That is where I get the idea that we need JSON output in order to fix a typo. 
That is, if we look at fixing a typo as a breaking change, which I would say it 
is: if somebody is grepping for it and it is not there, their script will break.

Do you understand that the same way or am I interpreting that wrong?

________________________________________
From: C. Scott Andreas <[email protected]>
Sent: Thursday, July 13, 2023 16:35
To: [email protected]
Cc: dev
Subject: Re: Changing the output of tooling between majors

"From what I see you guys want to condition any change by offering json/yaml as 
well."

I don't think I've seen a proposal to block changes to nodetool output on 
machine-parseable formats in this thread.

Additions of new delimited fields to nodetool output are mostly 
straightforward. Changes to fields that exist today are likely to cause 
problems - as Josh mentions. These seem best to take on a case-by-case basis 
rather than trying to hammer out an abstract policy. What changes would you 
like to make?

I do think we will have difficulty evolving output formats of text-based 
Cassandra tooling until we offer machine-parseable output formats.
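
As a rough illustration of why a machine-parseable format helps here (a sketch 
only; the field names and labels below are hypothetical and not an actual 
nodetool or sstablemetadata schema): the human-readable labels can be reworded 
freely while the structured keys that scripts depend on stay the same.

```python
import json

# Hypothetical statistics keyed by stable, machine-facing names.
STATS = {"total_columns_set": 0, "total_rows": 1}

# Human-facing labels; these can change without touching the JSON keys.
LABELS = {
    "total_columns_set": "Total columns set",
    "total_rows": "Total rows",
}

def render_text(stats, labels=LABELS):
    """Human-readable 'Label: value' lines."""
    return "\n".join(f"{labels[key]}: {value}" for key, value in stats.items())

def render_json(stats):
    """Structured output for scripts; keys do not depend on the labels."""
    return json.dumps(stats, sort_keys=True)

# A script reads the stable key instead of grepping a label.
total_rows = json.loads(render_json(STATS))["total_rows"]
```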

– Scott

On Jul 13, 2023, at 6:39 AM, Josh McKenzie <[email protected]> wrote:

"I just find it ridiculous we can not change "someProperty: 10" to "Some 
Property: 10" and there is so much red tape about that."

Well, we're talking about programmatic parsing here. This feels like 
complaining about a compiler that won't let you build if you're missing a ;

We can change it, but that doesn't mean the aggregate cost/benefit across our 
entire ecosystem is worth it. The value of correcting a typo is pretty small, 
and the cost for everyone downstream is not. This is why we should spellcheck 
things in APIs before we release them. :)

On Wed, Jul 12, 2023, at 2:45 PM, Miklosovic, Stefan wrote:
Eric,

I appreciate your feedback on this, especially the additional background in the 
second paragraph about where you are coming from.

I think we are on the same page after all. I definitely understand that people 
are depending on this output and we need to be careful. That is why I propose 
to change it only in majors. What I feel is that everybody's usage and 
expectations are a little bit different, the outputs of the commands are very 
diverse, and it is hard to balance this so that everybody is happy.

I am trying to come up with a solution which would not change the most 
important commands unnecessarily while also having some free room to tweak the 
existing commands where we see it appropriate. I just find it ridiculous we can 
not change "someProperty: 10" to "Some Property: 10" and there is so much red 
tape about that.

If I had to summarize this whole discussion, the best conclusion I can think 
of is to not change what is used the most (this would probably need to be 
defined more explicitly), and if we have to change something else, we had 
better document that extensively and provide json/yaml so that people are able 
to divorce themselves from parsing the human-readable format (which probably 
all agree should not happen in the first place).

What I am afraid of is that in order to satisfy these conditions, if, for 
example, we just want to fix a typo or the format of a key of some value, then 
we would need to deliver a JSON/YAML format as well if there is not one yet, 
and that would mean that such a trivial change would require way more work in 
terms of implementing the JSON/YAML output. Some commands are quite 
sophisticated and I do not want to be blocked from changing a field in the 
human-readable output because providing the corresponding JSON/YAML format 
would be a gigantic portion of the work itself.

From what I see you guys want to condition any change by offering json/yaml as 
well, and I don't know if that is not just too much.

________________________________________
From: Eric Evans <[email protected]>
Sent: Wednesday, July 12, 2023 19:48
To: [email protected]
Subject: Re: Changing the output of tooling between majors

On Wed, Jul 12, 2023 at 1:54 AM Miklosovic, Stefan <[email protected]> wrote:
I agree with Jackson that having a different output format (JSON/YAML) in order 
to be able to change the default output resolves nothing in practice.

As Jackson said, "operators who maintain these scripts aren’t going to re-write 
them just because a better way of doing them is newly available, usually 
they’re too busy with other work and will keep using those old scripts until 
they stop working".

This is true. If this approach is adopted, what will happen in practice is that 
we change the output and we provide a different format, and then a user detects 
this change because his scripts broke. As he has an existing solution in place 
which parses the text from the human-readable output, he will try to fix that; 
he will not suddenly convert all the scripting he has to parsing JSON just 
because we added it. Starting with JSON parsing might be done if he has no 
scripting in place yet, but then we would not cover already existing 
deployments.

I think this is quite an extreme conclusion to draw. If tooling had stable, 
structured output formats, and if we documented an expectation that 
human-readable console output was unstable, then presumably it would be safe to 
assume that any new scripters would avail themselves of the stable formats, or 
expect breakage later. I think it's also fair to assume that at least some 
people would spend the time to convert their scripts, particularly if forced to 
revisit them (for example, after a breaking change to console output). As 
someone who manages several large-scale mission-critical Cassandra clusters 
under constrained resources, this is how I would approach it.

TL;DR Don't let perfect be the enemy of good 
<https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>

[ ... ]

For that reason, what we could agree on is that we would never change the 
output for "tier 1" commands and if we ever changed something, it would be 
STRICT ADDITIONS only. In other words, everything a command prints today, it 
would continue to print forever. Only new lines could be introduced. We need to 
do this because Cassandra is evolving over time and we need to keep the output 
aligned as new functionality appears. But the output would be backward 
compatible. Plus, we are talking about majors only.
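
One way such a "strict additions" rule could be checked mechanically is 
sketched below. This is an assumption about how the rule might be enforced, 
not an existing test in the project: it compares only the keys of "key: value" 
lines (values change at runtime) and requires every old key to still appear in 
the new output, in the same relative order. The function names are 
hypothetical.

```python
def output_keys(output):
    """Keys of all 'key: value' lines, in the order they are printed."""
    return [line.partition(":")[0].strip()
            for line in output.splitlines() if ":" in line]

def is_strict_addition(old_output, new_output):
    """True if new_output keeps every old key, in order, and only adds lines."""
    remaining = iter(output_keys(new_output))
    # 'key in remaining' advances the iterator, so this is a subsequence check.
    return all(key in remaining for key in output_keys(old_output))

old = "Repaired at: 0\ntotalRows: 1"
added = "Repaired at: 0\nPending repair: --\ntotalRows: 1\nNew field: 5"
renamed = "Repaired at: 0\nTotal rows: 1"
```

Here is_strict_addition(old, added) holds, while the rename in the last sample 
fails the check because "totalRows" no longer appears.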

The only reason we would ever change the output of "tier 1" commands, if it is 
not an addition, is to fix a typo in the existing output. This would again 
happen only in majors.

All other output for all other commands might be changed but their output will 
not need to be strictly additive. This would again happen only between majors.

What is your opinion about this?

To be clear about where I'm coming from: I'm not arguing against you or anyone 
else making changes like these (in major versions, or otherwise). If —for 
example— we had console output that was incorrect, incomplete, or obviously 
misleading, I'd absolutely want to see that fixed, script breakage be damned. 
All I want is for folks to recognize the problems this sort of thing can 
create, and show a bit of empathy before submitting a change. For operators on 
the receiving end, it can be really frustrating, especially when there is no 
normative change (i.e. it's in service of aesthetics).

--
Eric Evans <[email protected]>
Staff SRE, Data Persistence
Wikimedia Foundation
