Hey.
On Fri, 2024-12-13 at 23:05 +0100, Bernhard Voelker wrote:
> I never saw a practical example why it would be dangerous.

Well, it seems to me that in this case even a one-in-a-million chance
might have consequences too catastrophic to wait for it to happen in
the wild.
Again, consider the find ... | xargs rm -rf example, in which a "line"
is truncated to an incomplete "/".

> Usually a data producer is buffered, and therefore atomically
> outputting entries in a consistent way.

It may very well be unbuffered, for example when someone uses stdbuf
with mode 0 to execute some utility which internally calls xargs.

> Is there proof from the wild that there was data loss?

Does one need proof to argue that a problem that is clearly there and
might cause severe damage should be fixed, even if it were extremely
unlikely to happen?

> Second, my main point, is that I believe that there is confusion
> about what -0, --null stands for.
> The usage output clarifies:
>
>   -0, --null                   items are separated by a null, not whitespace;
>                                  disables quote and backslash processing and
>                                  logical EOF processing
>
> The crucial word is "separate" which means it is something in between
> 2 entries:
>   entry1 <separator> entry2
> It is and was never a "terminator", i.e., something acknowledging
> that the previous entry is committed.
>   entry1 <terminator>
> Consequently, the logical EOF processing is not neccessary and
> therefore disabled, as stated above.

Well, the POSIX 2018 edition had no -0 option, and the 2024 edition
uses the word "delimit", not "separate". Though I'd also argue that
"delimit" is more like "separate" and not like "terminate".

However, as written in my previous mail, the current POSIX 2024 also
strongly recommends to "ignore" any lines that are not NUL-terminated
("xargs should ignore the trailing non-null bytes (as this can signal
incomplete data)") and says that in the future this may become a MUST.
And the Austin Group issue I mentioned in the previous mail already
makes clear that Technical Corrigendum 1 for POSIX 2024 will change the
"ignore" to an error in that case.

The xargs manpage (and info page) even says, contrary to the program
usage:
> -0, --null
>   Input items are terminated by a null character instead of by
>   whitespace, and the quotes and backslash are not special
>   (every character is taken literally).

*terminated*, not *separated*.

And for text files (i.e. without -0) it has in principle always been
clear that there must be a final \n.

> Third, a change like this one seems a tough one, because tons of
> scripts and users rely on existing behavior.

Indeed. But at least I wouldn't want to explain to someone who lost all
his data that this happened "by design".

One could also argue that, given the contradicting usage / manpage
documentation, no one could ever really have relied on the current
behaviour, and that it was simply a bug.

> Finally, xargs(1) is not alone: there are several tools in the same
> boat which have an option to treat input separated/terminated by
> '\0', and which usually accept regular newline or
> whitespace-separated input.
> The latter usually mandates to have a terminating newline at least,
> because POSIX says that text files have to end on a newline;
> otherwise they'd be be treated as binary files.
> How about those?

Which tools are you thinking about?

When I think e.g. about grep, then of course, if the input is
incomplete, grep's output could be wrong, too, and that in turn could
again lead to very bad consequences.
But the difference to xargs is that grep itself does nothing, and
whoever called the foo|grep pipe could still examine whether foo
succeeded (even before -o pipefail became part of POSIX 2024, this was
in principle already possible, in a portable way, with a hacky
construct of redirects).
For xargs, checking the exit status of foo afterwards would already be
too late.
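
Since the consumer cannot rely on the producer's exit status here, it
would have to detect the truncation itself. To make the
separator/terminator difference concrete, here is a minimal Python
sketch. It is not how xargs is actually implemented; the function names
and the sample paths are made up purely for illustration:

```python
def split_separated(data: bytes) -> list[bytes]:
    """Read NUL as a separator: a trailing unterminated chunk is an item."""
    items = data.split(b"\0")
    # Input ending in NUL leaves an empty final chunk; drop it.
    if items and items[-1] == b"":
        items.pop()
    return items


def split_terminated(data: bytes) -> list[bytes]:
    """Read NUL as a terminator: refuse input whose last item lacks its NUL."""
    if data and not data.endswith(b"\0"):
        raise ValueError("input ends with an unterminated item")
    return data.split(b"\0")[:-1]


# Hypothetical producer output, cut off right after the first byte of
# the second path (illustrative data, not taken from a real incident).
truncated = b"/home/user/tmp/old\0/"

print(split_separated(truncated))   # the bare b"/" is handed on as an item
try:
    split_terminated(truncated)
except ValueError as exc:
    print("rejected:", exc)
```

With the "separator" reading, the incomplete "/" becomes a regular
argument; with the "terminator" reading, the truncated input is
detected before anything is executed.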
> After all, at least #3 (known behavior) can strike back quite hard.
> Therefore I suggest thinking well through all the possible cases, and
> their pros and cons.

Definitely. Which is why a bug should be opened for this, to track the
various opinions.
Perhaps one could also announce with the next findutils release that
this change is being considered, and ask the community for input.

Another idea would be to leave the behaviour undefined at the POSIX
level, and (also there) introduce yet another option which enforces
that a non-(NUL/LF)-terminated "line" is ignored.
That would have the benefit that every implementation could stay
backwards-compatible, while still allowing people to go the safe way.
The only downside, of course, is that one doesn't get the safe way out
of the box.

Cheers,
Chris.