Date: Wed, 18 Sep 2024 07:21:30 -0400
From: Greg Wooledge <g...@wooledge.org>
Message-ID: <zuq3umfwkcx3t...@wooledge.org>
| On Wed, Sep 18, 2024 at 08:05:10 +0300, Oğuz wrote:
| > It boils down to this:
| > f(){ echo $#;}; set "" "" ""; IFS=x; f $*
| > bash, NetBSD and FreeBSD sh, and ksh88 all agree and print 2. pdksh
| > prints 3 but mksh and oksh print 1. dash, ksh93, yash, and zsh print
| > 0.

There is no right answer there; 0 and 3 are the most likely results,
but 1 and 2 are also possible.

| At the risk of sounding like a broken record, using an unquoted $* or $@
| in a context where word splitting occurs is just *begging* for trouble.

That's true, and while bug-bash perhaps isn't the best list for this
(I deleted the other shell lists from this reply, since I don't think
I can post to those without being subscribed), that also isn't the
point. As best I can tell there isn't really a shell implementers list
(the austin group list gets some of it, but that covers the whole of
the POSIX standard, and weird details of how some library function
should work aren't really relevant to shell implementers).

Steffen is concerned with what the implementation is supposed to do in
these situations, not how some random script behaves, or doesn't.
Giving the guidance that you would give to a user having problems with
a script they're trying to write isn't appropriate here; the only
discussion that matters is what the correct behaviour is.

And for the example from Oğuz, the value of IFS should be completely
irrelevant, as there's nothing anywhere in that example which actually
needs splitting. The expansion of $* (unquoted, in a context where
field splitting occurs) is supposed to produce 3 fields (since there
are 3 positional parameters set), each of which contains nothing ("").
Each of those is then subject to field splitting, but when there's
nothing, there's nothing to split.

The standard says that "any empty fields may be discarded" (that's
actually before the field splitting is to happen, but here it makes no
difference). Note the "any ...
may", so the implementation is allowed to, but not required to, discard
any of the three empty fields that have been produced. So it can
discard none (answer 3) or discard all of them (answer 0), which are
the more reasonable choices, or it can discard 1 (answer 2) or 2
(answer 1) of the three. Any of those is possible.

| Please don't do this in your scripts.

So for a script writer, that's good advice. For an implementer of a
shell it is useless, and it is also useless for determining whether or
not a shell has a bug. Here, as best I can tell, none do - though I
know that the shell I maintain (the NetBSD sh) gets to this point more
by a fluke than anything else, treating the $* as if it were "$*" -
except unquoted and thus subject to field splitting (perhaps bash does
the same thing). Then the expansion of $* above gives xx (not nothing),
which is then field split, which produces 2 fields (as each x is really
a field terminator, and no field follows the final one).

It doesn't matter what IFS[0] (the first character of IFS) is for this:
as long as it isn't white space, the same result will always happen
(the expansion inserts it, field splitting removes it). When IFS[0] is
white space, different rules apply to the field splitting algorithm,
which is why the results differ in that case. Nothing allows empty
fields produced by field splitting to be discarded, so we end up with
2 fields remaining.

How 0 or 3 are produced is easy to see (either all, or none, of the
empty fields are discarded, either of which is a reasonable choice) -
then field splitting would happen on the ones not discarded, but it
cannot split anything when the fields are empty. I'm not sure what
implementation mechanics would actually produce 1 field as the result.

So Steffen, if you were writing a shell, then you could do whatever you
like in this case; the value of IFS really should not matter at all,
and either 0 or 3 fields are sensible answers.
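
The '"$*", then split' route described above can be modelled in a few
lines of sh. This is only a sketch of that one mechanism, not the code
of the NetBSD sh or of bash; count and split_joined are hypothetical
helper names introduced just for this illustration. The outputs noted
in the comments assume a shell that applies the standard field
splitting rules to an ordinary variable, as bash does.

```shell
# Print the number of arguments received.
count() { echo $#; }

# Hypothetical helper: join three empty positional parameters the way
# "$*" does (using the first character of IFS), then field split the
# joined string.  $1 is the IFS value to use.
split_joined() {
    (   # subshell, so the IFS and "set" changes do not leak out
        set -- "" "" ""
        IFS=$1
        joined="$*"       # assignment: no field splitting happens here
        count $joined     # unquoted: field splitting happens here
    )
}

split_joined x      # joined is "xx"; each x terminates a field: 2
split_joined :      # any non-whitespace IFS[0] behaves the same: 2
split_joined ' '    # joined is two spaces; IFS white space only: 0

# For comparison, the quoted form never depends on IFS.
( set -- "" "" ""; count "$@" )    # 3
```

Note that this sidesteps the unquoted $* itself, which is exactly the
part where shells are free to differ; it only shows why inserting
IFS[0] and then splitting on it leaves 2 fields for a non-whitespace
character and none for white space.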
For a MUA, I think you get to do whatever you like, and trying to copy
the various bizarre shell behaviours in this case (that different
shells implement this differently is why the standard is so vague about
what happens) doesn't make much sense.

And certainly, if you're writing a script, just don't do things like
this.

kre